Hi Simo,
I'm not sure I understood how BitSets would be used in this case. For example, 
a version using chars might look like this:
AlphabetConverter ac = new AlphabetConverter(new char[] {'a','b','c','d'}, 
new char[] {'a','e','f','g'}, new char[] {'a'}); // 'a' is not encoded

and the mapping would become a -> a, b -> e, c -> f, d -> g,
so encode("abc") would return "aef".
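To make the char example concrete, here is a minimal sketch of a positional mapping. The class name CharAlphabetSketch and its two-array constructor are hypothetical, not the proposed AlphabetConverter API; it assumes both arrays have the same length and pairs them by index, so the unencoded 'a' simply maps to itself.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch only; not the actual AlphabetConverter implementation. */
class CharAlphabetSketch {
    private final Map<Character, Character> forward = new HashMap<>();
    private final Map<Character, Character> backward = new HashMap<>();

    /** Pairs original[i] with encoding[i] (assumption: equal lengths, same order). */
    CharAlphabetSketch(char[] original, char[] encoding) {
        if (original.length != encoding.length) {
            throw new IllegalArgumentException("alphabets must have the same length");
        }
        for (int i = 0; i < original.length; i++) {
            forward.put(original[i], encoding[i]);
            backward.put(encoding[i], original[i]);
        }
    }

    String encode(String s) {
        return translate(s, forward);
    }

    String decode(String s) {
        return translate(s, backward);
    }

    /** Replaces each char found in the table; chars outside the table pass through. */
    private static String translate(String s, Map<Character, Character> table) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            Character mapped = table.get(c);
            if (mapped != null) {
                out.append(mapped.charValue());
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }
}
```

With the alphabets from the example, new CharAlphabetSketch(new char[] {'a','b','c','d'}, new char[] {'a','e','f','g'}).encode("abc") returns "aef", and decode("aef") returns "abc".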
Ints can be used instead of chars to support Unicode code points that don't fit 
in a single char (which was our case; if that seems overkill, the char 
implementation is much more direct).
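As a rough illustration of why ints help here: a supplementary code point such as U+1D400 occupies two chars in a Java String, so a char-by-char loop would see two surrogates instead of one symbol. The sketch below iterates by code point instead. The class and method names are hypothetical, and the Map<Integer, Integer> table is only an assumption about how the mapping might be stored.

```java
import java.util.Map;

/** Hypothetical sketch of a code-point-based translation step. */
class CodePointSketch {
    /**
     * Maps each code point of s through the table; code points that are
     * not in the table are copied unchanged.
     */
    static String translate(String s, Map<Integer, Integer> table) {
        StringBuilder out = new StringBuilder(s.length());
        // String.codePoints() joins surrogate pairs into single int values.
        s.codePoints().forEach(cp -> out.appendCodePoint(table.getOrDefault(cp, cp)));
        return out.toString();
    }
}
```

For instance, with a table that maps 0x1D400 to 'A', the two-char input new String(Character.toChars(0x1D400)) translates to the one-char string "A".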
How did you mean the BitSet to be used?
Regards,
Eyal

 

    On Thursday, September 1, 2016 12:26 PM, Simone Tripodi 
<simonetrip...@apache.org> wrote:
 

 Hi,
I personally think it would be a very "nice to have" feature. I had to face 
similar issues in the past and, had that feature been available, it would have 
saved me development time.
I just have a small request/suggestion: since int/char can be cast to each 
other, I would use BitSets rather than Sets.
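One possible reading of this suggestion, sketched under assumptions: each alphabet is stored as a java.util.BitSet whose set bit indices are the code points it contains, which avoids boxing every value into an Integer. The class and method names below are hypothetical. Note that a BitSet only iterates in ascending index order, so the pairing between the original and encoding alphabets would still have to be defined separately.

```java
import java.util.BitSet;

/** Hypothetical sketch: a BitSet used as a set of chars / code points. */
class BitSetAlphabetSketch {
    /** Builds a BitSet with one bit set per code point (chars widen to int). */
    static BitSet of(int... codePoints) {
        BitSet bits = new BitSet();
        for (int cp : codePoints) {
            bits.set(cp);
        }
        return bits;
    }

    /** Membership test, e.g. "is this code point in doNotEncode?". */
    static boolean contains(BitSet alphabet, int codePoint) {
        return alphabet.get(codePoint);
    }

    /** Renders the members in ascending code point order. */
    static String members(BitSet alphabet) {
        StringBuilder out = new StringBuilder();
        for (int cp = alphabet.nextSetBit(0); cp >= 0; cp = alphabet.nextSetBit(cp + 1)) {
            out.appendCodePoint(cp);
        }
        return out.toString();
    }
}
```

For example, BitSetAlphabetSketch.of('d', 'a', 'c', 'b') contains 'b' but not 'z', and members(...) on it returns "abcd" regardless of insertion order.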
Good luck!
-Simo

http://people.apache.org/~simonetripodi/
http://twitter.com/simonetripodi
On Thu, Sep 1, 2016 at 10:53 AM, Eyal Allweil <eyal_allw...@yahoo.com.invalid> 
wrote:

Hi guys,
Would you be interested in adding a utility class that creates alphabet 
converters, perhaps using a helper method available from StringUtils? It 
doesn't have to stay the way it is now, but the API for the class - 
AlphabetConverter - is currently:
/**
 * The input is integers representing code points, but we can make it accept
 * chars as well.
 *
 * doNotEncode represents chars we want to leave in their original state
 * (not encoded using the chars in encoding).
 */
public AlphabetConverter(Set<Integer> original, Set<Integer> encoding, 
Set<Integer> doNotEncode);

public String encode(String original);

public String decode(String encoded);
In StringUtils, we could add

public AlphabetConverter getAlphabetConverter (Set<Integer> original, 
Set<Integer> encoding, Set<Integer> doNotEncode);
I used it to convert from Unicode to Latin letters, without using any chars I 
wanted as delimiters, and preserving the English alphabet as is for 
readability. If you'd like to add it, I'll clean up the code and prepare it for 
a pull request so you can review it.

It makes sense to me to add a method that returns the HashMaps used internally 
for the mappings so they can be serialized (and deserialized) for preserving 
the mapping.
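A minimal sketch of what preserving such a mapping could look like, assuming the internal table is a HashMap<Integer, Integer> (java.util.HashMap implements Serializable). The class and method names are hypothetical and standard Java object serialization is just one option.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;
import java.util.HashMap;

/** Hypothetical sketch: round-tripping a code point mapping through bytes. */
class MappingSerializationSketch {
    /** Serializes the mapping with standard Java object serialization. */
    static byte[] toBytes(HashMap<Integer, Integer> mapping) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(mapping);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    /** Restores a mapping previously written by toBytes. */
    @SuppressWarnings("unchecked")
    static HashMap<Integer, Integer> fromBytes(byte[] bytes) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return (HashMap<Integer, Integer>) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

A converter rebuilt from the restored map would then encode and decode exactly as the original one did, as long as the same mapping is loaded on both sides.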
Regards,
Eyal Allweil (PayPal)







   
