Re: [codec] getting the bmpm code out there

sebb Thu, 11 Aug 2011 13:10:51 -0700

On 11 August 2011 20:56, Gary Gregory <[email protected]> wrote:
> Hello All!
>
> Topic 1: Housekeeping: package name and POM.
>
> The next codec release out of trunk will be a major release labeled 2.0;
> the current release is 1.5.
>
> In trunk, I've removed deprecated methods and the project now requires
> Java 5. This means 2.0 will not be a drop-in, binary-compatible
> replacement for 1.5.
>
> I'd like to confirm or deny that this means the package name will
> change to o.a.c.codec2 and that the POM groupId will have to change
> from commons-codec to org.apache.commons. 2.0 and 1.5 would be able to
> live side by side.


Yes, the name changes are necessary to avoid problems with incompatible jars.

> I'd like to get this out of the way first hence topic 1.
>
>
> Topic 2: Beider-Morse (BM) Encoder API
> https://issues.apache.org/jira/browse/CODEC-125
>
> BM is a new codec for 2.0.
>
> The encode API returns a set of encodings.
>
> In trunk, this is currently a String in the format "s1|s2|s3".
>
> I think this is not the best design, a set should be a Set, in this
> case, an ordered set. Or, a List. Generally, it should be a Collection
> of Strings.
>
> There was concern with call sites that generically use a [codec]
> Encoder with the signature "Object encode(Object)" and call
> toString() on the result.
>
> If we set the API to "Set<CharSequence> encode(CharSequence)" or
> "Set<String> encode(String)", doing a toString() on a HashSet will
> yield a usable String similar to what trunk does now. For example,
> for a HashSet of Strings "a", "b" and "c", HashSet.toString() returns
> "[a, b, c]", which is no worse than "a|b|c" IMO. At least it is a
> documented and stable format.

+1
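
For illustration only (this is not code from trunk): a minimal sketch of the
toString() point above. The class EncodingSetDemo and the helpers encodeAsSet
and toPipeString are hypothetical names made up for this example; they are not
part of [codec]. It uses a LinkedHashSet so the iteration order is the
insertion order, which is one way to get the "ordered set" mentioned above.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;

public class EncodingSetDemo {

    // Hypothetical stand-in for an encoder that returns several encodings.
    // LinkedHashSet keeps insertion order and drops duplicates.
    public static Set<String> encodeAsSet(String... encodings) {
        return new LinkedHashSet<String>(Arrays.asList(encodings));
    }

    // Rebuilds the "s1|s2|s3" style string for callers that still want it.
    public static String toPipeString(Set<String> encodings) {
        StringBuilder sb = new StringBuilder();
        for (String encoding : encodings) {
            if (sb.length() > 0) {
                sb.append('|');
            }
            sb.append(encoding);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // A generic call site only sees an Object and calls toString() on it.
        Object result = encodeAsSet("a", "b", "c");
        System.out.println(result.toString());                          // [a, b, c]
        System.out.println(toPipeString(encodeAsSet("a", "b", "c")));   // a|b|c
    }
}
```

The "[a, b, c]" form comes from AbstractCollection.toString(), which the
standard Set implementations inherit, so a generic caller gets a readable
String without knowing the result is a Set.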

> Topic 3: Generics
>
> This will be in a separate thread but I'd like to get this in 2.0
> because this will likely break the API and I only want to break things
> once and not have to do a codec3 for generics.

+1.

> Thank you all,
> Gary
>
> On Thu, Aug 11, 2011 at 2:38 PM, Matthew Pocock
> <[email protected]> wrote:
>> Hi,
>>
>> As those of you who've been following the CODEC-125 ticket will know, with
>> Greg's help I've got a port of the Beider-Morse phonetic
>> matching (bmpm) algorithm in as a string encoder. As far as I can tell, it's
>> ready for people to use and abuse. It ideally needs more test-case words,
>> but to the best of my knowledge it doesn't have any horrendous bugs or
>> performance issues.
>>
>> The discussion on the ticket started to stray off bmpm and on to policy for
>> releases and changing APIs, and Sebb said we should discuss it on the list.
>> So, here we are.
>>
>> Ideally, I'd like there to be a release of commons-codec some time soon so
>> that users can start to try out bmpm right away, and so that we can start
>> the process of adding it to the list of supported indexing methods in solr.
>> What do people think?
>>
>> Matthew
>>
>> --
>> Dr Matthew Pocock
>> Visitor, School of Computing Science, Newcastle University
>> mailto: [email protected]
>> gchat: [email protected]
>> msn: [email protected]
>> irc.freenode.net: drdozer
>> tel: (0191) 2566550
>> mob: +447535664143
>>
>
>
>
> --
> Thank you,
> Gary
>
> http://garygregory.wordpress.com/
> http://garygregory.com/
> http://people.apache.org/~ggregory/
> http://twitter.com/GaryGregory
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [email protected]
> For additional commands, e-mail: [email protected]
>
>

