I created LUCENE-2288 for handling the Object[] thingy in SnowballProgram (and Class[] in Among).
Shai

On Sat, Feb 27, 2010 at 8:48 PM, Robert Muir <rcm...@gmail.com> wrote:

> Can you open an issue for the new Object[]? It's sad about the Hungarian
> issue. I'm inclined to think we should add Savoy's and default to it
> instead. I don't see this as code duplication, as it's a different
> algorithm. Normally just don't spend a lot of effort towards adding
> alternative stemmers, but here it makes sense.
>
> It sounds really exciting if you are able to merge in what you have done
> in the future!
>
> On Feb 27, 2010 1:16 PM, "Shai Erera" <ser...@gmail.com> wrote:
>
> Hi Robert, the EMPTY_ARGS stuff is just in SnowballProgram. I didn't touch
> the generated code, besides handling calls to deprecated API.
>
> We've actually taken the same approach I think :). In my Analyzer, the user
> passes a Locale to create the proper Analyzer. The analyzer comes
> pre-configured w/ a whole bunch of filters, like those that handle email
> tokens produced by the tokenizer (or hosts, acronyms and more), character
> normalization, ngram/stemmer filters etc. The StemmerFilter creates the
> proper stemmer based on the language code, and for that I created a
> SnowballWrapper - that allows me to instantiate Arabic/Hebrew or Snowball
> ones. The wrapper is only needed for the stemmer filter instance ...
>
> I have on my TODO checking contrib/analyzers. Unfortunately our legal
> department is very suspicious of everything (guess they wouldn't make good
> legal folks otherwise ;)). If I want to use the contrib/analyzers,
> they'll need to scan the code and identify the owners of the various
> analyzers ... That's what's on my TODO - going through the process w/ them
> :).
>
> I personally think that the work you're doing on the analyzers is
> extraordinary, and since I don't have much time for maintaining my own
> package, it has fallen a bit behind in terms of Unicode differences and
> such. I came to appreciate the power of open source long ago - for me it'd
> be best to join forces on this analysis package. I'm sure that will happen
> one day :).
>
> About the Hungarian stemmer - Martin Porter told us that the original (12?)
> stemmers were written by him and so there are no IP issues. The rest were
> contributed by other people. All but the Hungarian contributor responded
> w/ their rights to contribute the code. It's just the Hungarian one that
> never responded, even though we've sent a couple of emails. That is
> problematic. When someone contributes code to Lucene, he grants the ASF
> license (forgot the wording that's used). That's very reassuring to
> lawyers, because it doesn't leave them too exposed. But there isn't any
> similar process in Snowball ... I can look up the correspondence we've had
> with Martin Porter to refresh my memory on the details.
>
> Shai
>
> On Sat, Feb 27, 2010 at 5:35 PM, Robert Muir <rcm...@gmail.com> wrote:
> >
> > I wanted to continue this...
> >