Can you open an issue for the new Object[]? It's sad about the Hungarian issue. I'm inclined to think we should add Savoy's stemmer and default to it instead. I don't see this as code duplication, as it's a different algorithm. Normally I just don't spend a lot of effort on adding alternative stemmers, but here it makes sense.
It would be really exciting if you're able to merge in what you've done at some point!

On Feb 27, 2010 1:16 PM, "Shai Erera" <ser...@gmail.com> wrote:

Hi Robert,

The EMPTY_ARGS stuff is just in SnowballProgram. I didn't touch the generated code, besides handling calls to deprecated APIs.

We've actually taken the same approach, I think :). In my Analyzer, the user passes a Locale to create the proper Analyzer. The analyzer comes pre-configured with a whole bunch of filters, like those that handle email tokens produced by the tokenizer (or hosts, acronyms and more), character normalization, ngram/stemmer filters, etc. The StemmerFilter creates the proper stemmer based on the language code, and for that I created a SnowballWrapper, which allows me to instantiate either the Arabic/Hebrew stemmers or the Snowball ones. The wrapper is only needed for the stemmer filter instance ...

I have checking contrib/analyzers on my TODO list. Unfortunately our legal department is very suspicious of everything (I guess they wouldn't make good legal folks otherwise ;)). If I want to use contrib/analyzers, they'll need to scan the code and identify the owners of the various analyzers ... That's what's on my TODO: going through the process with them :).

I personally think the work you're doing on the analyzers is extraordinary, and since I don't have much time to maintain my own package, it has fallen a bit behind in terms of Unicode differences and such. I came to appreciate the power of open source long ago; for me it'd be best to join forces on this analysis package. I'm sure that will happen one day :).

About the Hungarian stemmer: Martin Porter told us that the original (12?) stemmers were written by him, so there are no IP issues. The rest were contributed by other people. All but the Hungarian contributor responded confirming their rights to contribute the code. It's just the Hungarian contributor who never responded, even though we sent a couple of emails. That is problematic.
When someone contributes code to Lucene, they grant the ASF a license (I forget the exact wording used). That's very reassuring to lawyers, because it doesn't leave them too exposed. But there isn't any similar process in Snowball ... I can look up the correspondence we had with Martin Porter to refresh my memory on the details.

Shai

On Sat, Feb 27, 2010 at 5:35 PM, Robert Muir <rcm...@gmail.com> wrote:
> > i wanted to continue this...