Mark, does this mean Sen will be under the Apache license? (It is currently LGPL.)

On Mon, Oct 12, 2009 at 1:46 PM, Mark Bennett <mbenn...@ideaeng.com> wrote:
> Hi folks,
>
> I've been working to fix the Japanese SEN morphological analyzer, which is
> currently hosted at:
> https://sen.dev.java.net
>
> To review, Japanese doesn't use whitespace for word breaks. The
> traditional approach to CJK (Chinese, Japanese, Korean) is to use bigram
> character pairs in the index. While this works to a point, some believe
> that using proper word breaks provides better results.
>
> The "lucene-ja" glue layer between Lucene and the core SEN library broke in
> May of '09 when a fix was made in Lucene:
> http://issues.apache.org/jira/browse/LUCENE-1636
>
> Uwe S. had a very good insight for a quick fix, and I have been cleaning up
> some other issues with the code. I have also spoken with the author, Takashi
> Okamoto, and he is fine with having this moved from java.net to ASF; I think
> it will be easier for folks to find and use it if it's in ASF.
>
> I'm not quite ready to submit a patch, but the wiki suggests emailing the
> list with the idea in advance. I'll have some packaging questions, since
> there are actually quite a few parts. Also, the wiki didn't quite spell
> out the process to get things into contrib, beyond emailing and submitting a
> patch. I also plan to eventually submit a Solr-specific wrapper to the solr
> dev list, to allow for dynamic config changes to be made from Solr's
> schema. But since the original code was Lucene based, and it provides the
> broadest reach, I think having it in core Lucene would be a good start.
>
> Any comments, suggestions, or mentor volunteers? :-)
>
> Mark
>
> --
> Mark Bennett / New Idea Engineering, Inc. / mbenn...@ideaeng.com
> Direct: 408-733-0387 / Main: 866-IDEA-ENG / Cell: 408-829-6513
>

--
Robert Muir
rcm...@gmail.com