Re: Fix for Japanese SEN morphological analyzer, and moving into Contrib

Robert Muir Mon, 12 Oct 2009 11:11:01 -0700

Mark, does this mean Sen will be under the Apache license? (It is currently
LGPL.)


On Mon, Oct 12, 2009 at 1:46 PM, Mark Bennett <mbenn...@ideaeng.com> wrote:

> Hi folks,
>
> I've been working to fix the Japanese SEN morphological analyzer, which is
> currently hosted at:
> https://sen.dev.java.net
>
> To review, Japanese doesn't use whitespace for word breaks.  The
> traditional approach to CJK (Chinese, Japanese, Korean) is to use bigram
> character pairs in the index.  While this works to a point, some believe
> that using proper word breaks provides better results.
>
> The "lucene-ja" glue layer between Lucene and the core SEN library broke in
> May of '09 when a fix was made in Lucene:
> http://issues.apache.org/jira/browse/LUCENE-1636
>
> Uwe S. had a very good insight for a quick fix, and I have been cleaning up
> some other issues with the code.  I have also spoken with the author, Takashi
> Okamoto, and he is fine with having this moved from java.net to ASF; I think
> it will be easier for folks to find and use it if it's in ASF.
>
> I'm not quite ready to submit a patch, but the wiki suggests emailing the
> list with the idea in advance.  I'll have some packaging questions, since
> there are actually quite a few parts.  Also, the wiki didn't quite spell
> out the process to get things into contrib, beyond emailing and submitting a
> patch.  I also plan to eventually submit a Solr-specific wrapper to the Solr
> dev list, to allow for dynamic config changes to be made from Solr's
> schema.  But since the original code was Lucene-based, and it provides the
> broadest reach, I think having it in core Lucene would be a good start.
>
> Any comments, suggestions, or mentor volunteers?  :-)
>
> Mark
>
> --
> Mark Bennett / New Idea Engineering, Inc. / mbenn...@ideaeng.com
> Direct: 408-733-0387 / Main: 866-IDEA-ENG / Cell: 408-829-6513
>



-- 
Robert Muir
rcm...@gmail.com
