[jira] [Commented] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries

Jack Krupansky (JIRA) Sat, 27 Apr 2013 08:30:16 -0700

    [ 
https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13643703#comment-13643703
 ]


Jack Krupansky commented on LUCENE-4956:
----------------------------------------

bq. The stempel and morfologik analysis modules are both Polish analyzers - if 
the first one had been named "polish", what would we have done with the second 
one?

That's exactly what I was talking about.

We have four distinct concepts:

1. Module name. 
2. Package name.
3. Source tree path.
4. Module jar name.

They should incorporate both the language code and the "implementation name" 
(e.g., "stempel" or "morfologik").

The module name should be something like "analysis/pl/stempel" or 
"analysis/stempel/pl". I prefer the former - it says that the first priority is 
to organize by language, and secondarily by implementation.

And the package name should be something like 
"org.apache.lucene.analysis.pl.stempel" or 
"org.apache.lucene.analysis.stempel.pl". I prefer the former, for the same 
rationale as for module name.

There seems to be a third form of name, "analyzer-xxx". But as far as I can tell 
it is only an artifact of the docs, or maybe some old Lucene thing.

And then there are the partial names for the individual jar files. There seem 
to be both "lucene-analyzers-stempel-x.y.z" and 
"lucene-analyzers-morfologik-x.y.z" in contrib/lucene-libs and then multiple 
"morfologik-a.b.c" jars in contrib/lib.

In short, to answer your question more directly, in my ideal world we would 
have source tree and package names like:

lucene/analysis/pl/stempel/src
lucene/analysis/pl/morfologik/src
lucene/analysis/ko/arirang/src

org.apache.lucene.analysis.pl.stempel
org.apache.lucene.analysis.pl.morfologik
org.apache.lucene.analysis.ko.arirang

This would allow multiple implementations for a single language in the same 
application.
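
As a rough sketch of that scheme, the source tree path and package name can 
both be derived from the same two parts: the language code and the 
implementation name. The class and method names below are hypothetical, for 
illustration only; nothing like this exists in Lucene.

```java
import java.util.List;

/** Hypothetical helper (not part of Lucene): derives the proposed names. */
final class AnalysisModuleNames {
    private final String lang; // language code, e.g. "pl" or "ko"
    private final String impl; // implementation name, e.g. "stempel"

    AnalysisModuleNames(String lang, String impl) {
        this.lang = lang;
        this.impl = impl;
    }

    /** Source tree path, organized by language first, then implementation. */
    String sourcePath() {
        return "lucene/analysis/" + lang + "/" + impl + "/src";
    }

    /** Package name, using the same language-first ordering. */
    String packageName() {
        return "org.apache.lucene.analysis." + lang + "." + impl;
    }

    public static void main(String[] args) {
        // Two Polish implementations and one Korean one, side by side.
        List<AnalysisModuleNames> modules = List.of(
            new AnalysisModuleNames("pl", "stempel"),
            new AnalysisModuleNames("pl", "morfologik"),
            new AnalysisModuleNames("ko", "arirang"));
        for (AnalysisModuleNames m : modules) {
            System.out.println(m.sourcePath() + "  ->  " + m.packageName());
        }
    }
}
```

Since the implementation name is part of every path and package, the two 
Polish modules never collide, which is the point of the scheme.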

Although I could see reversing the language and implementation names if there 
is some need to share implementation code across languages.

                
> the korean analyzer that has a korean morphological analyzer and dictionaries
> -----------------------------------------------------------------------------
>
>                 Key: LUCENE-4956
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4956
>             Project: Lucene - Core
>          Issue Type: New Feature
>          Components: modules/analysis
>    Affects Versions: 4.2
>            Reporter: SooMyung Lee
>              Labels: newbie
>         Attachments: kr.analyzer.4x.tar
>
>
> The Korean language has specific characteristics. When developing a search 
> service with Lucene & Solr in Korean, there are some problems in searching and 
> indexing. The Korean analyzer solves these problems with a Korean morphological 
> analyzer. It consists of a Korean morphological analyzer, dictionaries, a 
> Korean tokenizer and a Korean filter. The Korean analyzer is made for Lucene 
> and Solr. If you develop a search service with Lucene in Korean, the Korean 
> analyzer is the best choice.
