[ 
https://issues.apache.org/jira/browse/LUCENE-1817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-1817:
--------------------------------

    Attachment: LUCENE-1817.patch

Here is a javadocs-only patch, which I think is the best solution.

I created several custom dictionaries and found:
1) it will be difficult to support this dictionary format, for a number of 
reasons
2) the dictionary format is limited to the GB2312 encoding, so it cannot 
support things like Traditional Chinese
3) even with a correctly formatted file, the code makes many assumptions 
about what the dictionary should contain.
   This is especially true of WordDictionary.expandDelimiterData.
   If these assumptions are not met, things like infinite loops occur.
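
Point 2 can be checked with a small standalone sketch. It assumes the JVM 
ships the GB2312 charset (Charset.forName throws UnsupportedCharsetException 
otherwise); the class and method names are made up for illustration and are 
not part of Lucene. It asks the GB2312 encoder whether it can represent the 
simplified and the traditional form of the same character.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

public class Gb2312Coverage {

    /** Returns true if every character of the text can be encoded in GB2312. */
    static boolean fitsInGb2312(String text) {
        // Assumption: the runtime provides GB2312 (part of the JDK's extended charsets).
        CharsetEncoder encoder = Charset.forName("GB2312").newEncoder();
        return encoder.canEncode(text);
    }

    public static void main(String[] args) {
        String simplified = "\u9f99";   // simplified form of "dragon"
        String traditional = "\u9f8d";  // traditional form of "dragon"
        System.out.println("simplified fits in GB2312:  " + fitsInGb2312(simplified));
        System.out.println("traditional fits in GB2312: " + fitsInGb2312(traditional));
    }
}
```

A dictionary file restricted to this encoding has no way to store entries 
written with characters the encoder rejects.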

I recommend we instead remove the javadocs describing how to use a custom 
dictionary, and in this patch also expand the EXPERIMENTAL wording from just 
the APIs to both the APIs and the file formats.
In the future we should refactor and use a Unicode-based format.

I won't do anything here without some consensus that this is the right way 
to go, but I think we should do this in 2.9.


> it is impossible to use a custom dictionary for SmartChineseAnalyzer
> --------------------------------------------------------------------
>
>                 Key: LUCENE-1817
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1817
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: contrib/analyzers
>            Reporter: Robert Muir
>            Priority: Minor
>         Attachments: dataFiles.zip, LUCENE-1817-mark-cn-experimental.patch, 
> LUCENE-1817.patch, LUCENE-1817.patch
>
>
> It is not possible to use a custom dictionary, even though there is a lot of 
> code and javadocs to allow this.
> This is because the custom dictionary is only loaded if the built-in one 
> cannot be loaded (and the built-in one is, of course, in the jar file and 
> should load):
> {code}
> public synchronized static WordDictionary getInstance() {
>     if (singleInstance == null) {
>       singleInstance = new WordDictionary(); // load from jar file
>       try {
>         singleInstance.load();
>       } catch (IOException e) {
>         // loading from the jar file must fail before it checks the
>         // AnalyzerProfile (where this can be configured)
>         String wordDictRoot = AnalyzerProfile.ANALYSIS_DATA_DIR;
>         singleInstance.load(wordDictRoot);
>       } catch (ClassNotFoundException e) {
>         throw new RuntimeException(e);
>       }
>     }
>     return singleInstance;
>   }
> {code}
> I think we should either correct this, document this, or disable custom 
> dictionary support...
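
For the "correct this" option, the lookup order would have to be reversed: 
consult the configured directory first and fall back to the bundled data only 
when no usable directory is configured. The following is a minimal sketch of 
that ordering only. The class, enum, and method names are hypothetical, it 
does not touch WordDictionary or AnalyzerProfile, and it is not what the 
attached patch does.

```java
import java.io.File;

public class DictionaryLoadOrder {

    /** Where the dictionary data would be read from (hypothetical). */
    enum Source { CUSTOM_DIR, BUILT_IN }

    /**
     * Chooses the data source, preferring a configured directory over the
     * bundled dictionary. The argument stands in for a configured data
     * directory such as the one the quoted code reads from AnalyzerProfile.
     */
    static Source chooseSource(String configuredDir) {
        if (configuredDir != null && !configuredDir.isEmpty()) {
            File dir = new File(configuredDir);
            if (dir.isDirectory()) {
                return Source.CUSTOM_DIR; // a custom directory exists, so use it first
            }
        }
        return Source.BUILT_IN; // nothing usable configured, fall back to the jar
    }

    public static void main(String[] args) {
        System.out.println(chooseSource(null));
        System.out.println(chooseSource(System.getProperty("user.dir")));
    }
}
```

With this ordering a load failure is no longer the trigger for reading the 
custom directory, which is the behaviour the quoted getInstance() lacks. It 
would not address the file format problems described in the comment above.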

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


