[jira] Updated: (SOLR-1984) add HyphenationCompoundWordTokenFilterFactory class

Robert Muir (JIRA) Fri, 09 Jul 2010 07:36:17 -0700

     [ https://issues.apache.org/jira/browse/SOLR-1984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Robert Muir updated SOLR-1984:
------------------------------

    Attachment: SOLR-1984.patch

Thank you very much for contributing this; it's true there is no factory for 
this feature.

I updated your code with a few tweaks:
* Allow a null dictionary. This allows the use of just the hyphenation grammar 
(LUCENE-1287).
* Allow the encoding to be specified (defaulting to UTF-8). Some of the grammar 
distributions from offo don't use UTF-8 encoding.
* Set the onlyLongestMatch default to 'false'. This is just to be consistent with 
the TokenFilter itself, which defaults to false.
* Added the Apache-licensed Danish grammar to test-files, along with a small 
dictionary and some test cases.
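
For illustration only, the defaults described in the list above could be resolved roughly as follows. This is a minimal sketch, not the code in SOLR-1984.patch: the class name HyphenationFactoryArgs and the parameter keys "hyphenator", "dictionary", "encoding" and "onlyLongestMatch" are assumptions made for the example, and it deliberately avoids any Lucene or Solr types so that it stands alone.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Map;

/**
 * Hypothetical holder for the factory options discussed above.
 * The class name and the parameter keys are assumptions for this sketch,
 * not taken from the attached patch.
 */
final class HyphenationFactoryArgs {
    final String hyphenator;        // hyphenation grammar file (assumed to be required)
    final String dictionary;        // may be null: use the hyphenation grammar alone
    final Charset encoding;         // encoding of the grammar file, UTF-8 unless specified
    final boolean onlyLongestMatch; // false unless specified, like the TokenFilter

    HyphenationFactoryArgs(Map<String, String> args) {
        hyphenator = args.get("hyphenator");
        if (hyphenator == null) {
            throw new IllegalArgumentException("Missing required parameter: hyphenator");
        }
        // A missing dictionary is allowed and simply stays null.
        dictionary = args.get("dictionary");
        // Fall back to UTF-8 when no encoding is given.
        String enc = args.get("encoding");
        encoding = (enc == null) ? StandardCharsets.UTF_8 : Charset.forName(enc);
        // Boolean.parseBoolean(null) returns false, which gives the desired default.
        onlyLongestMatch = Boolean.parseBoolean(args.get("onlyLongestMatch"));
    }
}
```

With only a grammar file supplied, this sketch leaves the dictionary null, the encoding at UTF-8 and onlyLongestMatch at false. Passing an encoding such as "ISO-8859-1" would cover a grammar distribution that is not UTF-8 encoded.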

If no one objects, I'll commit in a bit.


> add HyphenationCompoundWordTokenFilterFactory class
> ---------------------------------------------------
>
>                 Key: SOLR-1984
>                 URL: https://issues.apache.org/jira/browse/SOLR-1984
>             Project: Solr
>          Issue Type: New Feature
>          Components: Schema and Analysis
>            Reporter: P B
>            Assignee: Robert Muir
>            Priority: Minor
>             Fix For: 3.1, 4.0
>
>         Attachments: HyphenationCompoundWordTokenFilterFactory.java, 
> SOLR-1984.patch
>
>
> Please can you include my contribution in the Solr nightly builds.
> I cannot compile on a Linux server; I have tested only on Windows.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


