[jira] [Commented] (LUCENE-6875) New Serbian Filter

Dawid Weiss (JIRA) Wed, 04 Nov 2015 00:58:37 -0800

    [ https://issues.apache.org/jira/browse/LUCENE-6875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14989152#comment-14989152 ]


Dawid Weiss commented on LUCENE-6875:
-------------------------------------

Hmm... I think this is in fact a problem with the test: the factory is there, 
but two different filters accompany it:
{code}
SerbianNormalizationFilter.java
SerbianNormalizationFilterFactory.java
SerbianNormalizationRegularFilter.java
{code}
and the test complains about the other one:
{code}
[09:53:30.679] ERROR   1.09s J3 | TestAllAnalyzersHaveFactories.test <<<
   > Throwable #1: java.lang.IllegalArgumentException: A SPI class of type 
org.apache.lucene.analysis.util.TokenFilterFactory with name 
'SerbianNormalizationRegular' does not exist. You need to add the corresponding 
JAR file supporting this SPI to your classpath. The current classpath supports 
the following names: [apostrophe, arabicnormalization, arabicstem, 
bulgarianstem, brazilianstem, cjkbigram, cjkwidth, soraninormalization, 
soranistem, commongrams, commongramsquery, dictionarycompoundword, 
hyphenationcompoundword, decimaldigit, lowercase, stop, type, uppercase, 
czechstem, germanlightstem, germanminimalstem, germannormalization, germanstem, 
greeklowercase, greekstem, englishminimalstem, englishpossessive, kstem, 
porterstem, spanishlightstem, persiannormalization, finnishlightstem, 
frenchlightstem, frenchminimalstem, irishlowercase, galicianminimalstem, 
galicianstem, hindinormalization, hindistem, hungarianlightstem, hunspellstem, 
indonesianstem, indicnormalization, italianlightstem, latvianstem, 
asciifolding, capitalization, codepointcount, fingerprint, hyphenatedwords, 
keepword, keywordmarker, keywordrepeat, length, limittokencount, 
limittokenoffset, limittokenposition, removeduplicates, stemmeroverride, trim, 
truncate, worddelimiter, scandinavianfolding, scandinaviannormalization, 
edgengram, ngram, norwegianlightstem, norwegianminimalstem, patternreplace, 
patterncapturegroup, delimitedpayload, numericpayload, tokenoffsetpayload, 
typeaspayload, portugueselightstem, portugueseminimalstem, portuguesestem, 
reversestring, russianlightstem, shingle, snowballporter, serbiannormalization, 
classic, standard, swedishlightstem, synonym, turkishlowercase, elision]
{code}

Robert, should there be a separate factory for that filter?
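
For context, the check behind that error can be sketched roughly as below. This is only an illustration, not the actual TestAllAnalyzersHaveFactories code. The class FactoryNameSketch and its helpers are hypothetical, and the naming rule (drop the trailing "Filter" from the class name, then compare case-insensitively against the registered SPI names) is an assumption inferred from the error message above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch; none of these names exist in Lucene.
public class FactoryNameSketch {

    // Assumed rule: the SPI lookup name is the simple class name
    // without its trailing "Filter".
    static String lookupName(String filterSimpleName) {
        String suffix = "Filter";
        return filterSimpleName.endsWith(suffix)
            ? filterSimpleName.substring(0, filterSimpleName.length() - suffix.length())
            : filterSimpleName;
    }

    // Returns the filters whose derived name is not among the registered
    // factory names (registered names are lower-case, as in the log).
    static List<String> filtersWithoutFactory(List<String> filters, Set<String> registered) {
        List<String> missing = new ArrayList<>();
        for (String filter : filters) {
            String key = lookupName(filter).toLowerCase(Locale.ROOT);
            if (!registered.contains(key)) {
                missing.add(filter);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Two of the names listed in the error message above.
        Set<String> registered =
            new HashSet<>(Arrays.asList("serbiannormalization", "lowercase"));
        List<String> filters = Arrays.asList(
            "SerbianNormalizationFilter",
            "SerbianNormalizationRegularFilter");
        System.out.println(filtersWithoutFactory(filters, registered));
    }
}
```

Under that assumed rule, SerbianNormalizationFilter resolves to the registered name serbiannormalization, while SerbianNormalizationRegularFilter has no matching entry, which would explain why the test flags only the second filter.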

> New Serbian Filter
> ------------------
>
>                 Key: LUCENE-6875
>                 URL: https://issues.apache.org/jira/browse/LUCENE-6875
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: modules/analysis
>            Reporter: Nikola Smolenski
>            Assignee: Dawid Weiss
>            Priority: Minor
>             Fix For: Trunk, 5.4
>
>         Attachments: Lucene-Serbian-Regular-1.patch
>
>
> This is a new Serbian filter that works with regular Latin text (the current 
> filter works with "bald" Latin). I described in detail what it does and why 
> it is necessary on the wiki.



