[ https://issues.apache.org/jira/browse/SOLR-2764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13195820#comment-13195820 ]

Jan Høydahl edited comment on SOLR-2764 at 1/29/12 10:09 PM:
-------------------------------------------------------------

Thanks, Christian. I've refined the patch further:

- I think the MinimalStemmer is more or less good to go; it seems to do what 
it's supposed to.
- For the LightStemmer, we now do "two-pass" removal of the -dom and -het 
endings. This means that the word "kristendom" is first stemmed to "kristen", 
and then all the general rules apply, so it is further stemmed to "krist". As a 
result, "kristen", "kristendom", "kristendommen" and "kristendommens" are all 
stemmed to "krist" (in this case because -en is incorrectly interpreted as the 
singular definite ending).
- Added some more tests to highlight this
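To make the two-pass ordering concrete, here is a rough Java sketch. It is not 
the code in SOLR-2764.patch: the class name TwoPassStemmerSketch, the suffix 
lists, the genitive -s handling and the minimum stem length of 3 are all 
illustrative assumptions. It only shows the idea: strip a -dom/-het ending 
first, then run the general rules on whatever remains.

```java
import java.util.List;
import java.util.Locale;

/**
 * Hypothetical sketch of two-pass suffix removal (not the SOLR-2764 patch):
 * pass 1 strips a -dom/-het derivational ending, pass 2 applies general rules.
 */
public final class TwoPassStemmerSketch {

    // Assumed minimum number of characters that must remain after a strip.
    private static final int MIN_STEM = 3;

    // Pass 1: -dom and -het with some inflected forms (illustrative list).
    // Longer forms come before the shorter forms they end with.
    private static final List<String> DERIVATIONAL = List.of(
            "dommene", "dommen", "dommer", "dom",
            "hetene", "heten", "heter", "het");

    // Pass 2: a small, assumed subset of general noun endings.
    private static final List<String> GENERAL = List.of("ene", "er", "en", "et");

    private TwoPassStemmerSketch() {}

    public static String stem(String word) {
        String s = word.toLowerCase(Locale.ROOT);
        // Drop a genitive -s first, e.g. "kristendommens" -> "kristendommen".
        if (s.length() > MIN_STEM + 1 && s.endsWith("s")) {
            s = s.substring(0, s.length() - 1);
        }
        s = stripFirst(s, DERIVATIONAL); // pass 1: "kristendommen" -> "kristen"
        return stripFirst(s, GENERAL);   // pass 2: "kristen" -> "krist"
    }

    // Strips the first listed suffix that matches and leaves >= MIN_STEM chars.
    private static String stripFirst(String s, List<String> suffixes) {
        for (String suffix : suffixes) {
            if (s.endsWith(suffix) && s.length() - suffix.length() >= MIN_STEM) {
                return s.substring(0, s.length() - suffix.length());
            }
        }
        return s;
    }

    public static void main(String[] args) {
        for (String w : List.of("kristen", "kristendom", "kristendommen", "kristendommens")) {
            System.out.println(w + " -> " + stem(w));
        }
    }
}
```

Running main prints "krist" for all four example words. Lucene's own light 
stemmers typically work in place on a char[] buffer and return the new length; 
String is used here only for readability. The MIN_STEM guard is one place where 
the side effects in question show up: it keeps a short word like "het" intact, 
but any longer word that merely ends in these letters would still be cut.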

What do you think: is this -dom/-het handling a reasonable improvement, or could 
there be side effects?

Are there other general rules that could easily be incorporated into the light 
stemmer to catch semi-regular inflections?
                
      was (Author: janhoy):
    Thanks Christian. I further refined stuff:

- For MinimalStemmer, we now do two-pass removal of the -dom and -het endings. 
This means that the word kristendom is first stemmed to kristen, and then all 
the general rules apply, so it is further stemmed to krist. As a result, 
"kristen", "kristendom", "kristendommen" and "kristendommens" are all stemmed 
to "krist" (in this case because -en is incorrectly interpreted as a plural 
ending), whereas if we stopped after the -dom removal, kristendom would not 
match inflections of kristen.

What do you think, is this a reasonable improvement or could there be side 
effects? I've not added these rules to the MinimalStemmer, to keep it simpler.
                  
> Create a NorwegianLightStemmer and NorwegianMinimalStemmer
> ----------------------------------------------------------
>
>                 Key: SOLR-2764
>                 URL: https://issues.apache.org/jira/browse/SOLR-2764
>             Project: Solr
>          Issue Type: New Feature
>          Components: Schema and Analysis
>            Reporter: Jan Høydahl
>             Fix For: 3.6, 4.0
>
>         Attachments: SOLR-2764.patch, SOLR-2764.patch, SOLR-2764.patch, SOLR-2764.patch
>
>
> We need a simple lightweight stemmer and a minimal (plural/singular only) 
> stemmer for Norwegian.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira