[ 
https://issues.apache.org/jira/browse/LUCENE-3415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13097890#comment-13097890
 ] 

Manish commented on LUCENE-3415:
--------------------------------

The index size becomes huge (in fact, it doubles). 
We have 2 fields, both indexed and stored, one with stemming and one without 
stemming. We thought of removing stored=true from one of the fields, but then 
highlighting becomes a problem (field 1 won't have the original words, and 
hence the term vectors won't highlight it properly). 

I have an idea based on Simon's comments; I don't know whether it is going to 
work or not. 

1. Create a new filter factory that emits both the stemmed word and the 
original word. 
2. Field 1 -> indexed=true, stored=true, uses the above filter. 
3. Field 2 -> indexed=true, stored=false, doesn't use the above filter. 

I can run searches against the corresponding fields. For highlighting, I can 
always use Field 1, and since term vectors, offsets and positions are present 
for the original words too, it will highlight properly. 
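
To make the proposal above concrete, here is a rough, self-contained sketch. 
It does not use Lucene's API: the class PreserveOriginalSketch, the Token 
holder, and toyStem (a trivial suffix stripper standing in for a real 
Snowball stemmer) are all hypothetical names invented for illustration. It 
assumes the stemmed form is emitted at the same position and with the same 
offsets as the original word, which is what would let a highlighter mark the 
original text for either form of the query term. 

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Illustrative sketch only; this is NOT Lucene's API. All names are hypothetical. */
final class PreserveOriginalSketch {

    /** One emitted term plus the metadata a highlighter would need. */
    static final class Token {
        final String term;
        final int posInc; // 1 = new position, 0 = stacked on the previous token
        final int start;  // start offset into the stored text
        final int end;    // end offset (exclusive)

        Token(String term, int posInc, int start, int end) {
            this.term = term;
            this.posInc = posInc;
            this.start = start;
            this.end = end;
        }
    }

    /** Toy suffix stripper standing in for a real Snowball stemmer (assumption). */
    static String toyStem(String word) {
        for (String suffix : new String[] {"ing", "es", "s"}) {
            if (word.endsWith(suffix) && word.length() > suffix.length() + 2) {
                return word.substring(0, word.length() - suffix.length());
            }
        }
        return word;
    }

    /** Emits the lowercased original word, then its stem at the same position. */
    static List<Token> analyze(String text) {
        List<Token> out = new ArrayList<>();
        int i = 0;
        int n = text.length();
        while (i < n) {
            // Skip separators, then consume one run of letters/digits.
            while (i < n && !Character.isLetterOrDigit(text.charAt(i))) i++;
            int start = i;
            while (i < n && Character.isLetterOrDigit(text.charAt(i))) i++;
            if (start == i) break; // only trailing separators were left
            String original = text.substring(start, i).toLowerCase(Locale.ROOT);
            String stemmed = toyStem(original);
            out.add(new Token(original, 1, start, i));
            if (!stemmed.equals(original)) {
                // Same offsets, position increment 0: the stem shares the slot.
                out.add(new Token(stemmed, 0, start, i));
            }
        }
        return out;
    }

    /** Wraps every word whose original or stemmed term equals queryTerm. */
    static String highlight(String text, String queryTerm) {
        StringBuilder sb = new StringBuilder();
        int last = 0;
        for (Token t : analyze(text)) {
            if (t.term.equals(queryTerm) && t.start >= last) {
                sb.append(text, last, t.start)
                  .append("<em>").append(text, t.start, t.end).append("</em>");
                last = t.end;
            }
        }
        return sb.append(text, last, text.length()).toString();
    }
}
```

With the text "Jumping dogs jump", highlight(text, "jump") marks both 
"Jumping" and "jump", while highlight(text, "jumping") marks only "Jumping". 
So a stemmed query and an unstemmed query can both be served and highlighted 
from the one stored field in this sketch. 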

Do let me know your thoughts on this. 

> Snowball filter to include original word too
> --------------------------------------------
>
>                 Key: LUCENE-3415
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3415
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: modules/analysis
>    Affects Versions: 3.3
>         Environment: All
>            Reporter: Manish
>              Labels: features
>             Fix For: 3.4, 4.0
>
>
> 1. Currently, the snowball filter deletes the original word and adds only the 
> stemmed word to the index. So, if I want to search with / without stemming, I 
> have to keep 2 fields, one with stemming and one without it. 
> 2. Rather than doing this, if we had a configurable option to preserve the 
> original, it would solve more business problems. 
> 3. Using a single field, I can search with / without stemming by 
> changing the query filters. 
> The same can also be done for phonetic filters. 

