[ 
https://issues.apache.org/jira/browse/LUCENE-7863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16054803#comment-16054803
 ] 

Mikhail Khludnev commented on LUCENE-7863:
------------------------------------------

Got it. Nice decision! 

So, instead of searching for name:\*ar, it flips the query to name_rev:ra*. Then, 
for every matching doc (if we need phrase logic or highlighting), it seeks the 
original term's postings to the same doc and reads positions and offsets.  
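
To make sure I read the flow right, here is a minimal sketch of it in plain Python. It is a toy in-memory model, not Lucene code: {{ToyIndex}}, {{flip_suffix_query}} and {{suffix_search}} are names made up for illustration, and the linear scan over reversed terms stands in for a real term dictionary seek.

```python
from collections import defaultdict


def flip_suffix_query(pattern):
    """Turn a leading-wildcard pattern such as '*ar' into the prefix pattern 'ra*'."""
    if not pattern.startswith("*") or "*" in pattern[1:]:
        raise ValueError("expected a pattern of the form '*suffix'")
    return pattern[1:][::-1] + "*"


class ToyIndex:
    """Hypothetical two-field index: 'name' keeps positions, 'name_rev' keeps docs only."""

    def __init__(self):
        # Original field: term -> doc id -> list of positions.
        self.positions = defaultdict(lambda: defaultdict(list))
        # Reversed field (the DOCS_ONLY idea): reversed term -> set of doc ids.
        self.rev_docs = defaultdict(set)

    def add(self, doc_id, text):
        for pos, term in enumerate(text.lower().split()):
            self.positions[term][doc_id].append(pos)
            self.rev_docs[term[::-1]].add(doc_id)

    def suffix_search(self, pattern):
        """Answer '*suffix' through the reversed field, then read positions
        from the original term's postings for each matching doc."""
        prefix = flip_suffix_query(pattern)[:-1]  # drop the trailing '*'
        hits = {}
        for rev_term, docs in self.rev_docs.items():
            if rev_term.startswith(prefix):
                original = rev_term[::-1]  # switch back to the original term
                for doc in docs:
                    hits.setdefault(doc, {})[original] = self.positions[original][doc]
        return hits


index = ToyIndex()
index.add(1, "foo bar")
index.add(2, "car bar baz")
hits = index.suffix_search("*ar")
```

Here {{hits}} maps each matching doc to the original terms that matched and their positions, so positions are stored only once, under the original field.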

Thinking about EdgeNGrams (searching for name:\*a\*), terms in the derivative 
field should look like {{ar_bar}}, {{r_bar}}, so that we can switch to the 
original term's postings. So I still think that even with this approach (a 
second DOCS_ONLY field), blowing up the postings with these derivative terms 
still might not be affordable.   
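
A rough sketch of that derivative-term scheme, again in plain Python rather than Lucene. The helper names are hypothetical, and it assumes that neither the terms nor the query contain the {{_}} separator.

```python
SEP = "_"  # assumed separator; terms and queries must not contain it


def derived_terms(term):
    """Proper suffixes of a term, each tagged with the original term,
    e.g. 'bar' -> ['ar_bar', 'r_bar']."""
    return [term[i:] + SEP + term for i in range(1, len(term))]


def original_of(derived):
    """Recover the original term from a derived term such as 'ar_bar'."""
    return derived.rsplit(SEP, 1)[1]


def infix_candidates(vocabulary, infix):
    """Original terms containing `infix`, found by prefix-matching either
    the term itself or one of its derived terms."""
    if SEP in infix:
        raise ValueError("infix must not contain the separator")
    matches = set()
    for term in vocabulary:
        if term.startswith(infix):
            matches.add(term)
        for derived in derived_terms(term):
            if derived.startswith(infix):
                matches.add(original_of(derived))
    return matches


vocabulary = ["bar", "baz", "foo"]
candidates = infix_candidates(vocabulary, "a")
extra_terms = sum(len(derived_terms(t)) for t in vocabulary)
```

Every term of length n contributes n - 1 derived terms, each with its own doc list, which is the growth I am worried about.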

And coming back to your question:
bq. have you thought of using another field that only has the reversed terms?
No. I haven't thought about it. It's a great idea! Thanks for contributing it.  


> Don't repeat postings and positions on ReverseWF, EdgeNGram, etc  
> ------------------------------------------------------------------
>
>                 Key: LUCENE-7863
>                 URL: https://issues.apache.org/jira/browse/LUCENE-7863
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: core/index
>            Reporter: Mikhail Khludnev
>         Attachments: LUCENE-7863.hazard
>
>
> h2. Context
> \*suffix and \*infix\* searches on large indexes. 
> h2. Problem
> Obviously applying {{ReversedWildcardFilter}} doubles an index size, and I'm 
> shuddering to think about EdgeNGrams...
> h2. Proposal 
> _DRY_



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

