[ https://issues.apache.org/jira/browse/LUCENE-4312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16880422#comment-16880422 ]
Adrien Grand commented on LUCENE-4312:
--------------------------------------
Recording position lengths in the index is, in my opinion, the easy part of the
problem. I'm concerned that this will add significant complexity to phrase
queries: they would need backtracking to handle a term that occurs twice at the
same position with different position lengths. It could even make sloppy phrase
queries and their spans/intervals counterparts meaningless, since two terms
could appear very distant according to the index only because a term between
them has a multi-term synonym indexed.
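
To make the two concerns concrete, here is a minimal toy sketch in Python. It
does not use any Lucene API: the postings layout, the synonym rules
("new york" -> "ny" and "new york city" -> "ny"), and the helper names
position_gap and graph_phrase_match are all assumptions made up for
illustration. It models the text "new york city hall" with "ny" indexed twice
at position 0, once with position length 2 and once with position length 3.

```python
# Toy postings: term -> list of (position, position_length).
# Hypothetical synonym rules "new york" -> "ny" and "new york city" -> "ny"
# applied to the text "new york city hall". Not a Lucene data structure.
POSTINGS = {
    "new": [(0, 1)],
    "york": [(1, 1)],
    "city": [(2, 1)],
    "hall": [(3, 1)],
    "ny": [(0, 2), (0, 3)],
}


def position_gap(postings, first, second):
    """Smallest number of positions between two terms, using start positions only.

    This is roughly what a slop computation sees when position lengths are ignored.
    Returns None if `second` never starts after `first`.
    """
    gaps = [
        p2 - p1 - 1
        for p1, _ in postings.get(first, [])
        for p2, _ in postings.get(second, [])
        if p2 > p1
    ]
    return min(gaps, default=None)


def graph_phrase_match(postings, terms):
    """Exact phrase match on the token graph.

    Each term must start where the previous term ends (position + length).
    When a term has several entries at the same position, every position
    length is tried in turn, which is the backtracking step.
    """

    def match_from(i, start):
        if i == len(terms):
            return True
        for pos, length in postings.get(terms[i], []):
            if start is not None and pos != start:
                continue
            if match_from(i + 1, pos + length):
                return True
        return False

    return match_from(0, None)
```

In this toy model, position_gap(POSTINGS, "ny", "hall") reports two positions
between "ny" and "hall" even though one path through the graph makes them
adjacent. graph_phrase_match(POSTINGS, ["ny", "hall"]) only succeeds after the
first candidate (length 2) fails and the matcher falls back to the second
(length 3).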
> Index format to store position length per position
> --------------------------------------------------
>
> Key: LUCENE-4312
> URL: https://issues.apache.org/jira/browse/LUCENE-4312
> Project: Lucene - Core
> Issue Type: Improvement
> Components: core/codecs
> Affects Versions: 6.0
> Reporter: Gang Luo
> Priority: Minor
> Labels: Suggestion
> Original Estimate: 72h
> Remaining Estimate: 72h
>
> Mike McCandless said: TokenStreams are actually graphs.
> The indexer ignores PositionLengthAttribute. We need to change the index
> format (and Codec APIs) to store an additional int position length per position.