[ https://issues.apache.org/jira/browse/LUCENE-7267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15266101#comment-15266101 ]
David Smiley commented on LUCENE-7267:
--------------------------------------
RE the default offset gap being 1 -- it has been this way for as long as I can
remember. Note that the PostingsHighlighter assumes a single-character offset
gap. What do you think Lucene _should_ be doing here? It's not clear to me what
you propose. What it's doing seems fine to me, but maybe I'm not understanding
your point?
> Field with an explicit TokenStream must be tokenized and then uses the
> default Analyzer offset gaps
> ---------------------------------------------------------------------------------------------------
>
> Key: LUCENE-7267
> URL: https://issues.apache.org/jira/browse/LUCENE-7267
> Project: Lucene - Core
> Issue Type: Bug
> Reporter: Dawid Weiss
> Priority: Minor
>
> This took me somewhat by surprise. We have some fairly complex code that uses
> multivalued fields with explicit token streams (which provide their own
> offset data).
> It was surprising to see that offsets for subsequent values were shifted by 1
> compared to what was explicitly provided in the OffsetAttribute. A bit of
> debugging showed this code inside {{PerField.invert}}:
> {code}
> if (analyzed) {
>   invertState.position += docState.analyzer.getPositionIncrementGap(fieldInfo.name);
>   invertState.offset += docState.analyzer.getOffsetGap(fieldInfo.name);
> }
> {code}
> A field with an explicit token stream must still be declared as tokenized, so
> PerField assumes that the field came from an analyzer (when in fact it
> didn't):
> {code}
> final boolean analyzed = fieldType.tokenized() && docState.analyzer != null;
> {code}
> While the default position increment gap is 0, the default offset gap isn't --
> it's 1, causing the shift.
> Thoughts?
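
To make the reported shift concrete, here is a minimal standalone sketch of the
offset bookkeeping described above. It is a simplified model, not Lucene's
actual code: the class OffsetGapSketch and the method indexedOffsets are
hypothetical names, and the assumption that the running offset advances by the
last token's end offset after each value is mine. Only the "add the offset gap
when analyzed is true" step is taken from the snippet quoted in the issue.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of per-value offset accumulation; not a Lucene class.
class OffsetGapSketch {

    // values[v][t] = {startOffset, endOffset} as supplied by the token stream
    // of the v-th value of a multivalued field.
    static int[][] indexedOffsets(int[][][] values, int offsetGap, boolean analyzed) {
        List<int[]> out = new ArrayList<>();
        int base = 0; // plays the role of invertState.offset
        for (int[][] value : values) {
            int lastEnd = 0;
            for (int[] token : value) {
                // Offsets are recorded relative to the running base.
                out.add(new int[] {base + token[0], base + token[1]});
                lastEnd = token[1];
            }
            // Assumption: the base advances by the final offset of this value.
            base += lastEnd;
            // From the issue: the gap is added whenever the field counts as analyzed.
            if (analyzed) {
                base += offsetGap;
            }
        }
        return out.toArray(new int[0][]);
    }

    public static void main(String[] args) {
        int[][][] values = { { {0, 5} }, { {0, 5} } };
        int[][] shifted = indexedOffsets(values, 1, true);
        int[][] unshifted = indexedOffsets(values, 1, false);
        System.out.println("analyzed=true:  second value starts at " + shifted[1][0]);
        System.out.println("analyzed=false: second value starts at " + unshifted[1][0]);
    }
}
```

Under these assumptions, two values that each carry a single token with offsets
0 to 5 produce a second token at 6 to 11 when the gap of 1 is applied, and at
5 to 10 when it is not. That difference of 1 is the shift described in the
issue.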