[jira] Commented: (LUCENE-2529) always apply position increment gap between values

David Smiley (JIRA) Mon, 04 Oct 2010 11:44:01 -0700

    [ 
https://issues.apache.org/jira/browse/LUCENE-2529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12917697#action_12917697
 ]


David Smiley commented on LUCENE-2529:
--------------------------------------

bq. My suggestion is that if you have values like this with position 
dependencies, they are really one single value, not independent values, and 
don't belong in a multivalued-field.

My use-case is not like your example.  The value at index x has no relation to 
values before or after it in the same field.  It _does_ have a relationship to 
a separate multi-valued field's value at the same value index.  With this patch 
and an analyzer that sets all position increments to 0, all tokens for the same 
value index across both fields will have the same token position.

bq. (me)    For my problem space, I'm willing to sacrifice the ability to do 
phrase queries.

bq. Right, but my concern is that other users are not.  I don't think we should 
discard the first token's position increment value completely, will the 
QueryParser do this too?

The fact that my entire solution (for which this patch is a subset) can't do 
phrases is not evident in this patch.  Perhaps I should have kept my overall 
use case a secret so as not to cloud the purpose of this patch.  This patch is 
about generating predictable/intuitive/sensible (in my opinion) position 
values, particularly at the very start of a field.  I hope you would share my 
opinion if you apply the patch and examine the results such as what the test 
case the patch modifies does.

In my opinion (and apparently Mike McCandless -- "I like the idea of 
disregarding the posIncrGap of the first token.") I disagree.  The point of a 
position increment gap is only meaningful _between_ values.  It's meaningless 
for the first token.  At your suggestion I looked in QueryParser.java to look 
for problems that may have to do with the position increment.  I don't see any 
problems that would be introduced by this patch.  For convenient reference 
here, the position increment is grabbed at line 608:
{code:java}
int positionIncrement = (posIncrAtt != null) ? 
posIncrAtt.getPositionIncrement() : 1;
if (positionIncrement != 0) {
  positionCount += positionIncrement;
} else {
  severalTokensAtSamePosition = true;
}
{code}

The ensuing logic seems pretty sane to me (albeit complicated).  The only thing 
in here that could arguably be improved is line 645:
{code:java}
if (positionCount == 1 || (!quoted && !autoGeneratePhraseQueries)) {
{code}
I think the "== 1" should be "<= 1" just in case there's some intentional 
oddities in the analyzer that 99.9% of people wouldn't ever do.  I'm in the 
exception case.  But I have a different query time analyzer so I don't care 
that much.


> always apply position increment gap between values
> --------------------------------------------------
>
>                 Key: LUCENE-2529
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2529
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Index
>    Affects Versions: 2.9.3, 3.0.2, 3.1, 4.0
>         Environment: (I don't know which version to say this affects since 
> it's some quasi trunk release and the new versioning scheme confuses me.)
>            Reporter: David Smiley
>            Assignee: Koji Sekiguchi
>            Priority: Minor
>             Fix For: 3.1, 4.0
>
>         Attachments: 
> LUCENE-2529_always_apply_position_increment_gap_between_values.patch, 
> LUCENE-2529_skip_posIncr_for_1st_token.patch, 
> LUCENE-2529_skip_posIncr_for_1st_token.patch, LUCENE-2529_test.patch
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> I'm doing some fancy stuff with span queries that is very sensitive to term 
> positions.  I discovered that the position increment gap on indexing is only 
> applied between values when there are existing terms indexed for the 
> document.  I suspect this logic wasn't deliberate, it's just how its always 
> been for no particular reason.  I think it should always apply the gap 
> between fields.  Reference DocInverterPerField.java line 82:
> if (fieldState.length > 0)
>           fieldState.position += 
> docState.analyzer.getPositionIncrementGap(fieldInfo.name);
> This is checking fieldState.length.  I think the condition should simply be:  
> if (i > 0).
> I don't think this change will affect anyone at all but it will certainly 
> help me.  Presently, I can either change this line in Lucene, or I can put in 
> a hack so that the first value for the document is some dummy value which is 
> wasteful.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[jira] Commented: (LUCENE-2529) always apply position increment gap between values

Reply via email to