[jira] [Comment Edited] (LUCENE-4258) Incremental Field Updates through Stacked Segments

Sunny Khatri (JIRA) Wed, 12 Mar 2014 10:41:35 -0700

    [ 
https://issues.apache.org/jira/browse/LUCENE-4258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13932048#comment-13932048
 ]


Sunny Khatri edited comment on LUCENE-4258 at 3/12/14 5:39 PM:
---------------------------------------------------------------

Hi Guys,

I've been looking at this patch and wanted to know whether there is any update 
on its release date.

I was able to try out this patch and observed an issue with the term positions 
of the stacked segment data. When a new update is made on top of the stack 
(Operation.ADD_FIELDS), the positions of its terms start again from 0. For 
example (and this is an actual use case), let a document be {term1 term2 term3 
term4 term5}, and suppose we send the whole document in multiple chunks:
Update 1: term1 term2 term3
Update 2: term4 term5

The stack then looks like this (each term shown as term:::position):
term4:::0 term5:::1
term1:::0 term2:::1 term3:::2

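So we end up with two terms at position 0, two at position 1, and so on. 
As a consequence, phrase queries and other position-based queries won't work in 
this case. For instance, a search for "term3 term4" will not match, because 
term3 is at position 2 while term4 is at position 0.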
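To make the failure concrete, here is a minimal, self-contained Java sketch. It 
is plain Java, not Lucene's API, and the class and method names are 
hypothetical. It models each stacked update as a list of terms whose positions 
restart at 0, as observed above, and checks a two-term phrase the way a phrase 
query conceptually does: the second term must occur at the first term's 
position + 1.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of one field's postings across stacked updates.
// Not Lucene code: it only mimics the position behavior described above.
public class StackedPositionsDemo {

    // term -> every position at which it occurs across all stacked chunks
    static Map<String, List<Integer>> indexChunks(List<List<String>> chunks) {
        Map<String, List<Integer>> postings = new HashMap<>();
        for (List<String> chunk : chunks) {
            // Observed behavior: each ADD_FIELDS update starts counting at 0.
            int position = 0;
            for (String term : chunk) {
                postings.computeIfAbsent(term, t -> new ArrayList<>()).add(position);
                position++;
            }
        }
        return postings;
    }

    // Exact two-term phrase: some position p of 'first' with 'second' at p + 1.
    static boolean phraseMatches(Map<String, List<Integer>> postings,
                                 String first, String second) {
        List<Integer> firstPositions = postings.getOrDefault(first, List.of());
        List<Integer> secondPositions = postings.getOrDefault(second, List.of());
        for (int p : firstPositions) {
            if (secondPositions.contains(p + 1)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<List<String>> chunks = List.of(
                List.of("term1", "term2", "term3"),  // Update 1
                List.of("term4", "term5"));          // Update 2
        Map<String, List<Integer>> postings = indexChunks(chunks);
        System.out.println(postings.get("term3") + " " + postings.get("term4"));
        System.out.println(phraseMatches(postings, "term3", "term4"));
    }
}
```

Running main prints `[2] [0]` and then `false`. In this model the overlap also 
works the other way: a phrase such as "term1 term5" matches (positions 0 and 1) 
even though those terms are not adjacent in the original document.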

I'd like to hear your take on whether this issue could be resolved 
easily.

PS: I'm not sure this is trivial to resolve, because we would need to know the 
full length of the actual document chunk in the previous stack layer, not just 
the position of the last term added to the stack. Depending on the analyzer 
configuration, the last term of the chunk could be a stopword, and hence would 
not appear in the index.
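The idea in the PS can be sketched as follows. This is a hypothetical plain-Java 
model, not the patch's code or Lucene's API: a per-document position base is 
carried across stacked updates and advanced by each chunk's full token count, 
stopwords included. Update 1 is given a trailing stopword "the" (an assumption 
added for illustration) to show why the last indexed position is not enough: 
basing Update 2 on "last indexed position + 1" would put term4 at 3, directly 
after term3, although "the" sits between them in the real document. How the 
patch would learn each chunk's length is left open; here it is simply the size 
of the raw token list.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: keep a running position base across stacked updates,
// advanced by each chunk's full token count rather than its last indexed term.
public class ChunkPositionBaseDemo {

    static Map<String, List<Integer>> indexChunks(List<List<String>> chunks,
                                                  Set<String> stopwords) {
        Map<String, List<Integer>> postings = new HashMap<>();
        int base = 0; // position of the next chunk's first token
        for (List<String> chunk : chunks) {
            for (int i = 0; i < chunk.size(); i++) {
                String token = chunk.get(i);
                if (stopwords.contains(token)) {
                    continue; // not indexed, but its position is still consumed
                }
                postings.computeIfAbsent(token, t -> new ArrayList<>()).add(base + i);
            }
            // Advance by the raw chunk length, so a trailing stopword still
            // leaves its gap before the next update's first term.
            base += chunk.size();
        }
        return postings;
    }

    public static void main(String[] args) {
        List<List<String>> chunks = List.of(
                List.of("term1", "term2", "term3", "the"), // Update 1 ends in a stopword
                List.of("term4", "term5"));                // Update 2
        Map<String, List<Integer>> postings = indexChunks(chunks, Set.of("the"));
        System.out.println(postings.get("term3") + " " + postings.get("term4"));
    }
}
```

Running main prints `[2] [4]`. Without the stopword, the same code places term4 
at position 3, so "term3 term4" becomes an exact phrase match as intended.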





> Incremental Field Updates through Stacked Segments
> --------------------------------------------------
>
>                 Key: LUCENE-4258
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4258
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: core/index
>            Reporter: Sivan Yogev
>             Fix For: 4.7
>
>         Attachments: IncrementalFieldUpdates.odp, 
> LUCENE-4258-API-changes.patch, LUCENE-4258.branch.1.patch, 
> LUCENE-4258.branch.2.patch, LUCENE-4258.branch.4.patch, 
> LUCENE-4258.branch.5.patch, LUCENE-4258.branch.6.patch, 
> LUCENE-4258.branch.6.patch, LUCENE-4258.branch3.patch, 
> LUCENE-4258.r1410593.patch, LUCENE-4258.r1412262.patch, 
> LUCENE-4258.r1416438.patch, LUCENE-4258.r1416617.patch, 
> LUCENE-4258.r1422495.patch, LUCENE-4258.r1423010.patch
>
>   Original Estimate: 2,520h
>  Remaining Estimate: 2,520h
>
> Shai and I would like to start working on the proposal for Incremental Field 
> Updates outlined here (http://markmail.org/message/zhrdxxpfk6qvdaex).



--
This message was sent by Atlassian JIRA
(v6.2#6252)
