[
https://issues.apache.org/jira/browse/LUCENE-4258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13932048#comment-13932048
]
Sunny Khatri edited comment on LUCENE-4258 at 3/12/14 5:39 PM:
---------------------------------------------------------------
Hi Guys,
I've been looking at this patch and wanted to know if there's any update on its
release date.
I was able to try out the patch and observed an issue with the term positions
in the stacked segment data. It seems that when a new update is made on top of
the stack (Operation.ADD_FIELDS), its term positions start again from 0. For
example (and a use case): let a document be { term1 term2 term3 term4 term5 }.
Now we send the whole document in multiple chunks.
Update 1: term1 term2 term3
Update 2: term4 term5
Now the stack looks like (along with their positions):
term4:::0 term5:::1
term1:::0 term2:::1 term3:::2
So what we end up with is two terms at position 0, two at position 1, etc.
CONS: Phrase queries and the like won't work in this case, for instance a
search for "term3 term4".
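To make the failure concrete, here is a small stand-alone Python model of the behaviour described above. It is only a sketch and not Lucene code: index_chunks and phrase_matches are made-up helpers, and the "phrase query" is reduced to checking that the second term sits exactly one position after the first.

```python
def index_chunks(chunks, restart_positions=True):
    """Return {term: [positions]} for a document sent as several chunks.

    restart_positions=True models the observed behaviour (each update
    starts again at position 0); False models positions that continue
    from the end of the previous chunk.
    """
    postings = {}
    base = 0
    for chunk in chunks:
        tokens = chunk.split()
        for i, term in enumerate(tokens):
            postings.setdefault(term, []).append(base + i)
        if not restart_positions:
            base += len(tokens)
    return postings


def phrase_matches(postings, first, second):
    """True if `second` occurs exactly one position after `first`."""
    following = set(postings.get(second, []))
    return any(p + 1 in following for p in postings.get(first, []))


chunks = ["term1 term2 term3", "term4 term5"]
stacked = index_chunks(chunks)                             # observed
continued = index_chunks(chunks, restart_positions=False)  # expected

print(stacked["term3"], stacked["term4"])           # [2] [0]
print(phrase_matches(stacked, "term3", "term4"))    # False
print(phrase_matches(continued, "term3", "term4"))  # True
```

With restarted positions, term3 is at 2 and term4 is at 0, so the adjacency check fails. With continued positions, term4 lands at 3 and the phrase matches.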
I just wanted to get your take on whether this issue could be resolved easily.
PS: I'm not sure this is trivial to resolve. We would need to know the length
(in tokens) of the actual document chunk in the previous stack layer, not the
max position of the last term added to the stack: the last term in the actual
doc could be a stopword and so, depending on the configuration, would not
appear in the index.
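A rough illustration of the stopword case, again as a stand-alone Python model rather than Lucene code. The STOPWORDS set and the analyze helper are hypothetical, and I am assuming that a removed stopword still leaves a gap in the positions.

```python
STOPWORDS = {"the"}  # hypothetical stopword list, for illustration only


def analyze(text, base=0):
    """Return (kept, token_count).

    kept is a list of (term, position) pairs with stopwords dropped but
    their position gaps preserved; token_count includes the stopwords.
    """
    tokens = text.split()
    kept = [(t, base + i) for i, t in enumerate(tokens) if t not in STOPWORDS]
    return kept, len(tokens)


# Reference: the whole document indexed in one go.
whole, _ = analyze("term1 term2 the term3")

# Same document sent as two chunks; the first chunk ends in a stopword.
first, first_len = analyze("term1 term2 the")
max_pos = max(p for _, p in first)

by_max_pos, _ = analyze("term3", base=max_pos + 1)  # base from last indexed term
by_length, _ = analyze("term3", base=first_len)     # base from chunk length

print(whole)               # [('term1', 0), ('term2', 1), ('term3', 3)]
print(first + by_max_pos)  # [('term1', 0), ('term2', 1), ('term3', 2)]
print(first + by_length)   # [('term1', 0), ('term2', 1), ('term3', 3)]
```

Only the base derived from the chunk length reproduces the positions of the document indexed as a whole. The base derived from the max indexed position puts term3 one position too early.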
> Incremental Field Updates through Stacked Segments
> --------------------------------------------------
>
> Key: LUCENE-4258
> URL: https://issues.apache.org/jira/browse/LUCENE-4258
> Project: Lucene - Core
> Issue Type: Improvement
> Components: core/index
> Reporter: Sivan Yogev
> Fix For: 4.7
>
> Attachments: IncrementalFieldUpdates.odp,
> LUCENE-4258-API-changes.patch, LUCENE-4258.branch.1.patch,
> LUCENE-4258.branch.2.patch, LUCENE-4258.branch.4.patch,
> LUCENE-4258.branch.5.patch, LUCENE-4258.branch.6.patch,
> LUCENE-4258.branch.6.patch, LUCENE-4258.branch3.patch,
> LUCENE-4258.r1410593.patch, LUCENE-4258.r1412262.patch,
> LUCENE-4258.r1416438.patch, LUCENE-4258.r1416617.patch,
> LUCENE-4258.r1422495.patch, LUCENE-4258.r1423010.patch
>
> Original Estimate: 2,520h
> Remaining Estimate: 2,520h
>
> Shai and I would like to start working on the proposal to Incremental Field
> Updates outlined here (http://markmail.org/message/zhrdxxpfk6qvdaex).