[ 
https://issues.apache.org/jira/browse/LUCENE-4258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13536992#comment-13536992
 ] 

Sivan Yogev commented on LUCENE-4258:
-------------------------------------

After rethinking the point-of-inversion issue, it seems the right time to 
invert is as soon as possible: rather than holding the added fields and 
inverting them later, invert them immediately and save their inverted form. 
Three reasons:
1. It removes the constraint I added to the API, so update fields can be 
reused and can contain a Reader/TokenStream,
2. NRT support: we cannot search until we invert, and if we invert earlier, NRT 
support will be less complicated, probably some variation on a multi-reader to 
view uncommitted updates,
3. You are correct that we currently do not account for the RAM usage of the 
FieldsUpdate, since I thought using RamUsageEstimator would be too costly. It 
will probably be more efficient to calculate the RAM usage of the inverted 
fields, maybe even during inversion?
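
To illustrate reason 3, here is a minimal sketch of tallying a RAM estimate 
while inverting, rather than measuring the object graph afterwards. It is plain 
Java, not Lucene code: the InvertedUpdate class, the whitespace tokenization, 
and the per-term and per-position byte costs are all assumptions made up for 
the example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Toy stand-in for an inverted field update; hypothetical, not a Lucene class. */
final class InvertedUpdate {
    /** term -> positions of that term within the field text. */
    final Map<String, List<Integer>> postings = new HashMap<>();
    /** Rough byte estimate accumulated while inverting. */
    long estimatedBytes = 0;

    // Assumed costs, chosen only for illustration.
    static final long TERM_OVERHEAD = 48;
    static final long POSITION_BYTES = 4;

    /** Inverts whitespace-separated text immediately, tallying RAM as it goes. */
    static InvertedUpdate invert(String text) {
        InvertedUpdate update = new InvertedUpdate();
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return update;
        }
        String[] tokens = trimmed.split("\\s+");
        for (int pos = 0; pos < tokens.length; pos++) {
            String term = tokens[pos].toLowerCase(Locale.ROOT);
            List<Integer> positions = update.postings.get(term);
            if (positions == null) {
                // First occurrence: pay the per-term cost once (2 bytes per char).
                positions = new ArrayList<>();
                update.postings.put(term, positions);
                update.estimatedBytes += TERM_OVERHEAD + 2L * term.length();
            }
            positions.add(pos);
            update.estimatedBytes += POSITION_BYTES;
        }
        return update;
    }
}
```

For example, InvertedUpdate.invert("to be or not to be") yields four terms, 
with "to" at positions 0 and 4. The estimate is available as soon as inversion 
finishes, with no separate pass over the data.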

So my question in that regard is: how can I invert a document and hold its 
inverted form so that it can be used by NRT and later inserted into a stacked 
segment? Should I create a temporary Directory and invert into it? Is there 
another way to do this?

bq. Merging is very important. Hmm, are we able to just merge all updates down 
to a single update? Ie, without merging the base segment? We can't express that 
today from MergePolicy right? In an NRT setting this seems very important (ie 
it'd be best bang (= improved search performance) for the buck (= merge cost)).

Shai is helping create a benchmark to test performance in various 
scenarios. I will start adding update-related aspects to the merge policy. I am 
not sure whether merging just the updates of a segment is feasible. In what 
cases would it be better than collapsing all updates into the base segment?

bq. I think we need a test that indexes a known (randomly generated) set of 
documents, randomly sometimes using add and sometimes using update/replace 
field, mixing in deletes (just like TestField.addDocuments()), for the first 
index, and for the second index only using addDocument on the "surviving" 
documents, and then we assertIndexEquals(...) in the end? Maybe we can factor 
out code from TestDuelingCodecs or TestStressIndexing2.

TestFieldReplacements already had a test which randomly adds documents, 
replaces documents, adds fields and replaces fields. I refactored it to enable 
using a seed, and created a "clean" version with only addDocument(...) calls. 
However, the FieldInfos of the "clean" version do not include things that the 
"full" version includes, because in the full version fields with certain 
field traits were added and then deleted. I will look at the other suggestions.
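
The mismatch can be reproduced with a toy model. This is a sketch only: 
ToyIndex and its methods are hypothetical, plain collections stand in for a 
segment and its FieldInfos, and it assumes field metadata stays behind after 
the documents that introduced it are deleted.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Toy index: plain collections stand in for a segment and its FieldInfos. */
final class ToyIndex {
    /** docId -> (field name -> value) for live documents. */
    final Map<Integer, Map<String, String>> docs = new TreeMap<>();
    /** Every field name ever written; assumed never to shrink on delete. */
    final Set<String> fieldInfos = new TreeSet<>();

    void addDocument(int id, Map<String, String> fields) {
        docs.put(id, new TreeMap<>(fields));
        fieldInfos.addAll(fields.keySet());
    }

    void replaceField(int id, String field, String value) {
        Map<String, String> doc = docs.get(id);
        if (doc != null) {
            doc.put(field, value);
            fieldInfos.add(field);
        }
    }

    void deleteDocument(int id) {
        docs.remove(id); // field metadata is intentionally left behind
    }

    /** Builds the "clean" index using only addDocument on surviving documents. */
    static ToyIndex cleanCopyOf(ToyIndex full) {
        ToyIndex clean = new ToyIndex();
        for (Map.Entry<Integer, Map<String, String>> e : full.docs.entrySet()) {
            clean.addDocument(e.getKey(), e.getValue());
        }
        return clean;
    }
}
```

If a document carrying an extra field is added to the "full" index and then 
deleted, cleanCopyOf(full) has identical documents but a smaller fieldInfos 
set, so a strict equality check on field metadata would fail even though the 
searchable content matches.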

                
> Incremental Field Updates through Stacked Segments
> --------------------------------------------------
>
>                 Key: LUCENE-4258
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4258
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: core/index
>            Reporter: Sivan Yogev
>         Attachments: IncrementalFieldUpdates.odp, 
> LUCENE-4258-API-changes.patch, LUCENE-4258.r1410593.patch, 
> LUCENE-4258.r1412262.patch, LUCENE-4258.r1416438.patch, 
> LUCENE-4258.r1416617.patch, LUCENE-4258.r1422495.patch, 
> LUCENE-4258.r1423010.patch
>
>   Original Estimate: 2,520h
>  Remaining Estimate: 2,520h
>
> Shai and I would like to start working on the proposal for Incremental Field 
> Updates outlined here (http://markmail.org/message/zhrdxxpfk6qvdaex).
