[ https://issues.apache.org/jira/browse/LUCENE-4258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13536992#comment-13536992 ]
Sivan Yogev commented on LUCENE-4258:
-------------------------------------

After rethinking the point-of-inversion issue, it seems the right time to invert is ASAP: rather than holding the added fields and inverting them later, invert them immediately and save their inverted version. There are 3 reasons for that:

1. It removes the constraint I added to the API, so update fields can be reused and can contain a Reader/TokenStream.
2. NRT support: we cannot search until we invert, and if we invert earlier, NRT support will be less complicated, probably some variation on a multi-reader to view uncommitted updates.
3. You are correct that we currently do not account for the RAM usage of the FieldsUpdate, since I thought using RAMUsageEstimator would be too costly. It will probably be more efficient to calculate the RAM usage of the inverted fields, maybe even during inversion?

So my question in that regard is: how can I invert a document and hold its inverted form, to be used by NRT and later inserted into a stacked segment? Should I create a temporary Directory and invert into it? Is there another way to do this?

bq. Merging is very important. Hmm, are we able to just merge all updates down to a single update? Ie, without merging the base segment? We can't express that today from MergePolicy right? In an NRT setting this seems very important (ie it'd be best bang (= improved search performance) for the buck (= merge cost)).

Shai is helping to create a benchmark to test performance in various scenarios. I will start adding update-related aspects to the merge policy. I am not sure whether merging just the updates of a segment is feasible. In what cases would it be better than collapsing all updates into the base segment?

bq. I think we need a test that indexes a known (randomly generated) set of documents, randomly sometimes using add and sometimes using update/replace field, mixing in deletes (just like TestField.addDocuments()), for the first index, and for the second index only using addDocument on the "surviving" documents, and then we assertIndexEquals(...) in the end? Maybe we can factor out code from TestDuelingCodecs or TestStressIndexing2.

TestFieldReplacements already had a test which randomly adds documents, replaces documents, adds fields and replaces fields. I refactored it to enable using a seed, and created a "clean" version with only addDocument(...) calls. However, the FieldInfos of the "clean" version do not include things that the "full" version includes, because in the full version fields possessing certain field traits were added and then deleted. I will look at the other suggestions.

> Incremental Field Updates through Stacked Segments
> --------------------------------------------------
>
>                 Key: LUCENE-4258
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4258
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: core/index
>            Reporter: Sivan Yogev
>         Attachments: IncrementalFieldUpdates.odp, LUCENE-4258-API-changes.patch, LUCENE-4258.r1410593.patch, LUCENE-4258.r1412262.patch, LUCENE-4258.r1416438.patch, LUCENE-4258.r1416617.patch, LUCENE-4258.r1422495.patch, LUCENE-4258.r1423010.patch
>
>   Original Estimate: 2,520h
>  Remaining Estimate: 2,520h
>
> Shai and I would like to start working on the proposal to Incremental Field Updates outlined here (http://markmail.org/message/zhrdxxpfk6qvdaex).
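For readers following the test discussion in the comment above, here is a rough sketch of the seeded "full" versus "clean" comparison. It is only a toy in-memory model, not the code in TestFieldReplacements and not Lucene API: the class DuelingIndexSketch, its Layer type and the resolveStacked() method are hypothetical names made up for illustration. The "full" side keeps a stack of add/replace layers per document and resolves them only when read, while the "clean" side holds what plain adds of the surviving documents would produce.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.TreeMap;

/** Toy model of a seeded dueling-index test (hypothetical, not Lucene API). */
final class DuelingIndexSketch {

    /** One stacked layer: appends a value to a field, or replaces the field. */
    static final class Layer {
        final boolean replace;
        final String field;
        final String value;

        Layer(boolean replace, String field, String value) {
            this.replace = replace;
            this.field = field;
            this.value = value;
        }
    }

    // "Full" index: per-document stack of layers, resolved only when read.
    final Map<Integer, List<Layer>> stacked = new TreeMap<>();
    // "Clean" index: final field values of the surviving documents only.
    final Map<Integer, Map<String, List<String>>> clean = new TreeMap<>();

    /** Applies numOps seeded random operations to both indexes. */
    DuelingIndexSketch(long seed, int numOps) {
        Random random = new Random(seed);
        int nextId = 0;
        for (int i = 0; i < numOps; i++) {
            String field = "f" + random.nextInt(3);
            String value = "v" + i;
            // 0 = add document, 1 = add field value, 2 = replace field, 3 = delete
            int op = stacked.isEmpty() ? 0 : random.nextInt(4);
            int id;
            boolean replace = false;
            if (op == 0) {
                id = nextId++;
                stacked.put(id, new ArrayList<>());
                clean.put(id, new TreeMap<>());
            } else {
                List<Integer> ids = new ArrayList<>(stacked.keySet());
                id = ids.get(random.nextInt(ids.size()));
                if (op == 3) { // deleted documents do not survive in either index
                    stacked.remove(id);
                    clean.remove(id);
                    continue;
                }
                replace = (op == 2);
            }
            // Full side: only record the layer. Clean side: apply it right away.
            stacked.get(id).add(new Layer(replace, field, value));
            Map<String, List<String>> doc = clean.get(id);
            if (replace) {
                doc.remove(field);
            }
            doc.computeIfAbsent(field, k -> new ArrayList<>()).add(value);
        }
    }

    /** Collapses each document's layers into its visible field values. */
    Map<Integer, Map<String, List<String>>> resolveStacked() {
        Map<Integer, Map<String, List<String>>> view = new TreeMap<>();
        for (Map.Entry<Integer, List<Layer>> entry : stacked.entrySet()) {
            Map<String, List<String>> doc = new TreeMap<>();
            for (Layer layer : entry.getValue()) {
                if (layer.replace) {
                    doc.remove(layer.field);
                }
                doc.computeIfAbsent(layer.field, k -> new ArrayList<>()).add(layer.value);
            }
            view.put(entry.getKey(), doc);
        }
        return view;
    }
}
```

A test in this style would build the sketch from a fixed seed and check that resolveStacked() equals the clean map, so that a failure can be reproduced by rerunning with the same seed. The model compares field values only; it has no counterpart to FieldInfos, so it does not capture the FieldInfos mismatch mentioned in the comment.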