[ https://issues.apache.org/jira/browse/LUCENE-4258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13526609#comment-13526609 ]
Michael McCandless commented on LUCENE-4258:
--------------------------------------------

{quote}
bq. Why do we have FieldsUpdate.Operation.ADD_DOCUMENT? It seems weird to pass that to IW.updateFields? Shouldn't apps just use IW.addDocument?

We have ADD_ and REPLACE_ for FIELDS, and also REPLACE_DOCUMENTS, so having ADD_DOCUMENT would allow applications to work only with updateFields. There certainly are actions that can be performed in more than one way in this API; do you find this too confusing?
{quote}

Well I just generally prefer that there is one [obvious] way to do something ... it can cause confusion otherwise, i.e. users will wonder what's the difference between addDocument and updateFields(Operation.ADD_DOCUMENT, ...)

{quote}
bq. Why do we need SegmentInfoReader.readFilesList? ...

I considered the alternative you propose of having a segmentInfo for each stacked segment, and it seemed more complex to manage than what is done with .del files, so I chose the .del files approach. You are right about its privacy; I removed it from SegmentInfoReader and the actual readers have it privately.
{quote}

OK.

{quote}
bq. It looks like merge policies don't yet know about / target stacked segments ...

I was planning to have it in another issue. Should I create it already?
{quote}

Another issue is a good idea! No need to create it yet ... but it seems like it will be important for real usage.

Do we have any sense of how performance degrades as the stack gets bigger? It's more on-the-fly merging at search-time...

I'm worried about that search-time merge cost ... I think it's usually better to pay a higher indexing cost in exchange for faster search time, which makes LUCENE-4272 a compelling alternate approach...

{quote}
bq. It seems like we don't invert the document updates until the updates are applied? ...

I went for the simple solution, trying to introduce as few new concepts as possible (and still the patch size is >7000 lines). Your proposal should certainly be considered and maybe tested. I need to make sure I do the RAM calculations right; the added documents must be reflected in the RAM consumption of the deletions queue.
{quote}

OK that makes sense; we should definitely do whatever's easiest/fastest to get to a dirt path.

We should think through the tradeoffs. I think it may confuse apps that the Field is not "consumed" after IW.updateFields returns, but rather cached and processed later. This means you cannot reuse fields, you have to be careful with pre-tokenized fields (can't reuse the TokenStream), etc.

It also means NRT reopen is unexpectedly costly, because only on flush will we invert & index the documents, and it's a single-threaded operation during reopen (vs per-thread if we invert up front).

Still it makes sense to do this for starters ... it's simpler.

{quote}
bq. Why does StoredFieldsReader.visitDocument need a Set for ignored fields?

When fetching stored fields from a segment with replacements, it is possible that all contents of a certain field for the base and first n stacked segments should be ignored. Therefore, the implementation starts the visiting from the most recent updates. If we encounter at some stage a field replacement, that field name is added to the Set of ignored fields, and later the content of that field in the stacked segments we encounter (which are older updates) is ignored.
{quote}

Ahhh right.

Are stored fields now sparse? Meaning if I have a segment w/ many docs, and I update stored fields on one doc, in that tiny stacked segment will the stored fields files also be tiny?
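
For readers following the stored-fields discussion, the newest-first visit with a Set of ignored field names described in the quoted reply above could look roughly like the sketch below. This is only an illustration under stated assumptions: the Layer class, its add/replace methods, and this visitDocument are hypothetical names that model each base or stacked segment as an in-memory map. None of it is Lucene API or code from the patch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical model, not Lucene code: one document seen through a base
// segment plus a stack of update segments.
final class StackedStoredFieldsSketch {

    // One layer (base or stacked segment) for a single document.
    static final class Layer {
        final Map<String, List<String>> fields = new LinkedHashMap<>();
        final Set<String> replaced = new HashSet<>();

        // Models an "add fields" update: values accumulate with older ones.
        Layer add(String name, String value) {
            fields.computeIfAbsent(name, k -> new ArrayList<>()).add(value);
            return this;
        }

        // Models a "replace fields" update: older values of this field are hidden.
        Layer replace(String name, String value) {
            replaced.add(name);
            return add(name, value);
        }
    }

    // layers are ordered oldest (base segment) first.
    static Map<String, List<String>> visitDocument(List<Layer> layers) {
        Map<String, List<String>> result = new LinkedHashMap<>();
        Set<String> ignored = new HashSet<>();
        // Start from the most recent update and walk back toward the base.
        for (int i = layers.size() - 1; i >= 0; i--) {
            Layer layer = layers.get(i);
            for (Map.Entry<String, List<String>> e : layer.fields.entrySet()) {
                if (ignored.contains(e.getKey())) {
                    continue; // a newer layer replaced this field
                }
                // Prepend, so older values end up before newer ones.
                result.computeIfAbsent(e.getKey(), k -> new ArrayList<>())
                      .addAll(0, e.getValue());
            }
            // Only layers older than this one must skip the replaced fields.
            ignored.addAll(layer.replaced);
        }
        return result;
    }

    public static void main(String[] args) {
        Layer base = new Layer().add("title", "old title").add("tag", "a");
        Layer update1 = new Layer().replace("title", "new title");
        Layer update2 = new Layer().add("tag", "b");
        System.out.println(visitDocument(Arrays.asList(base, update1, update2)));
    }
}
```

In this model the replaced field "title" keeps only the value from the replacing layer, while the added field "tag" keeps values from both the base and the later update. Walking newest-first means a single pass with one Set is enough; walking oldest-first would require discarding already-collected values whenever a replacement shows up.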
> Incremental Field Updates through Stacked Segments
> --------------------------------------------------
>
>                 Key: LUCENE-4258
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4258
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: core/index
>            Reporter: Sivan Yogev
>         Attachments: IncrementalFieldUpdates.odp,
>                      LUCENE-4258-API-changes.patch, LUCENE-4258.r1410593.patch,
>                      LUCENE-4258.r1412262.patch, LUCENE-4258.r1416438.patch,
>                      LUCENE-4258.r1416617.patch
>   Original Estimate: 2,520h
>  Remaining Estimate: 2,520h
>
> Shai and I would like to start working on the proposal to Incremental Field
> Updates outlined here (http://markmail.org/message/zhrdxxpfk6qvdaex).