[ https://issues.apache.org/jira/browse/LUCENE-3837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13220885#comment-13220885 ]
Shai Erera commented on LUCENE-3837:
------------------------------------
bq. it merges updates on the fly, at the cost of keeping a static map of
primary->secondary ids
Ah, OK, I missed that part.
> A modest proposal for updateable fields
> ---------------------------------------
>
> Key: LUCENE-3837
> URL: https://issues.apache.org/jira/browse/LUCENE-3837
> Project: Lucene - Java
> Issue Type: New Feature
> Components: core/index
> Affects Versions: 4.0
> Reporter: Andrzej Bialecki
>
> I'd like to propose a simple design for implementing updateable fields in
> Lucene. This design has some limitations, so I'm not claiming it will be
> appropriate for every use case, and it obviously has some performance
> consequences, but at least it's a start...
> This proposal uses a concept of "overlays" or "stacked updates", where the
> original data is not removed but is instead overlaid with the new data. I
> propose to reuse as much of the existing API as possible, and to represent
> updates as an IndexReader. Updates to documents in a specific segment would
> be collected in an "overlay" index specific to that segment, i.e. there would
> be as many overlay indexes as there are segments in the primary index.
> A field update would be represented as a new document in the overlay index.
> The document would consist of just the updated fields, plus a field that
> records the id in the primary segment of the document affected by the update.
> These updates would be processed as usual via secondary IndexWriter-s, as
> many as there are primary segments, so the same analysis chains would be
> used, the same field types, etc.
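To make that representation concrete, here is a minimal in-memory sketch of one segment's overlay, assuming a plain list of field maps stands in for the secondary index and its IndexWriter. `SegmentOverlay` and the reserved field name `_primary_id` are hypothetical; a real implementation would add Lucene documents through the overlay's own IndexWriter instead.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory stand-in for one primary segment's overlay index.
// Each update is appended as a new overlay document holding only the updated
// fields plus a reserved field recording the targeted primary docId.
final class SegmentOverlay {
    // Hypothetical reserved field name; not defined by Lucene.
    static final String PRIMARY_ID_FIELD = "_primary_id";

    private final int primaryMaxDoc;
    private final List<Map<String, String>> overlayDocs = new ArrayList<>();

    SegmentOverlay(int primaryMaxDoc) {
        this.primaryMaxDoc = primaryMaxDoc;
    }

    /** Appends an overlay document for primaryDocId and returns its overlay docId. */
    int addUpdate(int primaryDocId, Map<String, String> updatedFields) {
        if (primaryDocId < 0 || primaryDocId >= primaryMaxDoc) {
            throw new IllegalArgumentException("no such primary doc: " + primaryDocId);
        }
        if (updatedFields.containsKey(PRIMARY_ID_FIELD)) {
            throw new IllegalArgumentException("reserved field: " + PRIMARY_ID_FIELD);
        }
        Map<String, String> doc = new LinkedHashMap<>(updatedFields);
        doc.put(PRIMARY_ID_FIELD, Integer.toString(primaryDocId));
        overlayDocs.add(doc);
        return overlayDocs.size() - 1; // overlay docIds are assigned in insertion order
    }

    /** Read-only view of an overlay document. */
    Map<String, String> document(int overlayDocId) {
        return Collections.unmodifiableMap(overlayDocs.get(overlayDocId));
    }

    int maxDoc() {
        return overlayDocs.size();
    }
}
```

Note that the overlay document for primary doc 7 carries only the updated field and the id field; every other field of doc 7 stays in the primary segment.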
> On opening a segment with updates the SegmentReader (see also LUCENE-3836)
> would check for the presence of the "overlay" index, and if so it would open
> it first (as an AtomicReader? or it would open individual codec format
> readers? perhaps it should load the whole thing into memory?), and it would
> construct an in-memory map between the primary's docId-s and the overlay's
> docId-s. Finally, it would wrap the original format readers with "overlay
> readers", also initialized with the id map.
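The in-memory id map built on open (the "static map of primary->secondary ids" quoted at the top) could look like the following sketch. It assumes the primary docId of each overlay document has already been read from its id field into an `int[]`, and that when a primary document has several overlay documents, the one with the highest overlay docId is the latest and wins. `OverlayIdMap` is a hypothetical name.

```java
import java.util.Arrays;

// Hypothetical primary -> overlay docId map, built from
// overlayPrimaryIds[overlayDocId] = primary docId targeted by that overlay doc.
final class OverlayIdMap {
    static final int NO_OVERLAY = -1;

    private final int[] primaryToOverlay;

    OverlayIdMap(int primaryMaxDoc, int[] overlayPrimaryIds) {
        primaryToOverlay = new int[primaryMaxDoc];
        Arrays.fill(primaryToOverlay, NO_OVERLAY);
        for (int overlayDoc = 0; overlayDoc < overlayPrimaryIds.length; overlayDoc++) {
            int primaryDoc = overlayPrimaryIds[overlayDoc];
            if (primaryDoc < 0 || primaryDoc >= primaryMaxDoc) {
                throw new IllegalStateException(
                    "overlay doc " + overlayDoc + " points outside the segment");
            }
            // Assumption: a later overlay doc for the same primary doc is newer,
            // so it simply overwrites the earlier entry.
            primaryToOverlay[primaryDoc] = overlayDoc;
        }
    }

    /** Returns the overlay docId for primaryDoc, or NO_OVERLAY if it was never updated. */
    int overlayDoc(int primaryDoc) {
        return primaryToOverlay[primaryDoc];
    }
}
```

A dense `int[]` costs four bytes per primary document no matter how few documents were updated; sorted parallel arrays of (primary, overlay) pairs searched with binary search would be smaller when updates are sparse, at the cost of slower lookups.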
> Now, when consumers of the 4D API ask for specific data, the "overlay
> readers" would first re-map the primary's docId to the overlay's docId and
> check whether overlay data exists for that docId and this type of data (e.g.
> postings, stored fields, vectors); if it does, they would return it instead
> of the original. Otherwise they would return the original data.
> One obvious performance issue with this approach is that sequential access
> to the primary data would translate into random access to the overlay data.
> This could be solved by sorting the overlay index so that, at least, the
> overlay ids increase monotonically along with the primary ids they refer to.
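One way to do that sort, sketched with hypothetical names: compute a stable ordering of the overlay documents by the primary docId they target, rewrite the overlay in that order, and then let a reader that visits primary docIds in increasing order use a forward-only cursor instead of random lookups. When a primary document has several overlay documents, this sketch keeps their original relative order and returns the last one, assuming it is the most recent.

```java
import java.util.Arrays;
import java.util.Comparator;

final class OverlaySort {
    /**
     * Returns old overlay docIds in the order they should be rewritten so that
     * new overlay docIds increase monotonically with the targeted primary docIds.
     * Arrays.sort on objects is stable, so ties keep their original order.
     */
    static int[] sortOrder(int[] overlayPrimaryIds) {
        Integer[] order = new Integer[overlayPrimaryIds.length];
        for (int i = 0; i < order.length; i++) {
            order[i] = i;
        }
        Arrays.sort(order, Comparator.comparingInt(i -> overlayPrimaryIds[i]));
        return Arrays.stream(order).mapToInt(Integer::intValue).toArray();
    }
}

// Forward-only lookup over a sorted overlay: sortedPrimaryIds[k] is the primary
// docId targeted by (new) overlay doc k, in non-decreasing order.
final class ForwardOverlayCursor {
    private final int[] sortedPrimaryIds;
    private int pos = 0;

    ForwardOverlayCursor(int[] sortedPrimaryIds) {
        this.sortedPrimaryIds = sortedPrimaryIds;
    }

    /** Overlay docId for primaryDoc, or -1. Must be called with strictly increasing primaryDoc. */
    int advance(int primaryDoc) {
        while (pos < sortedPrimaryIds.length && sortedPrimaryIds[pos] < primaryDoc) {
            pos++;
        }
        int match = -1;
        while (pos < sortedPrimaryIds.length && sortedPrimaryIds[pos] == primaryDoc) {
            match = pos++; // keep the last (assumed most recent) overlay doc
        }
        return match;
    }
}
```

For overlay primary ids `{7, 2, 7, 0}` the rewrite order is `{3, 1, 0, 2}`, giving sorted primary ids `{0, 2, 7, 7}`; the cursor then touches each overlay entry at most once during a sequential scan.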
> Updates to the primary index would be handled as usual, i.e. segment merges
> would just work as usual (since the segments with updates would pretend to
> have no overlays); only the overlay index would have to be deleted once the
> primary segment is deleted after a merge.
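One reading of why merges "just work": the merge reads each source segment only through its overlay-applied view, so the merged segment already contains the updated values and needs no overlay of its own. A minimal sketch with hypothetical in-memory types; deletions and the actual on-disk formats are ignored.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical primary segment plus its overlay, as plain in-memory maps.
final class SegmentWithOverlay {
    final List<Map<String, String>> primaryDocs;
    final Map<Integer, Map<String, String>> updatesByPrimaryDoc; // primary docId -> updated fields

    SegmentWithOverlay(List<Map<String, String>> primaryDocs,
                       Map<Integer, Map<String, String>> updatesByPrimaryDoc) {
        this.primaryDocs = primaryDocs;
        this.updatesByPrimaryDoc = updatesByPrimaryDoc;
    }

    /** The document as consumers see it: original fields, replaced by any updated ones. */
    Map<String, String> effectiveDoc(int docId) {
        Map<String, String> doc = new HashMap<>(primaryDocs.get(docId));
        Map<String, String> updates = updatesByPrimaryDoc.get(docId);
        if (updates != null) {
            doc.putAll(updates);
        }
        return doc;
    }
}

final class OverlayMerge {
    /**
     * Merges segments by reading them only through effectiveDoc(), so the merge
     * logic never sees the overlays. The result has an empty overlay, which is why
     * the source segments' overlay indexes can be dropped together with them.
     */
    static SegmentWithOverlay merge(List<SegmentWithOverlay> segments) {
        List<Map<String, String>> merged = new ArrayList<>();
        for (SegmentWithOverlay seg : segments) {
            for (int doc = 0; doc < seg.primaryDocs.size(); doc++) {
                merged.add(seg.effectiveDoc(doc));
            }
        }
        return new SegmentWithOverlay(merged, new HashMap<>());
    }
}
```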
> Updates to existing documents that already had some fields updated would
> again be handled as usual, only underneath they would open an IndexWriter on
> the overlay index for that specific segment.
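Routing an update to the overlay index of a specific segment requires mapping a top-level docId to a segment and a segment-local docId (the id the overlay document would record). A sketch, assuming each segment's starting docId (its docBase) is known and no segment is empty, so the docBases are strictly increasing from 0; `UpdateRouter` is a hypothetical name.

```java
import java.util.Arrays;

// Hypothetical helper mapping a top-level docId to (segment, local docId).
// docBases[i] is the first top-level docId of segment i; maxDoc is the total.
final class UpdateRouter {
    private final int[] docBases;
    private final int maxDoc;

    UpdateRouter(int[] docBases, int maxDoc) {
        this.docBases = docBases.clone();
        this.maxDoc = maxDoc;
    }

    /** Index of the segment whose overlay should receive an update for topLevelDoc. */
    int segmentOf(int topLevelDoc) {
        if (topLevelDoc < 0 || topLevelDoc >= maxDoc) {
            throw new IllegalArgumentException("docId out of range: " + topLevelDoc);
        }
        int idx = Arrays.binarySearch(docBases, topLevelDoc);
        // On a miss, binarySearch returns -(insertionPoint) - 1; the containing
        // segment is the one just before the insertion point.
        return idx >= 0 ? idx : -idx - 2;
    }

    /** The docId within that segment. */
    int localDoc(int topLevelDoc) {
        return topLevelDoc - docBases[segmentOf(topLevelDoc)];
    }
}
```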
> That's the broad idea. Feel free to chime in. I started some coding at the
> codec level but got stuck using the approach in LUCENE-3836. The approach
> that uses a modified SegmentReader seems more promising.