[ https://issues.apache.org/jira/browse/LUCENE-4658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13604683#comment-13604683 ]
David Smiley commented on LUCENE-4658:
--------------------------------------

You raise a good point there Rob; BinaryDocValues is pretty close and might be sufficient as-is.

But do we need segment based tracking hooks? Perhaps it's useful for parallel / overlay indexes that maintain docid consistency (LUCENE-4258?), but I don't think that needs to be centered around any particular special field. Shai's issue description points to a comment I made but it was in turn a quote of Rob. Rob & I didn't call out a need for segment level tracking; it was commit level tracking.

A couple use-cases I had in mind when I made the comment are:
* Storing per-document data that changes often like the number of clicks/accesses to the search result -- ultimately used to influence scoring. The application's backing store would probably be an in-memory cache with occasional syncs to disk.
* Storing a large per-document body text in an external data source (e.g. a DB or file system). Lucene needlessly merges stored fields which I think is quite wasteful, not to mention putting it in Lucene is redundant if you already manage it somewhere else. It's ultimately needed via Lucene's API for highlighting.

Is per-segment tracking needed for this? Or is this really about hooks to enable a parallel segment level index? I dunno.

> Per-segment tracking of external/side-car data
> ----------------------------------------------
>
>                 Key: LUCENE-4658
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4658
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>         Attachments: LUCENE-4658.patch, LUCENE-4658.patch
>
>
> Spinoff from David's idea on LUCENE-4258
> (https://issues.apache.org/jira/browse/LUCENE-4258?focusedCommentId=13534352&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13534352
> )
> I made a prototype patch that allows custom per-segment "side-car
> data".
> It adds an abstract ExternalSegmentData class.  The idea is
> the app implements this, and IndexWriter will pass each Document
> through to it, and call on it to do flushing/merging.  I added a
> setter to IndexWriterConfig to enable it, but I think this would
> really belong in Codec ...
> I haven't tackled the read-side yet, though this is already usable
> without that (ie, the app can just open its own files, read them,
> etc.).
> The random test case passes.
> I think for example this might make it easier for Solr/ElasticSearch
> to implement things like ExternalFileField.
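
For readers trying to picture the quoted proposal, here is a minimal sketch of the add/flush/merge life cycle it describes, applied to the click-count use case. Everything in it is an assumption made for illustration: the class and method names are hypothetical (the patch's actual signatures are not shown in this message), a plain Map stands in for Lucene's Document, and segments are simple in-memory lists rather than files. It also assumes documents arrive in docid order and that merges have no deletions to compact.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the abstract class described in the issue.
// NOT the API from the LUCENE-4658 patch; method names are invented.
abstract class ExternalSegmentDataSketch {
    // Called once per document as it is indexed.
    abstract void addDocument(Map<String, String> doc);

    // Called when the buffered documents are flushed as a new segment.
    abstract void flush(String segmentName);

    // Called when several segments are merged into one new segment.
    abstract void merge(List<String> sourceSegments, String mergedSegmentName);
}

// Example application implementation: keeps one click count per document.
class ClickCountSideCar extends ExternalSegmentDataSketch {
    // Counts for documents not yet flushed; list index = segment-local docid
    // (assumes documents are added in docid order).
    private List<Integer> pending = new ArrayList<>();

    // Flushed side-car data, keyed by segment name.
    final Map<String, List<Integer>> segments = new HashMap<>();

    @Override
    void addDocument(Map<String, String> doc) {
        // A missing "clicks" field is treated as zero clicks.
        pending.add(Integer.parseInt(doc.getOrDefault("clicks", "0")));
    }

    @Override
    void flush(String segmentName) {
        segments.put(segmentName, pending);
        pending = new ArrayList<>();
    }

    @Override
    void merge(List<String> sourceSegments, String mergedSegmentName) {
        // Concatenate in source order, mirroring how docids are renumbered
        // in a merge without deletions (an assumption of this sketch).
        List<Integer> merged = new ArrayList<>();
        for (String name : sourceSegments) {
            List<Integer> data = segments.remove(name);
            if (data != null) {
                merged.addAll(data);
            }
        }
        segments.put(mergedSegmentName, merged);
    }
}
```

The point of the sketch is only that the side-car data follows the segment life cycle; a commit-level design, as raised in the comment above, would instead key the data by commit rather than by segment.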