[ https://issues.apache.org/jira/browse/LUCENE-4272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13536961#comment-13536961 ]
Shai Erera commented on LUCENE-4272:
------------------------------------

That's an interesting idea, Robert. I agree that (1) is sometimes more expensive than re-indexing, and I'll admit that in the cases I've seen, fetching docs from the DB was a huge bottleneck, because the DB was used for many other application transactions and search accounted for only a minority of them. Also, (2) is not so cheap either. So I agree your approach would leave users with only (3).

There is a downside to this approach: it requires the app to store everything in the index too (in addition to the DB). Even if it's just term vectors, that's still extra storage. I know that for large applications, the index stores the minimal set of fields required to build the search results. For really large apps, the content isn't even there; the search snippets are computed on a different cluster. Just want to point that out. It may not be a big deal for small applications ... but then reindexing documents when you have a small application isn't a big deal either ...

I also think that your approach may not work well for apps with a relatively high frequency of tiny updates? I mean, today they need to re-index the entire document, doing steps 1-3, and with your approach they'll need to do just #3. But in the approach on LUCENE-4258, the cost of indexing an update is proportional to the size of the update? We still don't know the impact on the search side, but we know for sure that if updates are frequently merged down to the segment (à la expunge deletes), there is no effect on search?

Perhaps what we should do on LUCENE-4258 is run a benchmark on an index with a low, mid and high number of updates and measure the impact on search.
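To make the comparison above a bit more concrete, here is a rough back-of-the-envelope sketch in Python. It is only a toy model: steps (1)-(3) are treated as opaque costs (their definitions are not spelled out in this message), and the names `StepCosts`, `per_update_unit` and all of the numbers are hypothetical, chosen purely for illustration. It says nothing about the search-side impact, which is what the proposed benchmark would have to measure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StepCosts:
    """Hypothetical per-document costs, in arbitrary units."""
    step1: float            # step (1); e.g. fetching the doc from the DB
    step2: float            # step (2); "not so cheap either"
    step3: float            # step (3); the only step left with the vectors idea
    per_update_unit: float  # assumed cost per unit of update size (LUCENE-4258 style)


def full_reindex_cost(c: StepCosts) -> float:
    # Today: every update pays for steps 1-3 on the whole document.
    return c.step1 + c.step2 + c.step3


def vectors_based_cost(c: StepCosts) -> float:
    # Vectors-based idea: only step (3) remains.
    return c.step3


def proportional_update_cost(c: StepCosts, update_size: float) -> float:
    # LUCENE-4258 style: indexing cost grows with the size of the update.
    return c.per_update_unit * update_size


if __name__ == "__main__":
    costs = StepCosts(step1=5.0, step2=2.0, step3=3.0, per_update_unit=0.01)
    for size in (1, 100, 1000):
        print(size,
              full_reindex_cost(costs),
              vectors_based_cost(costs),
              proportional_update_cost(costs, size))
```

Under these made-up numbers, tiny updates are far cheaper to index in the proportional model, which is the scenario I'm worried about above; whether that holds in practice depends on real measurements.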
> another idea for updatable fields
> ---------------------------------
>
>                 Key: LUCENE-4272
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4272
>             Project: Lucene - Core
>          Issue Type: New Feature
>            Reporter: Robert Muir
>
> I've been reviewing the ideas for updatable fields and have an alternative
> proposal that I think would address my biggest concern:
> * not slowing down searching
> When I look at what Solr and Elasticsearch do here, by basically reindexing
> from stored fields, I think they solve a lot of the problem: users don't have
> to "rebuild" their document from scratch just to update one tiny piece.
> But I think we can do this more efficiently: by avoiding reindexing of the
> unaffected fields.
> The basic idea is that we would require term vectors for this approach (as
> they already store a serialized indexed version of the doc), and so we could
> just take the other pieces from the existing vectors for the doc.
> I think we would have to extend vectors to also store the norm (so we don't
> recompute that), and payloads, but it seems feasible at a glance.
> I don't think we should discard the idea because vectors are slow/big today,
> this seems like something we could fix.
> Personally I like the idea of not slowing down search performance to solve
> the problem, I think we should really start from that angle and work towards
> making the indexing side more efficient, not vice-versa.
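
For readers who want a picture of the idea in the quoted description, the following is a minimal toy sketch in Python. It is not Lucene code and uses no Lucene API: the "term vector" here is just an assumed dict mapping each term to its positions, and `analyze`, `index_document` and `update_field` are hypothetical names. It only illustrates the principle that an update re-analyzes the changed field and reuses the already-serialized vectors of the unaffected fields.

```python
from typing import Callable, Dict, List

# Toy stand-in for a per-field term vector: term -> list of positions.
TermVector = Dict[str, List[int]]


def analyze(text: str) -> TermVector:
    """Whitespace-tokenize and lowercase; record each term's positions."""
    vector: TermVector = {}
    for position, token in enumerate(text.lower().split()):
        vector.setdefault(token, []).append(position)
    return vector


def index_document(fields: Dict[str, str]) -> Dict[str, TermVector]:
    """Initial indexing: every field is analyzed once."""
    return {name: analyze(text) for name, text in fields.items()}


def update_field(
    stored: Dict[str, TermVector],
    field: str,
    new_text: str,
    analyze_fn: Callable[[str], TermVector] = analyze,
) -> Dict[str, TermVector]:
    """Rebuild the doc: re-analyze only `field`, reuse all other vectors as-is."""
    rebuilt = dict(stored)  # shallow copy; unaffected vectors are not recomputed
    rebuilt[field] = analyze_fn(new_text)
    return rebuilt


if __name__ == "__main__":
    doc = index_document({"title": "Hello World", "body": "a b a"})
    print(update_field(doc, "title", "New title"))
```

A real implementation would also need the norms and payloads mentioned in the description; the sketch leaves those out.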