[ https://issues.apache.org/jira/browse/LUCENE-4272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Robert Muir updated LUCENE-4272:
--------------------------------

Description:

I've been reviewing the ideas for updatable fields and have an alternative proposal that I think would address my biggest concern:
* not slowing down searching

When I look at what Solr and Elasticsearch do here, by basically reindexing from stored fields, I think they solve a lot of the problem: users don't have to "rebuild" their document from scratch just to update one tiny piece.

But I think we can do this more efficiently: by avoiding reindexing of the unaffected fields.

The basic idea is that we would require term vectors for this approach (as they already store a serialized indexed version of the doc), and so we could just take the other pieces from the existing vectors for the doc.

I don't think we should discard the idea because vectors are slow/big today; this seems like something we could fix.

Personally I like the idea of not slowing down search performance to solve the problem. I think we should really start from that angle and work towards making the indexing side more efficient, not vice versa.

was:

I've been reviewing the ideas for updatable fields and have an alternative proposal that I think would address my biggest concern:
* not slowing down searching

When I look at what Solr and Elasticsearch do here, by basically reindexing from stored fields, I think they solve a lot of the problem: users don't have to "rebuild" their document from scratch just to update one tiny piece.

But I think we can do this more efficiently: by avoiding reindexing of the unaffected fields.

The basic idea is that we would require term vectors for this approach (as they already store a serialized indexed version of the doc), and so we could just take the other pieces from the existing vectors for the doc. I think we would have to extend vectors to also store the norm (so we don't recompute that), and payloads, but it seems feasible at a glance.

I don't think we should discard the idea because vectors are slow/big today; this seems like something we could fix.

Personally I like the idea of not slowing down search performance to solve the problem. I think we should really start from that angle and work towards making the indexing side more efficient, not vice versa.

edit: just to make it clear, we don't need to change the index format if we want to implement this: it's "just code". Norms for unaffected fields can be reused as-is. For the affected fields, when digesting the Terms, we could just process them as normal.

> another idea for updatable fields
> ---------------------------------
>
>                 Key: LUCENE-4272
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4272
>             Project: Lucene - Core
>          Issue Type: New Feature
>            Reporter: Robert Muir

--
This message is automatically generated by JIRA.
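
To make the proposal in the description concrete, here is a rough toy sketch in Python. It is not Lucene code and does not use any Lucene API: the ToyIndex class, its update_field method, the whitespace "analyzer", and the token-count "norm" are all hypothetical stand-ins. It only illustrates the idea that an update could replay the stored term vectors (and reuse the norms) of the unaffected fields, so that only the changed field goes through analysis again.

```python
from collections import defaultdict


class ToyIndex:
    """Toy in-memory inverted index that also keeps per-document term vectors.

    Hypothetical illustration only; none of this mirrors Lucene's real classes.
    """

    def __init__(self):
        self.postings = defaultdict(dict)  # (field, term) -> {doc_id: [positions]}
        self.vectors = {}                  # doc_id -> {field: {term: [positions]}}
        self.norms = {}                    # doc_id -> {field: token count}
        self.deleted = set()               # tombstoned doc ids
        self.analyze_calls = 0             # number of fields sent through analysis
        self._next_id = 0

    def _analyze(self, text):
        """Stand-in for an analysis chain: lowercase, then split on whitespace."""
        self.analyze_calls += 1
        inverted = defaultdict(list)
        for pos, token in enumerate(text.lower().split()):
            inverted[token].append(pos)
        return dict(inverted)

    def _write(self, doc_id, field, inverted, norm):
        """Write one already-inverted field into postings, vectors, and norms."""
        for term, positions in inverted.items():
            self.postings[(field, term)][doc_id] = positions
        self.vectors.setdefault(doc_id, {})[field] = inverted
        self.norms.setdefault(doc_id, {})[field] = norm

    def add(self, doc):
        """Index a whole document: every field is analyzed."""
        doc_id = self._next_id
        self._next_id += 1
        for field, text in doc.items():
            inverted = self._analyze(text)
            self._write(doc_id, field, inverted, sum(len(p) for p in inverted.values()))
        return doc_id

    def update_field(self, old_id, field, new_text):
        """Replace one field; other fields are replayed from the old term vectors."""
        new_id = self._next_id
        self._next_id += 1
        for other, inverted in self.vectors[old_id].items():
            if other != field:
                # Unaffected field: no analysis, norm reused as-is.
                self._write(new_id, other, inverted, self.norms[old_id][other])
        # Affected field: processed as normal.
        inverted = self._analyze(new_text)
        self._write(new_id, field, inverted, sum(len(p) for p in inverted.values()))
        self.deleted.add(old_id)  # the old version is tombstoned, not rewritten
        return new_id

    def search(self, field, term):
        """Return live doc ids containing the term in the given field."""
        hits = self.postings.get((field, term), {})
        return sorted(d for d in hits if d not in self.deleted)


index = ToyIndex()
doc = index.add({"title": "Updatable Fields",
                 "body": "term vectors store the inverted doc"})
new_doc = index.update_field(doc, "title", "Another Idea")
```

In this sketch the search path is untouched by the update mechanism: search() reads ordinary postings, and all of the extra work happens on the indexing side, which is the trade-off the description argues for.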