[
https://issues.apache.org/jira/browse/LUCENE-4272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Robert Muir updated LUCENE-4272:
--------------------------------
Description:
I've been reviewing the ideas for updatable fields and have an alternative
proposal that I think would address my biggest concern:
* not slowing down searching
When I look at what Solr and Elasticsearch do here, by basically reindexing
from stored fields, I think they solve a lot of the problem: users don't have
to "rebuild" their document from scratch just to update one tiny piece.
But I think we can do this more efficiently: by avoiding reindexing of the
unaffected fields.
The basic idea is that we would require term vectors for this approach (as they
already store a serialized indexed version of the doc), and so we could just
take the other pieces from the existing vectors for the doc.
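To make the idea concrete, here is a minimal sketch in plain Java. It is a toy
model, not Lucene code: DocVectors, analyze(), index() and update() are all
hypothetical names invented for illustration, and the "term vector" is just a
map of field -> term -> positions. It only shows that an update re-analyzes the
changed field and copies the inverted form of every other field as-is.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Toy model only: none of these types or methods are Lucene APIs.
final class PartialUpdateSketch {

    // Stand-in for a per-document term vector: field -> term -> positions.
    static final class DocVectors {
        final Map<String, Map<String, List<Integer>>> fields = new LinkedHashMap<>();
    }

    // Counts analyzer runs so the saving on update is visible.
    static int analyzeCalls = 0;

    // Hypothetical analyzer: lowercase, split on whitespace, record positions.
    static Map<String, List<Integer>> analyze(String text) {
        analyzeCalls++;
        Map<String, List<Integer>> terms = new LinkedHashMap<>();
        int position = 0;
        for (String token : text.toLowerCase(Locale.ROOT).trim().split("\\s+")) {
            if (token.isEmpty()) {
                continue;
            }
            terms.computeIfAbsent(token, t -> new ArrayList<>()).add(position++);
        }
        return terms;
    }

    // Initial indexing: every field is analyzed once.
    static DocVectors index(Map<String, String> doc) {
        DocVectors vectors = new DocVectors();
        for (Map.Entry<String, String> field : doc.entrySet()) {
            vectors.fields.put(field.getKey(), analyze(field.getValue()));
        }
        return vectors;
    }

    // Update: unaffected fields are taken from the old vectors without
    // re-analysis; only the changed fields go through the analyzer again.
    static DocVectors update(DocVectors old, Map<String, String> changed) {
        DocVectors next = new DocVectors();
        for (Map.Entry<String, Map<String, List<Integer>>> field : old.fields.entrySet()) {
            if (!changed.containsKey(field.getKey())) {
                next.fields.put(field.getKey(), field.getValue());
            }
        }
        for (Map.Entry<String, String> field : changed.entrySet()) {
            next.fields.put(field.getKey(), analyze(field.getValue()));
        }
        return next;
    }

    public static void main(String[] args) {
        Map<String, String> doc = new LinkedHashMap<>();
        doc.put("title", "Updatable fields");
        doc.put("body", "Reuse the existing term vectors");
        DocVectors v1 = index(doc);

        int before = analyzeCalls;
        DocVectors v2 = update(v1, Map.of("title", "Another idea"));
        System.out.println("analyzer runs during update: " + (analyzeCalls - before));
        System.out.println(v2.fields);
    }
}
```

In a real implementation the copied pieces would come from the stored term
vectors rather than an in-memory map, so the cost of reading vectors still
matters; the sketch only illustrates which work is skipped.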
I don't think we should discard the idea because vectors are slow/big today;
this seems like something we could fix.
Personally, I like the idea of not slowing down search performance to solve the
problem. I think we should really start from that angle and work towards making
the indexing side more efficient, not vice-versa.
was:
I've been reviewing the ideas for updatable fields and have an alternative
proposal that I think would address my biggest concern:
* not slowing down searching
When I look at what Solr and Elasticsearch do here, by basically reindexing
from stored fields, I think they solve a lot of the problem: users don't have
to "rebuild" their document from scratch just to update one tiny piece.
But I think we can do this more efficiently: by avoiding reindexing of the
unaffected fields.
The basic idea is that we would require term vectors for this approach (as they
already store a serialized indexed version of the doc), and so we could just
take the other pieces from the existing vectors for the doc.
I think we would have to extend vectors to also store the norm (so we don't
recompute that), and payloads, but it seems feasible at a glance.
I don't think we should discard the idea because vectors are slow/big today;
this seems like something we could fix.
Personally I like the idea of not slowing down search performance to solve the
problem, I think we should really start from that angle and work towards making
the indexing side more efficient, not vice-versa.
Edit: just to make it clear, we don't need to change the index format if we
want to implement this: it's "just code".
Norms for unaffected fields can be reused as-is. For the affected fields, when
digesting the Terms, we could just process them as normal.
> another idea for updatable fields
> ---------------------------------
>
> Key: LUCENE-4272
> URL: https://issues.apache.org/jira/browse/LUCENE-4272
> Project: Lucene - Core
> Issue Type: New Feature
> Reporter: Robert Muir