[jira] [Commented] (LUCENE-4272) another idea for updatable fields

Shai Erera (JIRA) Thu, 20 Dec 2012 03:19:15 -0800

    [ 
https://issues.apache.org/jira/browse/LUCENE-4272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13536961#comment-13536961
 ]


Shai Erera commented on LUCENE-4272:
------------------------------------

That's an interesting idea, Robert. I agree that (1) is sometimes more expensive 
than re-indexing, and I'll admit that in the cases I've seen, fetching docs from 
the DB was a huge bottleneck, because the DB was also serving many other 
application transactions and search accounted for only a minority of them. 
Also, (2) is not so cheap either. So I agree your approach would leave users 
with only (3).

There is a downside to this approach: it requires the app to store 
everything in the index too (in addition to the DB). Even if it's just term 
vectors, that's still extra storage. I know that for large applications, the 
index stores only the minimal set of fields required to build the search 
results. For really large apps, the content isn't even there; the search 
snippets are computed on a different cluster instead.

Just want to point that out. It may not be a big deal for small applications ... 
but then reindexing documents when you have a small application isn't a big 
deal either ...

I also think that your approach may not work well for apps with a relatively 
high frequency of tiny updates? I mean, today they need to re-index the entire 
document, doing steps 1-3, and with your approach they'll need to do just #3. 
But in the approach on LUCENE-4258, the cost of indexing an update is 
proportional to the size of the update? We still don't know the impact on the 
search side, but we know for sure that if updates are frequently merged down to 
the segment (à la expunge deletes), there is no effect on search?

Perhaps what we should do on LUCENE-4258 is run a benchmark on indexes with a 
low, mid and high number of updates and measure the impact on search.
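
To make the benchmark suggestion concrete, here is a rough sketch of what such a harness could look like. It does not use Lucene at all: the int array standing in for a segment, the map standing in for unmerged updates, the class and method names, and the three update ratios are all assumptions made for illustration, not anything taken from LUCENE-4258.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Random;

class UpdateBenchmarkSketch {
    // Toy "search": count docs whose current value equals the target,
    // consulting the overlay of unmerged updates before the base segment.
    static int search(int[] base, Map<Integer, Integer> updates, int target) {
        int hits = 0;
        for (int doc = 0; doc < base.length; doc++) {
            int value = updates.getOrDefault(doc, base[doc]);
            if (value == target) hits++;
        }
        return hits;
    }

    // Builds an overlay in which roughly `ratio` of the docs carry an update to value 1.
    static Map<Integer, Integer> makeUpdates(int numDocs, double ratio, long seed) {
        Random random = new Random(seed);
        Map<Integer, Integer> updates = new HashMap<>();
        for (int doc = 0; doc < numDocs; doc++) {
            if (random.nextDouble() < ratio) updates.put(doc, 1);
        }
        return updates;
    }

    public static void main(String[] args) {
        int numDocs = 1_000_000;
        int[] base = new int[numDocs];           // every doc starts with value 0
        double[] ratios = {0.01, 0.10, 0.50};    // low, mid, high (arbitrary choices)
        for (double ratio : ratios) {
            Map<Integer, Integer> updates = makeUpdates(numDocs, ratio, 42L);
            long start = System.nanoTime();
            int hits = search(base, updates, 1);
            long micros = (System.nanoTime() - start) / 1_000;
            System.out.println("ratio=" + ratio + " hits=" + hits + " time=" + micros + "us");
        }
    }
}
```

A real run would need warm-up iterations, repeated measurements and an actual index. The sketch only shows the shape of the experiment: hold the query fixed, vary the fraction of updated docs, and compare search time across the three settings.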
                
> another idea for updatable fields
> ---------------------------------
>
>                 Key: LUCENE-4272
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4272
>             Project: Lucene - Core
>          Issue Type: New Feature
>            Reporter: Robert Muir
>
> I've been reviewing the ideas for updatable fields and have an alternative
> proposal that I think would address my biggest concern:
> * not slowing down searching
> When I look at what Solr and Elasticsearch do here, by basically reindexing 
> from stored fields, I think they solve a lot of the problem: users don't have 
> to "rebuild" their document from scratch just to update one tiny piece.
> But I think we can do this more efficiently: by avoiding reindexing of the 
> unaffected fields.
> The basic idea is that we would require term vectors for this approach (as 
> they already store a serialized indexed version of the doc), and so we could 
> just take the other pieces from the existing vectors for the doc.
> I think we would have to extend vectors to also store the norm (so we don't 
> recompute that), and payloads, but it seems feasible at a glance.
> I don't think we should discard the idea because vectors are slow/big today, 
> this seems like something we could fix.
> Personally I like the idea of not slowing down search performance to solve 
> the problem, I think we should really start from that angle and work towards 
> making the indexing side more efficient, not vice-versa.
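
The term-vector reuse idea in the quoted description can be illustrated with a small sketch. This is not Lucene code: the nested map standing in for a document's term vectors, the whitespace "analyzer", and the names `analyze` and `updateField` are hypothetical, chosen only to show that an update re-analyzes the changed field while the other fields' vectors are carried over untouched.

```java
import java.util.HashMap;
import java.util.Map;

class VectorReuseSketch {
    // Hypothetical stand-in for analysis: lowercase, split on whitespace,
    // and count term frequencies for one field.
    static Map<String, Integer> analyze(String text) {
        Map<String, Integer> freqs = new HashMap<>();
        for (String token : text.toLowerCase().split("\\s+")) {
            if (!token.isEmpty()) freqs.merge(token, 1, Integer::sum);
        }
        return freqs;
    }

    // Returns the vectors for the updated doc: only `field` is re-analyzed,
    // every other field reuses the existing per-field vector as-is.
    static Map<String, Map<String, Integer>> updateField(
            Map<String, Map<String, Integer>> existingVectors, String field, String newText) {
        Map<String, Map<String, Integer>> updated = new HashMap<>(existingVectors);
        updated.put(field, analyze(newText));
        return updated;
    }

    public static void main(String[] args) {
        Map<String, Map<String, Integer>> doc = new HashMap<>();
        doc.put("body", analyze("updatable fields in lucene"));
        doc.put("tags", analyze("draft"));

        Map<String, Map<String, Integer>> updated = updateField(doc, "tags", "reviewed reviewed");
        System.out.println(updated.get("tags"));                    // {reviewed=2}
        System.out.println(updated.get("body") == doc.get("body")); // true: body was not re-analyzed
    }
}
```

The sketch leaves out norms and payloads, which the description says the vectors would also need to carry for this to work.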
