[jira] [Commented] (LUCENE-4272) another idea for updatable fields

Shai Erera (JIRA) Thu, 20 Dec 2012 13:29:14 -0800

    [ https://issues.apache.org/jira/browse/LUCENE-4272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13537395#comment-13537395 ]


Shai Erera commented on LUCENE-4272:
------------------------------------

bq. Especially the impact on mean average precision.

I'll focus on performance first, because I think we should provide a good 
solution for DOCS_ONLY fields.

Also, constructing a test that can reliably check the effect on MAP is not 
trivial. It might work if, e.g., I replace the entire content field, or some 
part of it.

But to measure MAP I'd need to use a TREC collection (GOV, GOV2), for which 
I have judgements. Then I believe I'm the only one who can run the test, 
unless anyone else has access to that collection? Do you know of any other 
open collection with judgements that I can use?

I'm not saying it isn't important to measure, but for me it comes second on 
the list, at least for the first step of field updates.
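
For reference, here is a minimal sketch of how MAP could be computed once ranked results and judgements are available for a set of queries. The class and method names are hypothetical (they are not part of Lucene or its benchmark module), and it assumes each ranked list contains no duplicate document ids and that queries with no judged-relevant documents score an average precision of 0.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper for illustration only; not a Lucene API.
public class MeanAveragePrecision {

    /** Average precision for one query: ranked doc ids vs. the judged-relevant set. */
    public static double averagePrecision(List<String> ranked, Set<String> relevant) {
        if (relevant.isEmpty()) {
            return 0.0; // assumption: no judged-relevant docs means AP = 0
        }
        int hits = 0;
        double sum = 0.0;
        for (int i = 0; i < ranked.size(); i++) {
            if (relevant.contains(ranked.get(i))) {
                hits++;
                sum += (double) hits / (i + 1); // precision at this rank
            }
        }
        // Relevant docs that were never retrieved contribute 0 to the sum.
        return sum / relevant.size();
    }

    /** Mean of the per-query average precision; runs.get(i) pairs with qrels.get(i). */
    public static double meanAveragePrecision(List<List<String>> runs, List<Set<String>> qrels) {
        if (runs.size() != qrels.size()) {
            throw new IllegalArgumentException("one judgement set is required per query");
        }
        if (runs.isEmpty()) {
            return 0.0;
        }
        double total = 0.0;
        for (int q = 0; q < runs.size(); q++) {
            total += averagePrecision(runs.get(q), qrels.get(q));
        }
        return total / runs.size();
    }

    public static void main(String[] args) {
        // Two toy queries with made-up doc ids and judgements.
        List<List<String>> runs = Arrays.asList(
            Arrays.asList("d1", "d2", "d3", "d4"),
            Arrays.asList("d5", "d6"));
        List<Set<String>> qrels = Arrays.asList(
            new HashSet<>(Arrays.asList("d1", "d3")),
            new HashSet<>(Arrays.asList("d6")));
        System.out.println("MAP = " + meanAveragePrecision(runs, qrels));
    }
}
```

Running the same query set against an index before and after a field update, and comparing the two MAP values, would be one way to quantify the effect.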
                
> another idea for updatable fields
> ---------------------------------
>
>                 Key: LUCENE-4272
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4272
>             Project: Lucene - Core
>          Issue Type: New Feature
>            Reporter: Robert Muir
>
> I've been reviewing the ideas for updatable fields and have an alternative
> proposal that I think would address my biggest concern:
> * not slowing down searching
> When I look at what Solr and Elasticsearch do here, by basically reindexing 
> from stored fields, I think they solve a lot of the problem: users don't have 
> to "rebuild" their document from scratch just to update one tiny piece.
> But I think we can do this more efficiently: by avoiding reindexing of the 
> unaffected fields.
> The basic idea is that we would require term vectors for this approach (as 
> they already store a serialized indexed version of the doc), and so we could 
> just take the other pieces from the existing vectors for the doc.
> I think we would have to extend vectors to also store the norm (so we don't 
> recompute that), and payloads, but it seems feasible at a glance.
> I don't think we should discard the idea because vectors are slow/big today; 
> this seems like something we could fix.
> Personally, I like the idea of not slowing down search performance to solve 
> the problem. I think we should really start from that angle and work towards 
> making the indexing side more efficient, not vice versa.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
