[ 
https://issues.apache.org/jira/browse/SOLR-7490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14522371#comment-14522371
 ] 

Erick Erickson commented on SOLR-7490:
--------------------------------------

bq: Considering a query that qualifies everything, Solr ends up re-importing 
the whole data from itself which is basically an optimize operation I think

Not at all. An optimize does not, for instance, re-analyze all of the 
documents. How could it? Unless the fields have stored="true", the original 
content is just _gone_. An optimize just copies some binary bits around, a much 
simpler task. Perhaps not fast on a large corpus, but much faster than 
re-analyzing everything.


bq: With atomic updates, as you say, we will be exposing the freedom of 
updating a huge set of documents in one request. We will be pushing Solr too 
much unless it is used wisely.

Not really the same thing at all IMO. It's much less surprising to write a 
program that re-indexes a bunch of data than to write a single statement that's 
the equivalent of SQL "update blah where blah" and doesn't return for, perhaps, 
hours.
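To make the comparison concrete, here's a sketch of what that "program that re-indexes a bunch of data" looks like today, emulating update-by-query client-side with cursorMark paging plus atomic updates. This is not a built-in Solr feature; the endpoint URL, collection name, field, and query below are hypothetical examples, and atomic updates only work when the relevant fields are stored (or have docValues).

```python
# Client-side emulation of "update by query": page through matching doc ids
# with cursorMark, then send atomic "set" updates in batches.
# The Solr URL and collection name are assumptions for illustration.
import json
import urllib.parse
import urllib.request

SOLR = "http://localhost:8983/solr/mycollection"  # hypothetical endpoint


def atomic_update_payload(doc_ids, field, value):
    """Build the JSON body for an atomic 'set' update on each doc id."""
    return [{"id": doc_id, field: {"set": value}} for doc_id in doc_ids]


def update_by_query(query, field, value, rows=500):
    """Apply an atomic update to every document matching `query`."""
    cursor = "*"
    while True:
        params = urllib.parse.urlencode({
            "q": query, "fl": "id", "sort": "id asc",
            "rows": rows, "cursorMark": cursor, "wt": "json",
        })
        with urllib.request.urlopen(f"{SOLR}/select?{params}") as resp:
            data = json.load(resp)
        ids = [d["id"] for d in data["response"]["docs"]]
        if ids:
            body = json.dumps(atomic_update_payload(ids, field, value)).encode()
            req = urllib.request.Request(
                f"{SOLR}/update?commit=false", data=body,
                headers={"Content-Type": "application/json"})
            urllib.request.urlopen(req).close()
        next_cursor = data["nextCursorMark"]
        if next_cursor == cursor:  # cursor stops advancing when exhausted
            break
        cursor = next_cursor
```

On a large corpus this loop can run for hours, which is exactly the surprise factor: as an explicit program the cost is visible, whereas a single update-by-query request would hide it.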

bq: But it seems to make it easy to change the schema without having to do 
anything after (basically change the schema and issue an update by query 
qualifying the whole index) which basically supports uptime re-indexing of a 
solr collection with new schema I guess.

I think you're still missing the point. There's no data to re-index _from_ 
unless the fields have stored="true".
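For reference, "stored" is a per-field schema setting; a hypothetical managed-schema fragment illustrating the requirement might look like this (field names are examples only):

```xml
<!-- Every field that must survive a re-index from the index itself
     needs stored="true" (or, for supported types, docValues="true") -->
<field name="id"    type="string"       indexed="true" stored="true"/>
<field name="title" type="text_general" indexed="true" stored="true"/>
```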

bq: But it seems to make it easy to change the schema without having to do 
anything after (basically change the schema and issue an update by query 
qualifying the whole index) which basically supports uptime re-indexing of a 
solr collection with new schema I guess.

On any large corpus, this will effectively mean down-time. Your server will be 
so hammered that it won't serve any queries, or at least not quickly. But it is 
an interesting idea _if_ (and only if) you have all the data stored.

If you're making the argument that _if_ all fields are stored, and _if_ you 
want to update a particular value for all docs that satisfy a query, and _if_ 
you're willing to accept the risk of huge operations, then the work for an 
update-by-query and for atomic updates is roughly equal, I'll agree with you. 
But frankly the benefit seems very marginal to me, so specialized that I'd be 
reluctant to push it forward.

Feel free to disagree of course, maybe others have a different opinion. 

> Update by query feature
> -----------------------
>
>                 Key: SOLR-7490
>                 URL: https://issues.apache.org/jira/browse/SOLR-7490
>             Project: Solr
>          Issue Type: New Feature
>            Reporter: Praneeth
>            Priority: Minor
>
> An update feature similar to the {{deleteByQuery}} would be very useful. Say, 
> the user wants to update a field of all documents in the index that match a 
> given criteria. I have encountered this use case in my project and it looks 
> like it could be a useful first class solr/lucene feature. I want to check if 
> this is something we would want to support in coming releases of Solr and 
> Lucene, are there scenarios that will prevent us from doing this, 
> feasibility, etc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
