[jira] [Comment Edited] (SOLR-5944) Support updates of numeric DocValues

Ishan Chattopadhyaya (JIRA) Sat, 29 Oct 2016 00:30:07 -0700

    [ 
https://issues.apache.org/jira/browse/SOLR-5944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15617617#comment-15617617
 ]


Ishan Chattopadhyaya edited comment on SOLR-5944 at 10/29/16 7:29 AM:
----------------------------------------------------------------------

Just discovered another, more common, problem with reordered DBQs and in-place 
updates working together. The problem discussed earlier, where a document has 
to be resurrected, is very similar. So, here's a description of both:

SCENARIO 1:
{code}
Imagine the updates on the leader are:
ADD     (id=1, updateable_field=1, title="mydoc1", version=100)
INP-UPD (id=1, updateable_field=2, version=200, prevVersion=100)
DBQ     (q="updateable_field:1", version=300)

The same on the replica (forwarded):
ADD     (id=1, updateable_field=1, title="mydoc1", version=100)
DBQ     (q="updateable_field:1", version=300)
INP-UPD (id=1, updateable_field=2, version=200, prevVersion=100)

The expected net effect is that no document is deleted, and the id=1 document 
exists with updateable_field=2.
Here, the DBQ was reordered. When the updates are executed on the replica, the 
DBQ matches the document while it still has updateable_field=1 and deletes it. 
The version=200 update then cannot be applied, since there is no longer a 
document with id=1 at version=100. What is required is a resurrection of the 
document that was deleted by the DBQ, so that other stored/indexed fields are 
not lost.
{code}

SCENARIO 2:
{code}
Imagine the updates on the leader are:
ADD     (id=1, updateable_field=1, title="mydoc1", version=100)
INP-UPD (id=1, updateable_field=2, version=200, prevVersion=100)
DBQ     (q="id:1", version=300)

The same on the replica (forwarded):
ADD     (id=1, updateable_field=1, title="mydoc1", version=100)
DBQ     (q="id:1", version=300)
INP-UPD (id=1, updateable_field=2, version=200, prevVersion=100)

The expected net effect is that the document with id=1 is deleted. But again, 
the DBQ is reordered. When executed on the replica, the version=200 update 
cannot be applied, since the id=1 document has already been deleted. What is 
required is for this update (version=200) to be dropped silently.
{code}

Scenario 1 is rare; scenario 2 would be more common. At the point when the 
in-place update (version=200 in both cases) is applied, the replica has no way 
to know whether the update requires a resurrection of the document or should 
simply be dropped.
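
To make the ambiguity concrete, here is a minimal Python sketch that replays 
both scenarios against a toy in-memory index. It is only an illustration, not 
Solr code: apply_ops, the tuple-based op format and the exact-match DBQ are 
all assumptions made up for this example.

```python
def apply_ops(ops):
    """Replay ops on a toy index.

    Returns (index, failed), where failed lists the versions of in-place
    updates whose prevVersion document could not be found.
    """
    index = {}
    failed = []
    for op in ops:
        kind = op[0]
        if kind == "ADD":
            _, doc = op
            index[doc["id"]] = dict(doc)  # copy so replays stay independent
        elif kind == "DBQ":
            # Toy delete-by-query: exact match on a single field.
            _, field, value = op
            for doc_id in [i for i, d in index.items() if d.get(field) == value]:
                del index[doc_id]
        elif kind == "INP-UPD":
            _, doc_id, changes, version, prev_version = op
            doc = index.get(doc_id)
            if doc is None or doc["version"] != prev_version:
                failed.append(version)
            else:
                doc.update(changes)
                doc["version"] = version
    return index, failed


ADD = ("ADD", {"id": 1, "updateable_field": 1, "title": "mydoc1", "version": 100})
UPD = ("INP-UPD", 1, {"updateable_field": 2}, 200, 100)


def scenario(dbq):
    leader = apply_ops([ADD, UPD, dbq])   # order as seen on the leader
    replica = apply_ops([ADD, dbq, UPD])  # DBQ reordered before the update
    return leader, replica


s1_leader, s1_replica = scenario(("DBQ", "updateable_field", 1))
s2_leader, s2_replica = scenario(("DBQ", "id", 1))

# The leaders end up in different states, but the replicas look identical
# at the moment the version=200 update fails.
print(s1_leader, s2_leader)
print(s1_replica == s2_replica)
```

In this toy model the replica sees an empty index and a failed version=200 
update in both scenarios, although the leader keeps the document in scenario 1 
and deletes it in scenario 2.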

Until now, I hadn't considered scenario 2; for the rare scenario 1, I 
resorted to throwing an error so as to put the replica into LIR 
(leader-initiated recovery). Clearly, in view of scenario 2, this looks like a 
bad idea. Here are two potential solutions that come to mind:
Solution 1:
{code}
On a replica, while applying an in-place update, if the document at the 
required prevVersion cannot be found in the tlog or index (due to these 
reordered DBQs), then fetch from the leader an update that contains the full 
document at the version for which the update failed on the replica. If the 
document has been deleted on the leader, just drop the update on the replica 
silently. The downside to this approach is that unstored/non-dv fields will get 
dropped (as is the case with regular atomic updates today).
{code}
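A rough Python sketch of the replica-side logic in solution 1, under stated 
assumptions: apply_in_place_update and the fetch_from_leader callback are 
hypothetical names for illustration, the index is a plain dict, and only the 
missing-prevVersion case is handled.

```python
def apply_in_place_update(replica_index, update, fetch_from_leader):
    """Apply an in-place update, falling back to the leader if needed.

    update: dict with "id", "version", "prev_version" and "changes".
    fetch_from_leader: hypothetical callable (doc_id, version) that returns
        the full document from the leader, or None if it is deleted there.
    Returns "applied", "resurrected" or "dropped".
    """
    doc = replica_index.get(update["id"])
    if doc is not None and doc["version"] == update["prev_version"]:
        # Normal case: the document the update depends on is present.
        doc.update(update["changes"])
        doc["version"] = update["version"]
        return "applied"

    # prevVersion not found (e.g. removed by a reordered DBQ): ask the leader.
    full_doc = fetch_from_leader(update["id"], update["version"])
    if full_doc is None:
        # Deleted on the leader too (scenario 2): drop silently.
        return "dropped"

    # Still present on the leader (scenario 1): resurrect the full document.
    replica_index[update["id"]] = dict(full_doc)
    return "resurrected"
```

With this shape, the replica does not need to tell the two scenarios apart 
itself; the leader's answer decides between resurrecting and dropping.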
Solution 2:
{code}
Ensure that DBQs are never reordered from leader -> replica. One approach can 
be SOLR-8148. Another could be to block, on the leader, all updates newer than 
a DBQ until the DBQ is processed on the leader and all the replicas, and only 
then process the other updates. Also, block the DBQ and execute it only after 
all updates older than the DBQ have been processed on the leader and all the 
replicas.
{code}
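One way to picture the blocking variant of solution 2 is to treat every DBQ as 
a barrier that splits the update stream into batches. The Python sketch below 
is only a model of that idea, assuming a hypothetical barrier_batches helper 
and ops given as (kind, version) tuples: updates may be reordered within a 
batch, but a batch is sent only after the previous one has been processed on 
the leader and all replicas.

```python
def barrier_batches(ops):
    """Split ops into batches so that each DBQ sits alone in its own batch.

    ops: list of (kind, version) tuples in leader order.
    Reordering is tolerated inside a batch, never across batches.
    """
    batches = []
    current = []
    for op in ops:
        kind, _version = op
        if kind == "DBQ":
            if current:
                batches.append(current)  # everything older than the DBQ
                current = []
            batches.append([op])         # the DBQ on its own
        else:
            current.append(op)
    if current:
        batches.append(current)          # everything newer than the last DBQ
    return batches


ops = [("ADD", 100), ("INP-UPD", 200), ("DBQ", 300), ("ADD", 400)]
for batch in barrier_batches(ops):
    print(batch)
```

Applied to either scenario above, the ADD and the in-place update land in the 
batch before the DBQ, so the replica cannot see the DBQ ahead of version=200.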
Solution 1 seems easier to implement now than solution 2, but solution 2 (if 
implemented correctly) seems cleaner. Any thoughts?

Edit: There's a third solution in the interim:
{code}
Have a field definition flag, inplace-updateable=true, or a similar 
schema-level property, to enable or disable this feature (of updating 
docValues). This feature can be turned off by default (and this default can be 
revisited in a later major release). But someone can turn it on if they agree 
to (a) not issue DBQs on updated documents or, if they do, (b) make sure their 
DBQs are not reordered.
{code}
Not an ideal solution, but this could be in the spirit of "progress, not 
perfection".
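
As a tiny sketch of how such an opt-in could be checked, assuming field 
definitions are available as plain dicts and using the proposed (not yet 
existing) inplace-updateable property name; can_update_in_place is a 
hypothetical helper:

```python
def can_update_in_place(field_def):
    """Return True only if the field opted in to in-place updates.

    field_def: dict of field properties. "inplace-updateable" is the flag
    proposed above and is treated as off when absent.
    """
    has_doc_values = field_def.get("docValues", False) is True
    opted_in = field_def.get("inplace-updateable", False) is True
    return has_doc_values and opted_in


field = {"name": "updateable_field", "docValues": True}
print(can_update_in_place(field))  # flag absent, so the feature stays off
```

An update to a field that has not opted in would then go through the regular 
atomic update path instead.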


> Support updates of numeric DocValues
> ------------------------------------
>
>                 Key: SOLR-5944
>                 URL: https://issues.apache.org/jira/browse/SOLR-5944
>             Project: Solr
>          Issue Type: New Feature
>            Reporter: Ishan Chattopadhyaya
>            Assignee: Shalin Shekhar Mangar
>         Attachments: DUP.patch, SOLR-5944.patch, SOLR-5944.patch, 
> SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, 
> SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, 
> SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, 
> SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, 
> SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, 
> SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, 
> SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, 
> SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, 
> SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, 
> SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, 
> SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, 
> SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, 
> SOLR-5944.patch, SOLR-5944.patch, 
> TestStressInPlaceUpdates.eb044ac71.beast-167-failure.stdout.txt, 
> TestStressInPlaceUpdates.eb044ac71.beast-587-failure.stdout.txt, 
> TestStressInPlaceUpdates.eb044ac71.failures.tar.gz, defensive-checks.log.gz, 
> hoss.62D328FA1DEA57FD.fail.txt, hoss.62D328FA1DEA57FD.fail2.txt, 
> hoss.62D328FA1DEA57FD.fail3.txt, hoss.D768DD9443A98DC.fail.txt, 
> hoss.D768DD9443A98DC.pass.txt
>
>
> LUCENE-5189 introduced support for updates to numeric docvalues. It would be 
> really nice to have Solr support this.


