[ 
https://issues.apache.org/jira/browse/SOLR-10114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15868183#comment-15868183
 ] 

Mano Kovacs commented on SOLR-10114:
------------------------------------

Thank you, [~mdrob], I did not know about ThrowingRunnable.

bq. We might possibly want to hide this new functionality behind a version 
check? Does the patch apply relatively easily to 6.5 as well?
The patch relies on some changes from SOLR-5944, which AFAIK will be 
backported too; however, I can create a 6.x patch as well.

bq. Can you help me understand the full scope of the problem here - child docs 
are only in danger of spurious delete until the next commit point, right?
So the reordered DBQ could happen if an update with an earlier version arrives 
at the replicas after a DBQ with a later version, or vice versa. Solr handles 
the two cases as follows:
- If a DBQ arrives that has a lower version than the latest updates, the DBQ 
gets an additional version filter to protect documents that were added earlier 
with a higher version.
-- If the DBQ is not by ID (or something similarly limiting), but is for 
example a range or match-all query, it will delete child docs that were added 
with a higher-versioned parent doc. This is what the jira is originally about, 
and {{testLogReplayWithReorderedDBQByAsterixAndChildDocs}} tests the case.
- If an update arrives that has a lower version than the latest DBQs, 
DirectUpdateHandler2 goes down an add-and-delete path, where the earlier DBQs 
with higher versions are replayed after the update.
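
To illustrate the first case, here is a minimal toy model in plain Java. It is not Solr's code: the class, the {{Doc}} type and {{survivorsOfMatchAllDbq}} are hypothetical names, and the version filter is simplified to "a matching doc survives only if it carries a strictly higher version than the DBQ". It only shows why child docs without a \_version\_ are not protected by the filter.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ReorderedDbqSketch {

    /** Toy document; version is null when the doc has no _version_ field. */
    static final class Doc {
        final String id;
        final Long version;

        Doc(String id, Long version) {
            this.id = id;
            this.version = version;
        }
    }

    /**
     * Returns the ids that survive a match-all DBQ carrying dbqVersion.
     * Assumption of this sketch: a doc is protected only if it has a
     * version and that version is higher than the DBQ's version.
     */
    static List<String> survivorsOfMatchAllDbq(List<Doc> index, long dbqVersion) {
        List<String> survivors = new ArrayList<>();
        for (Doc d : index) {
            boolean isNewer = d.version != null && d.version > dbqVersion;
            if (isNewer) {
                survivors.add(d.id);
            }
        }
        return survivors;
    }

    public static void main(String[] args) {
        // Block added with version 200; the children carry no version.
        List<Doc> index = Arrays.asList(
                new Doc("parent", 200L),
                new Doc("child-1", null),
                new Doc("child-2", null));

        // A DBQ with the lower version 100 arrives late (reordered).
        System.out.println(survivorsOfMatchAllDbq(index, 100L)); // prints [parent]
    }
}
```

In this model the parent is kept because its version is higher than the DBQ's, while both children are removed even though they were added together with the parent.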

Now, {{doNormalUpdate(cmd)}} was checking whether the document is a block 
document (has children) and did two main things differently based on that:
- Calls {{updateDocuments}} (plural), which accepts an Iterable and inserts 
every child document
- Builds idTerm from the \_root\_ field instead of the id field, so before 
adding the document, Lucene would delete the parent AND the child documents as 
well.

On the other hand, addAndDelete() did not differentiate block docs at all, so 
the child docs were ignored during inserts and overwrites.
So basically any reordered DBQ caused:
- Losing child docs when a new document was inserted 
({{testLogReplayWithReorderedDBQInsertingChildnodes}})
- Leaving the existing child docs untouched on update. This caused replica 
numDocs inconsistency when the update contained a different number of child 
docs. ({{testLogReplayWithReorderedDBQUpdateWithDifferentChildCount}})

In short, replication of child docs was dropped whenever there was a reordered 
DBQ.
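
The difference between the two paths can be sketched with another toy model in plain Java. Again, this is not the DirectUpdateHandler2 code: {{blockAwareUpdate}} and {{blockUnawareUpdate}} are hypothetical names standing in for the block-aware behavior of {{doNormalUpdate}} and the pre-patch behavior of addAndDelete() as described above, and the index is just a list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class BlockUpdatePathsSketch {

    /** Toy index entry; every doc in a block uses the parent's id as root. */
    static final class Entry {
        final String id;
        final String root;

        Entry(String id, String root) {
            this.id = id;
            this.root = root;
        }
    }

    /** Block-aware overwrite: delete by root, then add parent and children. */
    static void blockAwareUpdate(List<Entry> index, String parentId, List<String> childIds) {
        index.removeIf(e -> e.root.equals(parentId));
        index.add(new Entry(parentId, parentId));
        for (String childId : childIds) {
            index.add(new Entry(childId, parentId));
        }
    }

    /** Block-unaware overwrite: delete by id only and add the parent only. */
    static void blockUnawareUpdate(List<Entry> index, String parentId, List<String> childIds) {
        index.removeIf(e -> e.id.equals(parentId));
        index.add(new Entry(parentId, parentId));
        // childIds is deliberately ignored, mirroring the described problem.
    }

    /** Indexes a block with two children, then overwrites it with one child. */
    static int numDocsAfter(boolean secondUpdateIsBlockAware) {
        List<Entry> index = new ArrayList<>();
        blockAwareUpdate(index, "p1", Arrays.asList("c1", "c2"));
        if (secondUpdateIsBlockAware) {
            blockAwareUpdate(index, "p1", Arrays.asList("c3"));
        } else {
            blockUnawareUpdate(index, "p1", Arrays.asList("c3"));
        }
        return index.size();
    }

    public static void main(String[] args) {
        System.out.println(numDocsAfter(true));  // prints 2: p1 and c3
        System.out.println(numDocsAfter(false)); // prints 3: p1 plus stale c1, c2
    }
}
```

In this model a node that takes the block-aware path ends with 2 docs, while a node that takes the block-unaware path ends with 3, which corresponds to the numDocs inconsistency between replicas mentioned above.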

bq. So if they make it to disk, even though they don't have versions, they are 
still safe from disappearing in the future.

AFAIK, the reordering cannot happen on the leader, so this does not affect the 
leader's index, only the replicas'. I assume any peersync would fail due to the 
fingerprint check, and the replica would eventually replicate the correct 
index. [[email protected]], could you please verify my assumption?

> child documents lack _version_, susceptible to reordered delete-by-query 
> -------------------------------------------------------------------------
>
>                 Key: SOLR-10114
>                 URL: https://issues.apache.org/jira/browse/SOLR-10114
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>            Reporter: Yonik Seeley
>         Attachments: SOLR-10114.patch, SOLR-10114-validation.patch
>
>
> It looks like when a block of documents is indexed, child documents get no 
> \_version\_ field.  This means (among other potential issues) that a 
> delete-by-query that is reordered will cause matching child documents to be 
> deleted.  DBQ normally prevents deleting newer docs by including a 
> restriction on \_version\_, which doesn't work for anything lacking that 
> field.  Re-ordered delete-by-term of any child docs would also be affected 
> (although it should be a much rarer issue.)
> The leading candidate for a fix is to use the exact same \_version\_ for all 
> child docs.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
