[jira] [Commented] (SOLR-12638) Support atomic updates of nested/child documents for nested-enabled schema

mosh (JIRA) Sat, 20 Oct 2018 23:19:11 -0700


    [ 
https://issues.apache.org/jira/browse/SOLR-12638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16658100#comment-16658100
 ]


mosh commented on SOLR-12638:
-----------------------------

We have been testing this feature in-house and have come across a sharding 
problem that occurs when the document being updated is indexed inside a block 
and the collection has more than one shard.
Right now, when updating a document, the Id of that document has to be 
provided, in addition to the field being updated.
When the document being updated is inside a block, the update can be routed 
to the wrong shard, because the shard in which the block is indexed was 
calculated from the root document's Id, not the child's.
For example, when this document:
{code:javascript}{"id": "1", "children": [{"id": "20", "string_s": "ex"}]}{code}
is updated with:
{code:javascript}{"id": "20", "grand_children": {"add": [{"id": "21", 
"string_s": "ex"}]}}{code}
the update can be routed to another shard, where the block does not exist. 
The update is then indexed on that shard, splitting our block into two pieces 
that live on two separate shards.
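
To make the mismatch concrete, here is a minimal sketch of hash-based routing. 
It assumes a toy string hash and a plain modulo over the shard count; toyHash 
and shardFor are hypothetical names, and this is not the hash or router that 
Solr actually uses. It only shows that a key of "1" (the root) and a key of 
"20" (the child) need not land on the same shard.

```javascript
// Toy stand-in for a hash-based document router.
// ASSUMPTION: this is NOT Solr's real hash function or router; toyHash and
// shardFor are hypothetical names used only for illustration.
function toyHash(id) {
  let h = 0;
  for (const ch of id) {
    // Classic "multiply by 31" string hash, kept within unsigned 32 bits.
    h = (h * 31 + ch.charCodeAt(0)) >>> 0;
  }
  return h;
}

// Pick a shard index from a routing key.
function shardFor(routeKey, numShards) {
  return toyHash(routeKey) % numShards;
}

const numShards = 2;
// The whole block was indexed on the shard chosen from the root id "1".
const blockShard = shardFor("1", numShards);
// An update that only carries the child id "20" is routed from that id.
const updateShard = shardFor("20", numShards);
// True when the update would land on a shard that does not hold the block.
const splitsBlock = blockShard !== updateShard;

console.log({ blockShard, updateShard, splitsBlock });
```

With this toy hash and two shards, the root id and the child id map to 
different shards, which is the situation described above.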

Skimming through DistributedUpdateProcessor, I have suggestions for three 
different solutions.

# If the schema is nested, the routing method (in 
DistributedUpdateProcessor) can check whether the document exists in any 
shard (lookup by id), find out whether it is inside a block (_root_), and 
route the update using the hash of _root_.
# Very similar to the previous method, except that the _root_ lookup is done 
only if the document being updated is not found in the shard it was routed 
to. The other shards are then asked whether the document exists inside a 
block, and the update command is re-routed accordingly.
# The user provides the _root_, which is not ideal when it comes to user 
friendliness.
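
The three options can be sketched against a toy in-memory model, where each 
shard is a Map from document id to a record holding its _root_. Everything 
here (toyHash, shardFor, makeCluster, indexBlock and the three route* 
functions) is hypothetical and only illustrates the control flow; none of it 
is DistributedUpdateProcessor code or a Solr API.

```javascript
// ASSUMPTION: toy model only. Shards are in-memory Maps and the hash is a
// simple string hash, not Solr's router. All names are hypothetical.
function toyHash(id) {
  let h = 0;
  for (const ch of id) h = (h * 31 + ch.charCodeAt(0)) >>> 0;
  return h;
}
function shardFor(routeKey, numShards) {
  return toyHash(routeKey) % numShards;
}

// A cluster is an array of shards; each shard maps id -> { _root_ }.
function makeCluster(numShards) {
  return Array.from({ length: numShards }, () => new Map());
}

// Index a whole block on the shard chosen from the root id.
function indexBlock(cluster, rootId, childIds) {
  const shard = cluster[shardFor(rootId, cluster.length)];
  for (const id of [rootId, ...childIds]) shard.set(id, { _root_: rootId });
}

// Option 1: always look the document up first, then route by its _root_.
function routeWithLookupFirst(cluster, id) {
  for (const shard of cluster) {
    const doc = shard.get(id);
    if (doc) return shardFor(doc._root_, cluster.length);
  }
  return shardFor(id, cluster.length); // unknown id: treat as a new root
}

// Option 2: route by id as today; ask the other shards only on a miss.
function routeWithFallback(cluster, id) {
  const first = shardFor(id, cluster.length);
  if (cluster[first].has(id)) return first;
  for (let i = 0; i < cluster.length; i++) {
    if (i !== first && cluster[i].has(id)) return i; // re-route to the block
  }
  return first; // not found anywhere: keep the original route
}

// Option 3: the caller supplies the root id, so no lookup is needed.
function routeWithExplicitRoot(cluster, rootId) {
  return shardFor(rootId, cluster.length);
}

const cluster = makeCluster(2);
indexBlock(cluster, "1", ["20"]);
console.log(
  routeWithLookupFirst(cluster, "20"),
  routeWithFallback(cluster, "20"),
  routeWithExplicitRoot(cluster, "1")
);
```

In this model, option 1 pays for a lookup on every update, option 2 pays only 
when the first shard misses, and option 3 pays nothing but pushes the burden 
onto the user, which matches the trade-off between the three.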

IMO the third option should be the last resort, since it is the least user 
friendly of the three.
My only concern regarding the first two options is the performance hit they 
might cause.

WDYT [~dsmiley], [~caomanhdat]?

> Support atomic updates of nested/child documents for nested-enabled schema
> --------------------------------------------------------------------------
>
>                 Key: SOLR-12638
>                 URL: https://issues.apache.org/jira/browse/SOLR-12638
>             Project: Solr
>          Issue Type: Sub-task
>      Security Level: Public(Default Security Level. Issues are Public) 
>            Reporter: mosh
>            Priority: Major
>         Attachments: SOLR-12638-delete-old-block-no-commit.patch, 
> SOLR-12638-nocommit.patch
>
>          Time Spent: 4h 40m
>  Remaining Estimate: 0h
>
> I have been toying with the thought of using this transformer in conjunction 
> with NestedUpdateProcessor and AtomicUpdate to allow SOLR to completely 
> re-index the entire nested structure. This is just a thought, I am still 
> thinking about implementation details. Hopefully I will be able to post a 
> more concrete proposal soon.



