[ 
https://issues.apache.org/jira/browse/SOLR-7247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14363757#comment-14363757
 ] 

Paolo Cappuccini edited comment on SOLR-7247 at 3/16/15 7:34 PM:
-----------------------------------------------------------------

Thanks Shalin!
I finally understand the splitting behaviour better.
I investigated further and found the real cause of my problems.

After splitting I obviously have a new distribution of docs across shards.
The reason I couldn't find documents is in RealTimeGetComponent.java
(line 366):

Slice slice = coll.getRouter().getTargetSlice(id, null, params, coll);

Here nothing takes routeField into account, and nothing could: at that point
only the id is known (doc is passed as null), so the value of the route field
is not available.

Also, the sliceHash function in CompositeIdRouter doesn't consider the _route_
field in params. So the document is lost, and passing an explicit "_route_"
param doesn't help.
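To make the mismatch concrete, here is a minimal, self-contained Java sketch (not Solr code). It assumes a hypothetical route field named "group", uses String.hashCode() as a stand-in for the MurmurHash3 hash that CompositeIdRouter actually applies, and models the split as cutting one shard's hash range in half at 0. Indexing hashes the route field value, while the real-time get can only hash the id, so the two agree only while a single slice owns the whole range.

```java
import java.util.List;
import java.util.Map;

// Illustrative sketch, not Solr code: String.hashCode() stands in for MurmurHash3,
// and the slice layout and the "group" route field are hypothetical.
class RouteFieldMismatch {

    // A slice owns the inclusive hash range [min, max].
    record Slice(String name, int min, int max) {
        boolean includes(int hash) {
            return hash >= min && hash <= max;
        }
    }

    // Before the split: one shard owns the whole 32-bit hash space.
    static final List<Slice> BEFORE_SPLIT = List.of(
            new Slice("shard1", Integer.MIN_VALUE, Integer.MAX_VALUE));

    // After the split: the parent range is cut in half at hash 0.
    static final List<Slice> AFTER_SPLIT = List.of(
            new Slice("shard1_0", Integer.MIN_VALUE, -1),
            new Slice("shard1_1", 0, Integer.MAX_VALUE));

    static String sliceFor(String routingValue, List<Slice> slices) {
        int hash = routingValue.hashCode();
        for (Slice s : slices) {
            if (s.includes(hash)) {
                return s.name();
            }
        }
        throw new IllegalStateException("no slice covers hash " + hash);
    }

    // Indexing sees the whole document, so the routeField value is hashed.
    static String sliceAtIndexTime(Map<String, String> doc, String routeField, List<Slice> slices) {
        return sliceFor(doc.get(routeField), slices);
    }

    // Real-time get sees only the id (doc == null), so the id is hashed instead.
    static String sliceAtGetTime(String id, List<Slice> slices) {
        return sliceFor(id, slices);
    }

    public static void main(String[] args) {
        Map<String, String> doc = Map.of("id", "doc1", "group", "groupA");
        for (List<Slice> layout : List.of(BEFORE_SPLIT, AFTER_SPLIT)) {
            System.out.println("indexed on " + sliceAtIndexTime(doc, "group", layout)
                    + ", looked up on " + sliceAtGetTime(doc.get("id"), layout));
        }
    }
}
```

With these inputs the first line reports shard1 for both, while after the split the document is indexed on shard1_0 but looked up on shard1_1, which is the "lost document" effect.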

Roughly the same behaviour occurs in DistributedUpdateProcessor, in
"processDelete".

The behaviour is so strange that perhaps I am completely wrong!

I think CompositeIdRouter.sliceHash could have explicit overloads: one to hash
by "doc"/"collection" and one to hash by "value" (as in IndexSplitter).

getTargetSlice itself should have the same overloads (currently it has the same
ambiguous signature as sliceHash).
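The overload idea can be sketched roughly as follows. This is a hypothetical API shape, not Solr's: CollectionInfo and the two sliceHash methods are illustrative names, String.hashCode() stands in for MurmurHash3, and "id" is assumed to be the default routing field when no routeField is configured.

```java
import java.util.Map;
import java.util.Objects;

// Hypothetical sketch of explicit sliceHash overloads; not Solr's actual API.
class ExplicitSliceHash {

    // Minimal stand-in for collection state: routeField is null when not configured.
    record CollectionInfo(String name, String routeField) {}

    // Overload 1: hash a bare value (what SolrIndexSplitter effectively does with the id).
    static int sliceHash(String value) {
        return Objects.requireNonNull(value, "value").hashCode();
    }

    // Overload 2: hash a whole document; the collection is mandatory, so a
    // configured routeField can never be silently skipped.
    static int sliceHash(Map<String, String> doc, CollectionInfo collection) {
        Objects.requireNonNull(doc, "doc");
        Objects.requireNonNull(collection, "collection");
        String field = collection.routeField() != null ? collection.routeField() : "id";
        String value = doc.get(field);
        if (value == null) {
            throw new IllegalArgumentException("document has no value for field " + field);
        }
        return sliceHash(value);
    }

    public static void main(String[] args) {
        Map<String, String> doc = Map.of("id", "doc1", "group", "groupA");
        CollectionInfo byId = new CollectionInfo("plain", null);
        CollectionInfo byGroup = new CollectionInfo("routed", "group");
        System.out.println(sliceHash(doc, byId) == sliceHash("doc1"));      // hashed by id
        System.out.println(sliceHash(doc, byGroup) == sliceHash("groupA")); // hashed by routeField
    }
}
```

Because each overload states exactly what it needs, a caller that only has an id cannot accidentally call the document path, and getTargetSlice could mirror the same two shapes.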

RealTimeGetComponent can only "think" by id (not by routeField), so it should
consider all active slices if a routeField is specified for the collection; a
good optimization for this case would be to use the "_route_" param to target a
specific shard.
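The proposed lookup strategy might look like this as a sketch. It is not Solr code: slicesToQuery and its parameters are hypothetical, and it assumes the "_route_" param carries the same routeField value that was hashed at index time.

```java
import java.util.List;
import java.util.function.Function;

// Hypothetical sketch of choosing slices for a real-time get; not Solr code.
class RealTimeGetTargets {

    /**
     * @param id           the document id being fetched
     * @param routeField   the collection's routeField, or null if none is configured
     * @param routeParam   the value of the "_route_" request param, or null
     * @param activeSlices names of all active slices in the collection
     * @param sliceFor     maps a routing value to the slice owning its hash
     */
    static List<String> slicesToQuery(String id, String routeField, String routeParam,
                                      List<String> activeSlices,
                                      Function<String, String> sliceFor) {
        if (routeField == null) {
            // Documents were hashed by id, so the id alone identifies the slice.
            return List.of(sliceFor.apply(id));
        }
        if (routeParam != null) {
            // Optimization: assume _route_ carries the routeField value used at index time.
            return List.of(sliceFor.apply(routeParam));
        }
        // The routeField value is unknown here, so every active slice must be asked.
        return activeSlices;
    }

    public static void main(String[] args) {
        List<String> active = List.of("shard1_0", "shard1_1");
        // Toy lookup: negative hashes live on shard1_0, the rest on shard1_1.
        Function<String, String> lookup = v -> v.hashCode() < 0 ? "shard1_0" : "shard1_1";
        System.out.println(slicesToQuery("doc1", null, null, active, lookup));
        System.out.println(slicesToQuery("doc1", "group", null, active, lookup));
        System.out.println(slicesToQuery("doc1", "group", "groupA", active, lookup));
    }
}
```

The fan-out branch trades extra requests for correctness; the "_route_" branch restores a single-shard lookup only when the client supplies the routing value.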

As for "processDelete", any solution looks very complicated, but in general, if
I'm not wrong, routeField breaks something.



> sliceHash for compositeIdRouter is not coherent with routing
> ------------------------------------------------------------
>
>                 Key: SOLR-7247
>                 URL: https://issues.apache.org/jira/browse/SOLR-7247
>             Project: Solr
>          Issue Type: Bug
>    Affects Versions: 4.10.3
>            Reporter: Paolo Cappuccini
>
> In CompositeIdRouter, the function sliceHash checks the routeField configured
> for the collection.
> This makes me guess that the intended behaviour is to allow a field other than
> id to be used to hash documents.
> But the signature of this method is very general (it can take id, doc or
> params) and it is used in different ways by different functionality.
> In my opinion it should have overloads instead of weak internal logic: one
> overload taking "doc" and "collection" and another taking "id", "params" and
> "collection".
> In any case, if "_route_" is not available in "params", "collection" should
> be mandatory, and if a routeField is configured, "doc" should be mandatory too.
> This will break SplitIndex, but it will preserve data coherence.
> When I configure routeField, I notice that DeleteCommand is broken (it passes
> only "id" and "params" to sliceHash), and so is SolrIndexSplitter (it passes
> only "id").
> Either it should be forbidden to specify a routeField for compositeIdRouter, or
> the related functionality should be implemented so that documents can be
> hashed based on the routeField.
> For DeleteCommand the workaround is to specify the "_route_" param in the
> request, but for index splitting no workaround is possible. There, either the
> entire document should be passed during splitting (the "doc" parameter) or the
> params should be built with the proper "_route_" parameter.
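The argument rules proposed in the issue description can be expressed as a small precondition check. This is a hypothetical sketch, not Solr code: missingInput and CollectionInfo are illustrative names, and documents and params are modelled as plain string maps.

```java
import java.util.Map;

// Hypothetical sketch of the proposed sliceHash argument rules; not Solr code.
class SliceHashPreconditions {

    // routeField is null when the collection does not configure one.
    record CollectionInfo(String routeField) {}

    // Returns the name of the missing input, or null if routing can proceed.
    static String missingInput(Map<String, String> doc, Map<String, String> params,
                               CollectionInfo collection) {
        if (params != null && params.get("_route_") != null) {
            return null; // an explicit _route_ decides the slice on its own
        }
        if (collection == null) {
            return "collection";
        }
        if (collection.routeField() != null && doc == null) {
            return "doc"; // the routeField value can only come from the document
        }
        return null;
    }

    public static void main(String[] args) {
        CollectionInfo routed = new CollectionInfo("group");
        // A delete that passes only id and params (no _route_, no collection).
        System.out.println(missingInput(null, Map.of(), null));
        // The same delete with the _route_ workaround.
        System.out.println(missingInput(null, Map.of("_route_", "groupA"), null));
        // A caller that passes the collection but not the document.
        System.out.println(missingInput(null, Map.of(), routed));
    }
}
```

Under these rules a splitter that passes only the id is rejected rather than silently misrouting documents, which matches the trade-off described above (SplitIndex breaks, data stays coherent).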



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)