[ 
https://issues.apache.org/jira/browse/SOLR-5213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13794159#comment-13794159
 ] 

Shalin Shekhar Mangar commented on SOLR-5213:
---------------------------------------------

bq. A variation of the patch I uploaded here would be to 'rescue' any documents 
that would otherwise have been lost (and log their id and hash), e.g. always put 
them in the first sub-shard. They don't belong there, but at least that way they 
are not lost and could be analysed and dealt with later on.

Hmm, that is going to be difficult because we have features such as SOLR-5338. 
It is completely valid to have documents that do not fall into any hash range 
passed into SolrIndexSplitter.

> collections?action=SPLITSHARD parent vs. sub-shards numDocs
> -----------------------------------------------------------
>
>                 Key: SOLR-5213
>                 URL: https://issues.apache.org/jira/browse/SOLR-5213
>             Project: Solr
>          Issue Type: Improvement
>          Components: update
>    Affects Versions: 4.4
>            Reporter: Christine Poerschke
>            Assignee: Shalin Shekhar Mangar
>         Attachments: SOLR-5213.patch
>
>
> The problem we saw was that splitting a shard took a long time and at the end 
> of it the sub-shards contained fewer documents than the original shard.
> The root cause was eventually tracked down to the disappearing documents not 
> falling into the hash ranges of the sub-shards.
> Could SolrIndexSplitter report per-segment numDocs for parent and 
> sub-shards, with at least a warning logged for any discrepancies (documents 
> falling into none of the sub-shards or documents falling into several 
> sub-shards)?
> Additionally, could a case be made for erroring out when discrepancies are 
> detected, i.e. not proceeding with the shard split? Either always error out, or 
> have an optional verifyNumDocs=false/true parameter for the SPLITSHARD 
> action.
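
The discrepancy check requested in the description can be sketched roughly as 
follows. This is only an illustration, not SolrIndexSplitter code: the class 
HashRangeAudit, its nested Range and Report types, and the sample hash values 
are all hypothetical, and the document hashes are assumed to be already 
computed. Since documents outside every range can be valid (see the comment 
above), the sketch only counts and reports them and leaves the decision to 
warn or abort to the caller.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper (not part of Solr): counts how document hashes map onto sub-shard ranges. */
class HashRangeAudit {

    /** Inclusive hash range; a simplified stand-in for a sub-shard's hash range. */
    static final class Range {
        final int min;
        final int max;

        Range(int min, int max) {
            this.min = min;
            this.max = max;
        }

        boolean includes(int hash) {
            return hash >= min && hash <= max;
        }
    }

    /** Per-range document counts plus the two discrepancy counters. */
    static final class Report {
        final int[] perRange; // documents matched by each sub-shard range
        int none;             // documents matched by no range
        int several;          // documents matched by more than one range

        Report(int rangeCount) {
            this.perRange = new int[rangeCount];
        }

        boolean hasDiscrepancies() {
            return none > 0 || several > 0;
        }
    }

    /** Classifies every hash against every range and tallies the result. */
    static Report audit(int[] docHashes, List<Range> subRanges) {
        Report report = new Report(subRanges.size());
        for (int hash : docHashes) {
            int matches = 0;
            for (int i = 0; i < subRanges.size(); i++) {
                if (subRanges.get(i).includes(hash)) {
                    report.perRange[i]++;
                    matches++;
                }
            }
            if (matches == 0) {
                report.none++;
            } else if (matches > 1) {
                report.several++;
            }
        }
        return report;
    }

    public static void main(String[] args) {
        // Made-up ranges and hashes, purely for illustration.
        List<Range> subRanges = Arrays.asList(new Range(0, 49), new Range(50, 99));
        int[] hashes = {5, 60, 99, 150, -3};
        Report r = audit(hashes, subRanges);
        System.out.println("perRange=" + Arrays.toString(r.perRange)
                + " none=" + r.none + " several=" + r.several);
        // prints: perRange=[1, 2] none=2 several=0
    }
}
```

A caller could log a warning whenever hasDiscrepancies() returns true, or refuse 
to proceed with the split only when an option such as the proposed verifyNumDocs 
is set.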



--
This message was sent by Atlassian JIRA
(v6.1#6144)
