[ https://issues.apache.org/jira/browse/SOLR-11287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16141571#comment-16141571 ]

Vannia Rajan commented on SOLR-11287:
-------------------------------------

I figured out when this issue happens by observing the pattern on a small set
of data.

SPLITSHARD issues only a soft commit, so some of the sub-shard index files are
not yet fully written to disk. If I restart SOLR without first issuing an
explicit hard <commit />, the process is killed before the index directory is
fully written. On the next restart, the incomplete index is treated as having 0
records and is cleaned up.

I think we should update the documentation to let users know that they need to
issue a hard <commit /> immediately after a SPLITSHARD operation.
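A minimal Python sketch of that workaround follows. It assumes Solr is reachable at the default http://localhost:8983/solr and treats the collection and shard names as placeholders. The script calls the Collections API SPLITSHARD action and then sends a hard commit through the collection's /update handler with commit=true. The helper names (splitshard_url, hard_commit_url, split_then_commit) are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Assumption: default Solr host, port and context path.
SOLR_BASE = "http://localhost:8983/solr"


def splitshard_url(base: str, collection: str, shard: str) -> str:
    """Build the Collections API URL that splits one shard (hypothetical helper)."""
    params = urllib.parse.urlencode(
        {"action": "SPLITSHARD", "collection": collection, "shard": shard, "wt": "json"}
    )
    return f"{base}/admin/collections?{params}"


def hard_commit_url(base: str, collection: str) -> str:
    """Build the update-handler URL that triggers a hard commit with no documents."""
    params = urllib.parse.urlencode({"commit": "true", "wt": "json"})
    return f"{base}/{urllib.parse.quote(collection)}/update?{params}"


def get_json(url: str) -> dict:
    """Send a GET request and decode Solr's JSON response."""
    with urllib.request.urlopen(url) as resp:
        return json.loads(resp.read().decode("utf-8"))


def split_then_commit(base: str, collection: str, shard: str) -> dict:
    """Split a shard, then force a hard commit so the new sub-shard segments are flushed."""
    get_json(splitshard_url(base, collection, shard))
    return get_json(hard_commit_url(base, collection))
```

Usage would look like `split_then_commit(SOLR_BASE, "mycollection", "shard1")`. The commit is sent to the collection rather than to a single core, so SolrCloud distributes it to every active shard, including the new sub-shards.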

> Sub-shards by SPLITSHARD loses data on restarting SOLR
> ------------------------------------------------------
>
>                 Key: SOLR-11287
>                 URL: https://issues.apache.org/jira/browse/SOLR-11287
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: SolrCloud
>    Affects Versions: 5.5.1
>         Environment: Ubuntu 64-bit 32-core server, 240GB RAM
>            Reporter: Vannia Rajan
>
> We are running SOLR 5.5.1 with 4 nodes (1 shard per node). We are in the 
> process of splitting the 4 shards into 8 shards.
> The SPLITSHARD Collections API works as expected: it creates the sub-shards,
> activates them, and marks the parent shard inactive upon completion. The
> document counts of the parent shard and the sub-shards match. However, the
> data in the sub-shards does not seem to persist in our case.
> A restart of SOLR leaves the sub-shards with 0 documents, and their data
> directories shrink from 40+ GB to 71 KB.
> If I am missing any other step that must follow SPLITSHARD to make the
> data in the sub-shards persistent, please let me know. Otherwise, I suspect
> this may be a bug in v5.5.1.
> Note: I was able to get the lost data back by manually editing
> /collections/{collection}/state.json in ZooKeeper, setting the parent shard's
> state to "active" and the state of the child shards with 0 documents to
> "inactive".
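The state.json edit from the note above can be sketched in Python as a pure JSON transformation. This sketch assumes the SolrCloud 5.x per-collection layout, where the file maps the collection name to an object whose "shards" entry holds each shard's "state". It also assumes that SPLITSHARD records the originating shard under a "parent" key on each sub-shard, and that every sub-shard of that parent is one of the empty ones. Fetching and uploading the znode (for example with zkcli.sh getfile and putfile) is not shown. restore_parent is a hypothetical helper name.

```python
import json


def restore_parent(state_json: str, collection: str, parent: str) -> str:
    """Return a copy of state.json with `parent` active and its sub-shards inactive."""
    state = json.loads(state_json)
    shards = state[collection]["shards"]
    # Re-activate the original shard that still holds the data.
    shards[parent]["state"] = "active"
    for shard in shards.values():
        # Assumption: sub-shards name their originating shard under "parent".
        if shard.get("parent") == parent:
            shard["state"] = "inactive"
    return json.dumps(state, indent=2)
```

Only the "state" fields change. Ranges and replica entries are left as they were, so the parent shard resumes serving its original hash range.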



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
