[ 
https://issues.apache.org/jira/browse/CURATOR-76?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jay Bae updated CURATOR-76:
---------------------------

    Description: We are seeing serious data corruption during rolling 
restarts of our ZooKeeper servers, apparently caused by one application that 
uses the ChildReaper recipe. I am not sure this is the root cause, but my 
theory is that when multiple instances run the ChildReaper recipe, they 
conflict with each other between checking that a path exists and deleting it. 
This conflict can cause data corruption. We observed all servers die because 
of corrupted data, and we had to manually copy the log/snapshot data and 
restart them.  (was: We are seeing serious data corruption during rolling 
restarts of our ZooKeeper servers, apparently caused by one application that 
uses the ChildReaper recipe. I am not sure this is the root cause, but my 
theory is that when multiple instances run the ChildReaper recipe, they 
conflict with each other between checking that a path exists and deleting it. 
This conflict can cause data corruption. We observed all servers die because 
of corrupted data, and we had to manually copy the log/snapshot data and 
restart them.

Also, simply checking whether the znode is empty would not be enough. It 
would be better if ChildReaper checked that the node is empty and has not 
been modified for a given amount of time.)
     Issue Type: Improvement  (was: Bug)
        Summary: Adding leader selection ChildReaper recipe  (was: Adding 
leader selection and TTL feature in ChildReaper recipe)

> Adding leader selection ChildReaper recipe
> ------------------------------------------
>
>                 Key: CURATOR-76
>                 URL: https://issues.apache.org/jira/browse/CURATOR-76
>             Project: Apache Curator
>          Issue Type: Improvement
>          Components: Recipes
>            Reporter: Jay Bae
>
> We are seeing serious data corruption during rolling restarts of our 
> ZooKeeper servers, apparently caused by one application that uses the 
> ChildReaper recipe. I am not sure this is the root cause, but my theory is 
> that when multiple instances run the ChildReaper recipe, they conflict with 
> each other between checking that a path exists and deleting it. This 
> conflict can cause data corruption. We observed all servers die because of 
> corrupted data, and we had to manually copy the log/snapshot data and 
> restart them.
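
To make the proposal in the description concrete, here is a minimal sketch of 
leader-gated reaping: several reaper instances share one tree, but only the 
instance that currently holds leadership checks and deletes empty paths. This 
is an illustration under stated assumptions, not the ChildReaper 
implementation. LeaderSlot, Reaper, and the map-based tree are hypothetical 
in-memory stand-ins and are not Curator classes. In a real deployment the 
election would presumably come from one of Curator's leader election recipes 
(for example LeaderLatch); that wiring is not shown here.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical in-memory stand-ins; none of these types are Curator classes.
class LeaderGatedReaperSketch {

    /** Toy election: the first instance to claim the slot leads until it resigns. */
    static final class LeaderSlot {
        private final AtomicReference<String> leader = new AtomicReference<>();

        boolean tryAcquire(String id) {
            // Succeeds if the slot was free or if this instance already holds it.
            return leader.compareAndSet(null, id) || id.equals(leader.get());
        }

        void resign(String id) {
            // Only the current holder can release the slot.
            leader.compareAndSet(id, null);
        }
    }

    /** One reaper instance working on a shared toy tree (path -> child count). */
    static final class Reaper {
        private final String id;
        private final LeaderSlot slot;
        private final Map<String, Integer> tree;

        Reaper(String id, LeaderSlot slot, Map<String, Integer> tree) {
            this.id = id;
            this.slot = slot;
            this.tree = tree;
        }

        /** Runs one reaping pass and returns the paths this instance deleted. */
        List<String> reapOnce() {
            List<String> deleted = new ArrayList<>();
            if (!slot.tryAcquire(id)) {
                return deleted; // not the leader: neither check nor delete
            }
            for (String path : new ArrayList<>(tree.keySet())) {
                // Conditional remove: delete only if the path is still empty.
                if (tree.remove(path, 0)) {
                    deleted.add(path);
                }
            }
            Collections.sort(deleted);
            return deleted;
        }
    }

    public static void main(String[] args) {
        Map<String, Integer> tree = new ConcurrentHashMap<>();
        tree.put("/jobs/a", 0); // empty, eligible for reaping
        tree.put("/jobs/b", 2); // has children, must be kept
        LeaderSlot slot = new LeaderSlot();
        Reaper first = new Reaper("reaper-1", slot, tree);
        Reaper second = new Reaper("reaper-2", slot, tree);
        System.out.println("reaper-1 deleted: " + first.reapOnce());
        System.out.println("reaper-2 deleted: " + second.reapOnce());
    }
}
```

With this gating, a non-leader instance never enters the check-then-delete 
sequence, so two instances cannot interleave on the same path. The sketch 
says nothing about whether such interleaving is what actually corrupted the 
servers; as the description notes, that is only a theory.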



--
This message was sent by Atlassian JIRA
(v6.1#6144)
