[jira] [Updated] (KUDU-1020) ksck with snapshot reports divergence even if a server is just behind

Todd Lipcon (JIRA) Thu, 08 Sep 2016 19:47:40 -0700

     [ https://issues.apache.org/jira/browse/KUDU-1020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Todd Lipcon updated KUDU-1020:
------------------------------
       Assignee:     (was: Todd Lipcon)
    Code Review:   (was: http://gerrit.sjc.cloudera.com:8080/#/c/7706/4)

> ksck with snapshot reports divergence even if a server is just behind
> ---------------------------------------------------------------------
>
>                 Key: KUDU-1020
>                 URL: https://issues.apache.org/jira/browse/KUDU-1020
>             Project: Kudu
>          Issue Type: Improvement
>          Components: ksck
>    Affects Versions: Private Beta
>            Reporter: Todd Lipcon
>
> Something seems to be wrong with how ksck handles checksum timestamps. I
> have a recently restarted cluster, and I ran ksck against it. One of the
> tablets has a replica which was "lost", i.e. it fell too far behind and
> therefore could never be caught up. ksck just reports it as a bad checksum.
> Shouldn't it instead wait until the provided timestamp is "safe" and, if the
> wait times out, give an error saying that the replica is too far behind?
>
> As a stopgap, maybe ksck could also include the latest opid in the
> error printout, to make it more obvious that a server is just "behind" and
> not divergent?
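
The behavior proposed above could be sketched roughly as follows. This is
only an illustration in Python, not ksck's actual code: ReplicaStatus,
classify_replica, and every parameter name are hypothetical, and the "safe"
timestamp and opid are modeled as a plain integer and string. The idea is to
wait for the snapshot timestamp to become safe on the replica, report the
replica as behind if the wait times out, and include the latest opid in any
error so that a lagging server is not mistaken for a divergent one.

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class ReplicaStatus:
    """Hypothetical view of a replica's progress (not a real Kudu type)."""
    safe_timestamp: int  # highest timestamp at which a snapshot read is safe
    last_opid: str       # latest operation id in the replica's log


def classify_replica(
    poll: Callable[[], ReplicaStatus],
    snapshot_ts: int,
    timeout_s: float,
    expected_checksum: int,
    compute_checksum: Callable[[], int],
    interval_s: float = 0.01,
) -> str:
    """Wait for snapshot_ts to become safe, then compare checksums.

    Returns "OK", a "BEHIND: ..." message if the wait times out, or a
    "CHECKSUM MISMATCH: ..." message. Both error messages carry the
    replica's latest opid, as suggested in the stopgap proposal.
    """
    deadline = time.monotonic() + timeout_s
    status = poll()
    while status.safe_timestamp < snapshot_ts:
        if time.monotonic() >= deadline:
            # The replica never reached the snapshot timestamp: it is
            # lagging, which is different from having divergent data.
            return (
                f"BEHIND: timestamp {snapshot_ts} not safe after "
                f"{timeout_s}s (safe={status.safe_timestamp}, "
                f"last opid={status.last_opid})"
            )
        time.sleep(interval_s)
        status = poll()

    # Only checksum once the timestamp is safe on this replica.
    actual = compute_checksum()
    if actual != expected_checksum:
        return (
            f"CHECKSUM MISMATCH: expected {expected_checksum}, "
            f"got {actual} (last opid={status.last_opid})"
        )
    return "OK"
```

With a zero timeout, a replica whose safe timestamp is below the snapshot
timestamp is reported as BEHIND without ever being checksummed, while a
caught-up replica goes on to the checksum comparison.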



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
