[jira] [Commented] (KUDU-1933) OpId index 32-bit overflow (was: Master crashes after too many TS re-registrations)

Jean-Daniel Cryans (JIRA) Thu, 16 Mar 2017 12:57:06 -0700

    [ 
https://issues.apache.org/jira/browse/KUDU-1933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15928753#comment-15928753
 ]


Jean-Daniel Cryans commented on KUDU-1933:
------------------------------------------

bq. As reported in the duplicate JIRA KUDU-1933

I think you meant KUDU-1936.

Also, as a note for this JIRA, the latest version of the patch that's under 
review auto-fixed my master. Yay! :)

> OpId index 32-bit overflow (was: Master crashes after too many TS 
> re-registrations)
> -----------------------------------------------------------------------------------
>
>                 Key: KUDU-1933
>                 URL: https://issues.apache.org/jira/browse/KUDU-1933
>             Project: Kudu
>          Issue Type: Bug
>          Components: consensus, master, tserver
>    Affects Versions: 1.3.0
>            Reporter: Jean-Daniel Cryans
>            Assignee: Mike Percy
>            Priority: Critical
>
> I had a cluster with mismatched versions inside the 1.3 release (something 
> no one would see using released versions) and ended up with the tablet 
> servers constantly retrying to register with the master. After a few days of 
> this, the master died this way:
> {noformat}
> I0308 00:25:47.038650  7619 ts_descriptor.cc:125] Processing retry of TS 
> registration from permanent_uuid: "d8009e07d82b4e66a7ab50f85e60bc30" 
> instance_seqno: 1487888450146835
> I0308 00:25:47.038702  7619 ts_manager.cc:84] Re-registered known tserver 
> with Master: d8009e07d82b4e66a7ab50f85e60bc30 (ve0136.halxg.cloudera.com:7050)
> I0308 00:25:47.043874  7616 ts_descriptor.cc:125] Processing retry of TS 
> registration from permanent_uuid: "335d132897de4bdb9b87443f2c487a42" 
> instance_seqno: 1487888474889244
> I0308 00:25:47.043912  7616 ts_manager.cc:84] Re-registered known tserver 
> with Master: 335d132897de4bdb9b87443f2c487a42 (ve0126.halxg.cloudera.com:7050)
> I0308 00:25:47.108677  7617 ts_descriptor.cc:125] Processing retry of TS 
> registration from permanent_uuid: "7425c65d80f54f2da0a85494a5eb3e68" 
> instance_seqno: 1487888491433564
> I0308 00:25:47.108719  7617 ts_manager.cc:84] Re-registered known tserver 
> with Master: 7425c65d80f54f2da0a85494a5eb3e68 (ve0122.halxg.cloudera.com:7050)
> I0308 00:25:47.111563  7611 ts_descriptor.cc:125] Processing retry of TS 
> registration from permanent_uuid: "c108a85a68504c2bb9f49e4ee683d981" 
> instance_seqno: 1487888392795318
> I0308 00:25:47.111604  7611 ts_manager.cc:84] Re-registered known tserver 
> with Master: c108a85a68504c2bb9f49e4ee683d981 (ve0128.halxg.cloudera.com:7050)
> F0308 00:25:53.568773  7655 log_index.cc:171] Check failed: log_index > 0 
> (-2147483648 vs. 0) 
> {noformat}
> Ideally the master shouldn't crash, but it also sounds like we're not 
> handling log_index overflows.
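
The value in the failed check, -2147483648, is what a signed 32-bit integer holds once a count goes one past 2147483647. A minimal sketch of that wrap-around follows, assuming the index is narrowed from 64 bits to 32 bits somewhere along the way; this message does not say where that happens in Kudu. NarrowToInt32, FitsInInt32 and DemonstrateWrap are hypothetical names used for illustration only, not Kudu functions.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical helper (not a Kudu function): keeps only the low 32 bits of a
// 64-bit index and reinterprets them as a two's-complement signed value.
int32_t NarrowToInt32(int64_t index) {
  const uint32_t low = static_cast<uint32_t>(static_cast<uint64_t>(index));
  const uint32_t max32 =
      static_cast<uint32_t>(std::numeric_limits<int32_t>::max());
  if (low <= max32) {
    return static_cast<int32_t>(low);
  }
  // Bit patterns above INT32_MAX map to the negative half of the range.
  return static_cast<int32_t>(static_cast<int64_t>(low) - (int64_t{1} << 32));
}

// Hypothetical guard: true only if the index survives narrowing unchanged.
bool FitsInInt32(int64_t index) {
  return index >= std::numeric_limits<int32_t>::min() &&
         index <= std::numeric_limits<int32_t>::max();
}

// One past INT32_MAX no longer fits and wraps to INT32_MIN, the value that
// appears in the failed check above.
void DemonstrateWrap() {
  const int64_t next = int64_t{std::numeric_limits<int32_t>::max()} + 1;
  assert(!FitsInInt32(next));
  assert(NarrowToInt32(next) == std::numeric_limits<int32_t>::min());
}
```

A guard like FitsInInt32 is one way a caller could detect the condition before narrowing. Keeping the index in a 64-bit type throughout would avoid the wrap altogether.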



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
