[jira] [Commented] (KUDU-1933) Master crashes after too many TS re-registrations

Mike Percy (JIRA) Mon, 13 Mar 2017 16:37:54 -0700

    [ 
https://issues.apache.org/jira/browse/KUDU-1933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15923180#comment-15923180
 ]


Mike Percy commented on KUDU-1933:
----------------------------------

The relevant stack trace looks like this:

{noformat}
#6  0x000000000087153f in google::LogMessageFatal::~LogMessageFatal() ()
#7  0x0000000000ae1018 in kudu::log::LogIndex::GetChunkForIndex(long, bool, scoped_refptr<kudu::log::LogIndex::IndexChunk>*) ()
#8  0x0000000000ae118f in kudu::log::LogIndex::AddEntry(kudu::log::LogIndexEntry const&) ()
#9  0x0000000000ad5f47 in kudu::log::Log::UpdateIndexForBatch(kudu::log::LogEntryBatch const&, long) ()
#10 0x0000000000adbdc9 in kudu::log::Log::DoAppend(kudu::log::LogEntryBatch*) ()
#11 0x0000000000addba2 in kudu::log::Log::Append(kudu::log::LogEntryPB*) ()
#12 0x00000000009918b0 in kudu::tablet::TabletBootstrap::HandleReplicateMessage(kudu::tablet::ReplayState*, kudu::log::LogEntryPB*) ()
#13 0x0000000000995f09 in kudu::tablet::TabletBootstrap::HandleEntry(kudu::tablet::ReplayState*, kudu::log::LogEntryPB*) ()
#14 0x00000000009967f7 in kudu::tablet::TabletBootstrap::PlaySegments(kudu::consensus::ConsensusBootstrapInfo*) ()
#15 0x0000000000998a3b in kudu::tablet::TabletBootstrap::Bootstrap(std::shared_ptr<kudu::tablet::Tablet>*, scoped_refptr<kudu::log::Log>*, kudu::consensus::ConsensusBootstrapInfo*) ()
#16 0x0000000000999217 in kudu::tablet::BootstrapTablet(scoped_refptr<kudu::tablet::TabletMetadata> const&, scoped_refptr<kudu::server::Clock> const&, std::shared_ptr<kudu::MemTracker> const&, scoped_refptr<kudu::rpc::ResultTracker> const&, kudu::MetricRegistry*, kudu::tablet::TabletStatusListener*, std::shared_ptr<kudu::tablet::Tablet>*, scoped_refptr<kudu::log::Log>*, scoped_refptr<kudu::log::LogAnchorRegistry> const&, kudu::consensus::ConsensusBootstrapInfo*) ()
#17 0x0000000000827b26 in kudu::master::SysCatalogTable::SetupTablet(scoped_refptr<kudu::tablet::TabletMetadata> const&) ()
#18 0x000000000082a5da in kudu::master::SysCatalogTable::Load(kudu::FsManager*) ()
#19 0x000000000083df99 in kudu::master::CatalogManager::InitSysCatalogAsync(bool) ()
#20 0x000000000083f9e1 in kudu::master::CatalogManager::Init(bool) ()
#21 0x00000000008117e5 in kudu::master::Master::InitCatalogManager() ()
#22 0x00000000008118ba in kudu::master::Master::InitCatalogManagerTask() ()
{noformat}
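
For context on the failed check in the quoted report (log_index > 0, with the value -2147483648): that number is exactly the minimum of a signed 32-bit integer, which is what a 64-bit index of 2^31 becomes if it is narrowed to 32 bits somewhere along the way. The sketch below only illustrates that arithmetic. TruncateToInt32 and FitsInInt32 are hypothetical helpers, not Kudu functions, and where (or whether) such a narrowing happens in the Kudu code path above is an assumption, not something this trace confirms.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical helper (not Kudu code): keeps only the low 32 bits of a
// 64-bit index, which is what storing it in a 32-bit int does on common
// compilers. The conversion is modular as of C++20 and
// implementation-defined (but wrapping in practice) before that.
inline int32_t TruncateToInt32(int64_t index) {
  return static_cast<int32_t>(static_cast<uint32_t>(index));
}

// Hypothetical guard (not Kudu code): reports whether a 64-bit index can be
// represented in 32 bits, so a caller can fail cleanly instead of wrapping.
inline bool FitsInInt32(int64_t index) {
  return index >= std::numeric_limits<int32_t>::min() &&
         index <= std::numeric_limits<int32_t>::max();
}
```

With these helpers, an index of 2147483647 survives the narrowing unchanged, while 2147483648 wraps to the negative value seen in the check failure.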

> Master crashes after too many TS re-registrations
> -------------------------------------------------
>
>                 Key: KUDU-1933
>                 URL: https://issues.apache.org/jira/browse/KUDU-1933
>             Project: Kudu
>          Issue Type: Bug
>          Components: master
>    Affects Versions: 1.3.0
>            Reporter: Jean-Daniel Cryans
>
> I had a cluster with mis-matched versions inside the 1.3 release (something 
> no one would see using released versions) and ended up with the tablet 
> servers constantly retrying to register with the master. After a few days of 
> this, the master died this way:
> {noformat}
> I0308 00:25:47.038650  7619 ts_descriptor.cc:125] Processing retry of TS registration from permanent_uuid: "d8009e07d82b4e66a7ab50f85e60bc30" instance_seqno: 1487888450146835
> I0308 00:25:47.038702  7619 ts_manager.cc:84] Re-registered known tserver with Master: d8009e07d82b4e66a7ab50f85e60bc30 (ve0136.halxg.cloudera.com:7050)
> I0308 00:25:47.043874  7616 ts_descriptor.cc:125] Processing retry of TS registration from permanent_uuid: "335d132897de4bdb9b87443f2c487a42" instance_seqno: 1487888474889244
> I0308 00:25:47.043912  7616 ts_manager.cc:84] Re-registered known tserver with Master: 335d132897de4bdb9b87443f2c487a42 (ve0126.halxg.cloudera.com:7050)
> I0308 00:25:47.108677  7617 ts_descriptor.cc:125] Processing retry of TS registration from permanent_uuid: "7425c65d80f54f2da0a85494a5eb3e68" instance_seqno: 1487888491433564
> I0308 00:25:47.108719  7617 ts_manager.cc:84] Re-registered known tserver with Master: 7425c65d80f54f2da0a85494a5eb3e68 (ve0122.halxg.cloudera.com:7050)
> I0308 00:25:47.111563  7611 ts_descriptor.cc:125] Processing retry of TS registration from permanent_uuid: "c108a85a68504c2bb9f49e4ee683d981" instance_seqno: 1487888392795318
> I0308 00:25:47.111604  7611 ts_manager.cc:84] Re-registered known tserver with Master: c108a85a68504c2bb9f49e4ee683d981 (ve0128.halxg.cloudera.com:7050)
> F0308 00:25:53.568773  7655 log_index.cc:171] Check failed: log_index > 0 (-2147483648 vs. 0)
> {noformat}
> Ideally the master shouldn't crash, but it also sounds like we're not 
> handling log_index overflows.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
