[
https://issues.apache.org/jira/browse/KUDU-950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Alexey Serbin resolved KUDU-950.
--------------------------------
Fix Version/s: 1.0.0
Resolution: Fixed
I guess this was fixed a long time ago with:
* [2525ad094234e6bc901b8bc544801ca00e8f411e|https://github.com/apache/kudu/commit/2525ad094234e6bc901b8bc544801ca00e8f411e]
* [a5a192a48f12cc8ae87ad3c7568d41bf6e657d0b|https://github.com/apache/kudu/commit/a5a192a48f12cc8ae87ad3c7568d41bf6e657d0b]
> Possible race in Master lifecycle
> ---------------------------------
>
> Key: KUDU-950
> URL: https://issues.apache.org/jira/browse/KUDU-950
> Project: Kudu
> Issue Type: Bug
> Components: master
> Affects Versions: Public beta
> Reporter: Mike Percy
> Assignee: Mike Percy
> Priority: Major
> Fix For: 1.0.0
>
>
> It looks like there is a startup race in the master. We should likely just
> implement the same bind-then-listen type startup logic on the Master that we
> use in the TS to protect against this.
> I saw this failure in external_mini_cluster-test on gerrit @
> http://sandbox.jenkins.cloudera.com/job/kudu-gerrit/9359/BUILD_TYPE=RELEASE,label=kudu-gerrit-slaves/:
> {noformat}
> I0807 02:53:37.758509 32373 external_mini_cluster.cc:508] Running
> /data1/jenkins-workspace/kudu-gerrit/BUILD_TYPE/RELEASE/label/kudu-gerrit-slaves/build/release/kudu-master
> kudu-master
> --master_wal_dir=/data1/test-tmp/external_mini_cluster-test.EMCTest.TestBasicOperation.1438941209613987-32373/minicluster-data/master-0
> --master_data_dirs=/data1/test-tmp/external_mini_cluster-test.EMCTest.TestBasicOperation.1438941209613987-32373/minicluster-data/master-0
> --master_rpc_bind_addresses=127.0.0.1:11010
> --webserver_interface=localhost
> --master_web_port=40946
> --metrics_log_interval_ms=1000
> --log_dir=/data1/test-tmp/external_mini_cluster-test.EMCTest.TestBasicOperation.1438941209613987-32373/minicluster-data/master-0
> --master_addresses=127.0.0.1:11010,127.0.0.1:11011,127.0.0.1:11012
> --enable_leader_failure_detection=true
> --server_dump_info_path=/data1/test-tmp/external_mini_cluster-test.EMCTest.TestBasicOperation.1438941209613987-32373/minicluster-data/master-0/info.pb
> --server_dump_info_format=pb
> --logtostderr
> --logbuflevel=-1
> I0807 02:53:37.772274 453 mem_tracker.cc:98] MemTracker: hard memory limit
> is 23.515860 GB
> I0807 02:53:37.772506 453 mem_tracker.cc:100] MemTracker: soft memory limit
> is 14.109516 GB
> I0807 02:53:37.774209 453 master_main.cc:27] Initializing master server...
> I0807 02:53:37.776566 453 fs_manager.cc:200] Opened local filesystem:
> /data1/test-tmp/external_mini_cluster-test.EMCTest.TestBasicOperation.1438941209613987-32373/minicluster-data/master-0
> uuid: "5add707ecfe54a4d8cd9bde9768ebe8f"
> format_stamp: "Formatted at 2015-08-07 09:53:29 on
> boost-static-burst-slave-0b55.vpc.cloudera.com"
> I0807 02:53:37.779443 453 hybrid_clock.cc:122] HybridClock initialized.
> Resolution in nanos?: 1 Wait times tolerance adjustment: 1.0005 Current
> error: 478950
> I0807 02:53:37.779597 453 master_main.cc:30] Starting Master server...
> I0807 02:53:37.782593 453 rpc_server.cc:125] RPC server started. Bound to:
> 127.0.0.1:11010
> I0807 02:53:37.782708 453 webserver.cc:121] Starting webserver on
> localhost:40946
> I0807 02:53:37.782762 453 webserver.cc:130] Document root disabled
> I0807 02:53:37.783236 453 webserver.cc:213] Webserver started. Bound to:
> http://127.0.0.1:40946/
> F0807 02:53:37.799149 472 catalog_manager.cc:1513] Check failed:
> sys_catalog_.get() != NULL sys_catalog_ must be initialized!
> *** Check failure stack trace: ***
> @ 0x72776d google::LogMessage::Fail() at
> /usr/lib/gcc/x86_64-redhat-linux/4.4.7/../../../../include/c++/4.4.7/bits/basic_string.h:234
> @ 0x72bc4d google::LogMessage::SendToLog() at
> /usr/include/boost/random/mersenne_twister.hpp:251
> @ 0x729abb google::LogMessage::Flush() at
> /data1/jenkins-workspace/kudu-gerrit/BUILD_TYPE/RELEASE/label/kudu-gerrit-slaves/thirdparty/installed/include/boost/uuid/seed_rng.hpp:81
> @ 0x729de1 google::LogMessageFatal::~LogMessageFatal() at
> /usr/lib/gcc/x86_64-redhat-linux/4.4.7/../../../../include/c++/4.4.7/bits/basic_ios.h:452
> @ 0x702bdb kudu::master::CatalogManager::GetTabletPeer() at
> /data1/jenkins-workspace/kudu-gerrit/BUILD_TYPE/RELEASE/label/kudu-gerrit-slaves/src/kudu/gutil/ref_counted.h:277
> @ 0x76d784
> kudu::tserver::ConsensusServiceImpl::UpdateConsensus() at
> /data1/jenkins-workspace/kudu-gerrit/BUILD_TYPE/RELEASE/label/kudu-gerrit-slaves/thirdparty/gperftools-2.2.1/src/heap-profiler.cc:322
> @ 0x1164edf kudu::consensus::ConsensusServiceIf::Handle() at
> ??:0
> @ 0x11ce978 kudu::rpc::ServicePool::RunThread() at ??:0
> @ 0x128b90f kudu::Thread::SuperviseThread() at ??:0
> @ 0x7f675d918851 start_thread at ??:0
> @ 0x7f675cb8a94d clone at ??:0
> @ (nil) (unknown)
> W0807 02:53:38.121330 32475 consensus_peers.cc:247] T
> 00000000000000000000000000000000 P 82f86f33a42a4b14a8b7f1ce307e0976 -> Peer
> 5add707ecfe54a4d8cd9bde9768ebe8f (127.0.0.1:11010): Couldn't send request to
> peer 5add707ecfe54a4d8cd9bde9768ebe8f for tablet
> 00000000000000000000000000000000 Status: Network error: Recv() got EOF from
> remote (error 108). Retrying in the next heartbeat period. Already tried 1
> times.
> /data1/jenkins-workspace/kudu-gerrit/BUILD_TYPE/RELEASE/label/kudu-gerrit-slaves/src/kudu/integration-tests/external_mini_cluster-test.cc:92:
> Failure
> Failed
> Bad status: Runtime error: Process exited with rc=134:
> /data1/jenkins-workspace/kudu-gerrit/BUILD_TYPE/RELEASE/label/kudu-gerrit-slaves/build/release/kudu-master
> I0807 02:53:38.127594 32373 external_mini_cluster.cc:597] Killing
> /data1/jenkins-workspace/kudu-gerrit/BUILD_TYPE/RELEASE/label/kudu-gerrit-slaves/build/release/kudu-master
> with pid 32422
> I0807 02:53:38.130091 32373 external_mini_cluster.cc:597] Killing
> /data1/jenkins-workspace/kudu-gerrit/BUILD_TYPE/RELEASE/label/kudu-gerrit-slaves/build/release/kudu-master
> with pid 32467
> I0807 02:53:38.135949 32373 external_mini_cluster.cc:597] Killing
> /data1/jenkins-workspace/kudu-gerrit/BUILD_TYPE/RELEASE/label/kudu-gerrit-slaves/build/release/kudu-tablet_server
> with pid 32512
> I0807 02:53:38.139133 32373 external_mini_cluster.cc:597] Killing
> /data1/jenkins-workspace/kudu-gerrit/BUILD_TYPE/RELEASE/label/kudu-gerrit-slaves/build/release/kudu-tablet_server
> with pid 32660
> I0807 02:53:38.141916 32373 external_mini_cluster.cc:597] Killing
> /data1/jenkins-workspace/kudu-gerrit/BUILD_TYPE/RELEASE/label/kudu-gerrit-slaves/build/release/kudu-tablet_server
> with pid 321
> I0807 02:53:38.145797 32373 test_util.cc:56]
> -----------------------------------------------
> I0807 02:53:38.145820 32373 test_util.cc:57] Had fatal failures, leaving test
> files at
> /data1/test-tmp/external_mini_cluster-test.EMCTest.TestBasicOperation.1438941209613987-32373
> [ FAILED ] EMCTest.TestBasicOperation (8531 ms)
> {noformat}
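
For readers unfamiliar with the "bind-then-listen" startup order mentioned in the description, here is a minimal sketch using plain POSIX sockets. It is not Kudu's code and makes no claim about what the commits above changed: StartupSketch, Catalog, InitCatalog and TryConnect are hypothetical names invented for illustration. The idea shown is only the ordering: reserve the address first, build the state that request handlers depend on, and start accepting connections last.

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

#include <cassert>
#include <cerrno>
#include <cstdint>
#include <memory>

// Hypothetical stand-in for state that request handlers depend on
// (the role sys_catalog_ plays in the log above).
struct Catalog {};

class StartupSketch {
 public:
  ~StartupSketch() {
    if (fd_ >= 0) close(fd_);
  }

  // Step 1: reserve the address. The socket is bound but not listening,
  // so no peer can deliver a request yet.
  bool Bind() {
    fd_ = socket(AF_INET, SOCK_STREAM, 0);
    if (fd_ < 0) return false;
    sockaddr_in addr{};
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;  // assumption for the sketch: let the kernel pick a port
    if (bind(fd_, reinterpret_cast<sockaddr*>(&addr), sizeof(addr)) != 0) {
      return false;
    }
    socklen_t len = sizeof(addr);
    if (getsockname(fd_, reinterpret_cast<sockaddr*>(&addr), &len) != 0) {
      return false;
    }
    port_ = ntohs(addr.sin_port);
    return true;
  }

  // Step 2: build everything a handler will dereference.
  void InitCatalog() { catalog_ = std::make_unique<Catalog>(); }

  // Step 3: start accepting connections, and refuse to do so unless
  // steps 1 and 2 have already completed.
  bool Listen() {
    if (fd_ < 0 || !catalog_) return false;
    return listen(fd_, 16) == 0;
  }

  // A handler may assume the catalog exists, because Listen() enforces it.
  void HandleRequest() const { assert(catalog_ != nullptr); }

  uint16_t port() const { return port_; }

 private:
  int fd_ = -1;
  uint16_t port_ = 0;
  std::unique_ptr<Catalog> catalog_;
};

// Attempts a loopback TCP connection to the given port.
// Returns 0 on success, or the errno value from the failing call.
int TryConnect(uint16_t port) {
  int fd = socket(AF_INET, SOCK_STREAM, 0);
  if (fd < 0) return errno;
  sockaddr_in addr{};
  addr.sin_family = AF_INET;
  addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
  addr.sin_port = htons(port);
  int rc = 0;
  if (connect(fd, reinterpret_cast<sockaddr*>(&addr), sizeof(addr)) != 0) {
    rc = errno;
  }
  close(fd);
  return rc;
}
```

A caller would run Bind(), then InitCatalog(), then Listen(). Between the first and last step a TryConnect() to the reserved port is expected to fail, because the socket is not yet listening, so a peer's request cannot reach a handler before its dependencies exist.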