[ 
https://issues.apache.org/jira/browse/KUDU-1527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adar Dembo resolved KUDU-1527.
------------------------------
       Resolution: Duplicate
    Fix Version/s: 1.0.0

bq. This seems to be a duplicate of KUDU-1304

Whoops. Right you are.

> Python unit tests should wait for tservers to up and running
> ------------------------------------------------------------
>
>                 Key: KUDU-1527
>                 URL: https://issues.apache.org/jira/browse/KUDU-1527
>             Project: Kudu
>          Issue Type: Bug
>          Components: python
>    Affects Versions: 0.9.1
>            Reporter: Adar Dembo
>             Fix For: 1.0.0
>
>
> Here is the output from a failed run of the Python unit tests:
> {noformat}
> 02:53:15 _____________ ERROR at setup of TestClient.test_capture_kudu_error 
> _____________
> 02:53:15 
> 02:53:15 cls = <class 'kudu.tests.test_client.TestClient'>
> 02:53:15 
> 02:53:15     @classmethod
> 02:53:15     def setUpClass(cls):
> 02:53:15         cls.cluster_path, master_port = cls.start_cluster()
> 02:53:15         time.sleep(1)
> 02:53:15     
> 02:53:15         cls.master_host = '127.0.0.1'
> 02:53:15         cls.master_port = master_port
> 02:53:15     
> 02:53:15         cls.client = kudu.connect(cls.master_host, cls.master_port)
> 02:53:15     
> 02:53:15         cls.schema = cls.example_schema()
> 02:53:15         cls.partitioning = cls.example_partitioning()
> 02:53:15     
> 02:53:15         cls.ex_table = 'example-table'
> 02:53:15         if cls.client.table_exists(cls.ex_table):
> 02:53:15             cls.client.delete_table(cls.ex_table)
> 02:53:15 >       cls.client.create_table(cls.ex_table, cls.schema, 
> cls.partitioning)
> 02:53:15 
> 02:53:15 kudu/tests/common.py:140: 
> 02:53:15 _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
> _ _ _ _ _ _ 
> 02:53:15 kudu/client.pyx:249: in kudu.client.Client.create_table 
> (kudu/client.cpp:4819)
> 02:53:15     check_status(s)
> 02:53:15 _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
> _ _ _ _ _ _ 
> 02:53:15 
> 02:53:15 >   raise KuduInvalidArgument(c_message)
> 02:53:15 E   KuduInvalidArgument: Error creating table example-table on the 
> master: Not enough live tablet servers to create a table with the requested 
> replication factor 1. 0 tablet servers are alive.
> 02:53:15 
> 02:53:15 kudu/errors.pyx:60: KuduInvalidArgument
> 02:53:15 ---------------------------- Captured stderr setup 
> -----------------------------
> 02:53:15 I0712 02:53:13.192479  7420 mem_tracker.cc:138] MemTracker: hard 
> memory limit is 94.627029 GB
> 02:53:15 I0712 02:53:13.192636  7420 mem_tracker.cc:140] MemTracker: soft 
> memory limit is 56.776218 GB
> 02:53:15 I0712 02:53:13.202077  7420 master_main.cc:58] Initializing master 
> server...
> 02:53:15 I0712 02:53:13.202827  7420 hybrid_clock.cc:178] HybridClock 
> initialized. Resolution in nanos?: 1 Wait times tolerance adjustment: 1.0005 
> Current error: 969344
> 02:53:15 I0712 02:53:13.305677  7420 server_base.cc:164] Could not load 
> existing FS layout: Not found: 
> /home/jenkins-slave/workspace/kudu-3/build/test-tmp/tmp5JN8R3/master/data/instance:
>  No such file or directory (error 2)
> 02:53:15 I0712 02:53:13.305732  7420 server_base.cc:165] Creating new FS 
> layout
> 02:53:15 I0712 02:53:13.359714  7420 fs_manager.cc:345] Generated new 
> instance metadata in path 
> /home/jenkins-slave/workspace/kudu-3/build/test-tmp/tmp5JN8R3/master/data/instance:
> 02:53:15 uuid: "3ace416aa5ec45d692d21875e8c23322"
> 02:53:15 format_stamp: "Formatted at 2016-07-12 02:53:13 on 
> jenkins-slave-c3oe"
> 02:53:15 I0712 02:53:13.414095  7420 fs_manager.cc:242] Opened local 
> filesystem: 
> /home/jenkins-slave/workspace/kudu-3/build/test-tmp/tmp5JN8R3/master/data
> 02:53:15 uuid: "3ace416aa5ec45d692d21875e8c23322"
> 02:53:15 format_stamp: "Formatted at 2016-07-12 02:53:13 on 
> jenkins-slave-c3oe"
> 02:53:15 I0712 02:53:13.499505  7420 master_main.cc:61] Starting Master 
> server...
> 02:53:15 I0712 02:53:13.726842  7420 rpc_server.cc:164] RPC server started. 
> Bound to: 0.0.0.0:52533
> 02:53:15 I0712 02:53:13.727243  7420 webserver.cc:122] Starting webserver on 
> 0.0.0.0:0
> 02:53:15 I0712 02:53:13.727269  7420 webserver.cc:127] Document root: 
> /home/jenkins-slave/workspace/kudu-3/www
> 02:53:15 I0712 02:53:13.727638  7420 webserver.cc:217] Webserver started. 
> Bound to: http://0.0.0.0:47227/
> 02:53:15 I0712 02:53:13.728736  7420 server_base.cc:234] Dumped server 
> information to 
> /home/jenkins-slave/workspace/kudu-3/build/test-tmp/tmp5JN8R3/master/config.json
> 02:53:15 I0712 02:53:13.769598  7518 tablet_bootstrap.cc:393] T 
> 00000000000000000000000000000000 P 3ace416aa5ec45d692d21875e8c23322: 
> Bootstrap starting.
> 02:53:15 I0712 02:53:13.771863  7518 tablet_bootstrap.cc:553] T 
> 00000000000000000000000000000000 P 3ace416aa5ec45d692d21875e8c23322: Time 
> spent opening tablet: real 0.001s   user 0.000s     sys 0.000s
> 02:53:15 I0712 02:53:13.772128  7518 tablet_bootstrap.cc:493] T 
> 00000000000000000000000000000000 P 3ace416aa5ec45d692d21875e8c23322: No 
> blocks or log segments found. Creating new log.
> 02:53:15 I0712 02:53:13.772845  7518 log.cc:349] Log is configured to *not* 
> fsync() on all Append() calls
> 02:53:15 I0712 02:53:13.774338  7518 tablet_bootstrap.cc:393] T 
> 00000000000000000000000000000000 P 3ace416aa5ec45d692d21875e8c23322: No 
> bootstrap required, opened a new log
> 02:53:15 I0712 02:53:13.776185  7518 raft_consensus.cc:269] T 
> 00000000000000000000000000000000 P 3ace416aa5ec45d692d21875e8c23322 [term 0 
> FOLLOWER]: Replica starting. Triggering 0 pending transactions. Active 
> config: opid_index: -1 peers { permanent_uuid: 
> "3ace416aa5ec45d692d21875e8c23322" member_type: VOTER }
> 02:53:15 I0712 02:53:13.776408  7518 raft_consensus.cc:301] T 
> 00000000000000000000000000000000 P 3ace416aa5ec45d692d21875e8c23322 [term 0 
> FOLLOWER]: Consensus starting up: Expiring failure detector timer to make a 
> prompt election more likely
> 02:53:15 I0712 02:53:13.776487  7518 raft_consensus.cc:499] T 
> 00000000000000000000000000000000 P 3ace416aa5ec45d692d21875e8c23322 [term 0 
> FOLLOWER]: Becoming Follower/Learner. State: Replica: 
> 3ace416aa5ec45d692d21875e8c23322, State: 1, Role: FOLLOWER
> 02:53:15 Watermarks: {Received: term: 0 index: 0 Committed: term: 0 index: 0}
> 02:53:15 I0712 02:53:13.776564  7518 consensus_queue.cc:162] T 
> 00000000000000000000000000000000 P 3ace416aa5ec45d692d21875e8c23322 
> [NON_LEADER]: Queue going to NON_LEADER mode. State: All replicated op: 0.0, 
> Majority replicated op: 0.0, Committed index: 0.0, Last appended: 0.0, 
> Current term: 0, Majority size: -1, State: 1, Mode: NON_LEADER
> 02:53:15 I0712 02:53:13.776618  7518 raft_consensus.cc:314] T 
> 00000000000000000000000000000000 P 3ace416aa5ec45d692d21875e8c23322 [term 0 
> FOLLOWER]: Only one voter in the Raft config. Triggering election immediately
> 02:53:15 I0712 02:53:13.776641  7518 raft_consensus.cc:370] T 
> 00000000000000000000000000000000 P 3ace416aa5ec45d692d21875e8c23322 [term 0 
> FOLLOWER]: No leader contacted us within the election timeout. Triggering 
> leader election
> 02:53:15 I0712 02:53:13.776671  7518 raft_consensus.cc:2019] T 
> 00000000000000000000000000000000 P 3ace416aa5ec45d692d21875e8c23322 [term 0 
> FOLLOWER]: Advancing to term 1
> 02:53:15 I0712 02:53:13.777194  7518 raft_consensus.cc:1970] T 
> 00000000000000000000000000000000 P 3ace416aa5ec45d692d21875e8c23322 [term 1 
> FOLLOWER]: Snoozing failure detection for election timeout plus an additional 
> 1.636s
> 02:53:15 I0712 02:53:13.777375  7518 raft_consensus.cc:386] T 
> 00000000000000000000000000000000 P 3ace416aa5ec45d692d21875e8c23322 [term 1 
> FOLLOWER]: Starting election with config: opid_index: -1 peers { 
> permanent_uuid: "3ace416aa5ec45d692d21875e8c23322" member_type: VOTER }
> 02:53:15 I0712 02:53:13.778064  7518 leader_election.cc:248] T 
> 00000000000000000000000000000000 P 3ace416aa5ec45d692d21875e8c23322 
> [CANDIDATE]: Term 1 election: Election decided. Result: candidate won.
> 02:53:15 I0712 02:53:13.778406  7527 raft_consensus.cc:1970] T 
> 00000000000000000000000000000000 P 3ace416aa5ec45d692d21875e8c23322 [term 1 
> FOLLOWER]: Snoozing failure detection for election timeout plus an additional 
> 1.630s
> 02:53:15 I0712 02:53:13.778456  7527 raft_consensus.cc:1859] T 
> 00000000000000000000000000000000 P 3ace416aa5ec45d692d21875e8c23322 [term 1 
> FOLLOWER]: Leader election won for term 1
> 02:53:15 I0712 02:53:13.790758  7527 raft_consensus.cc:464] T 
> 00000000000000000000000000000000 P 3ace416aa5ec45d692d21875e8c23322 [term 1 
> LEADER]: Becoming Leader. State: Replica: 3ace416aa5ec45d692d21875e8c23322, 
> State: 1, Role: LEADER
> 02:53:15 Watermarks: {Received: term: 0 index: 0 Committed: term: 0 index: 0}
> 02:53:15 I0712 02:53:13.791013  7527 consensus_queue.cc:145] T 
> 00000000000000000000000000000000 P 3ace416aa5ec45d692d21875e8c23322 [LEADER]: 
> Queue going to LEADER mode. State: All replicated op: 0.0, Majority 
> replicated op: 0.0, Committed index: 0.0, Last appended: 0.0, Current term: 
> 1, Majority size: 1, State: 1, Mode: LEADER, active raft config: opid_index: 
> -1 peers { permanent_uuid: "3ace416aa5ec45d692d21875e8c23322" member_type: 
> VOTER }
> 02:53:15 I0712 02:53:13.794667  7527 sys_catalog.cc:235] T 
> 00000000000000000000000000000000 P 3ace416aa5ec45d692d21875e8c23322 
> [sys.catalog]: SysCatalogTable state changed. Reason: New leader 
> 3ace416aa5ec45d692d21875e8c23322. Latest consensus state: current_term: 1 
> leader_uuid: "3ace416aa5ec45d692d21875e8c23322" config { opid_index: -1 peers 
> { permanent_uuid: "3ace416aa5ec45d692d21875e8c23322" member_type: VOTER } }
> 02:53:15 I0712 02:53:13.794703  7527 sys_catalog.cc:238] T 
> 00000000000000000000000000000000 P 3ace416aa5ec45d692d21875e8c23322 
> [sys.catalog]: This master's current role is: LEADER
> 02:53:15 I0712 02:53:13.794970  7518 sys_catalog.cc:235] T 
> 00000000000000000000000000000000 P 3ace416aa5ec45d692d21875e8c23322 
> [sys.catalog]: SysCatalogTable state changed. Reason: Started TabletPeer. 
> Latest consensus state: current_term: 1 leader_uuid: 
> "3ace416aa5ec45d692d21875e8c23322" config { opid_index: -1 peers { 
> permanent_uuid: "3ace416aa5ec45d692d21875e8c23322" member_type: VOTER } }
> 02:53:15 I0712 02:53:13.795006  7518 sys_catalog.cc:238] T 
> 00000000000000000000000000000000 P 3ace416aa5ec45d692d21875e8c23322 
> [sys.catalog]: This master's current role is: LEADER
> 02:53:15 I0712 02:53:13.795017  7529 sys_catalog.cc:235] T 
> 00000000000000000000000000000000 P 3ace416aa5ec45d692d21875e8c23322 
> [sys.catalog]: SysCatalogTable state changed. Reason: RaftConsensus started. 
> Latest consensus state: current_term: 1 leader_uuid: 
> "3ace416aa5ec45d692d21875e8c23322" config { opid_index: -1 peers { 
> permanent_uuid: "3ace416aa5ec45d692d21875e8c23322" member_type: VOTER } }
> 02:53:15 I0712 02:53:13.795081  7529 sys_catalog.cc:238] T 
> 00000000000000000000000000000000 P 3ace416aa5ec45d692d21875e8c23322 
> [sys.catalog]: This master's current role is: LEADER
> 02:53:15 I0712 02:53:13.803755  7531 raft_consensus_state.cc:534] T 
> 00000000000000000000000000000000 P 3ace416aa5ec45d692d21875e8c23322 [term 1 
> LEADER]: Advanced the committed_index across terms. Last committed operation 
> was: term: 0 index: 0 New committed index is: term: 1 index: 1
> 02:53:15 I0712 02:53:13.804172  7518 sys_catalog.cc:311] T 
> 00000000000000000000000000000000 P 3ace416aa5ec45d692d21875e8c23322 
> [sys.catalog]: configured and running, proceeding with master startup.
> 02:53:15 W0712 02:53:13.804466  7534 catalog_manager.cc:345] Catalog manager 
> background task thread going to sleep: Service unavailable: Catalog manager 
> is not initialized. State: 1
> 02:53:15 I0712 02:53:13.804584  7532 catalog_manager.cc:636] Loading table 
> and tablet metadata into memory...
> 02:53:15 I0712 02:53:13.804772  7420 master_main.cc:64] Master server 
> successfully started.
> 02:53:15 I0712 02:53:14.245645  7564 mem_tracker.cc:138] MemTracker: hard 
> memory limit is 94.627029 GB
> 02:53:15 I0712 02:53:14.245791  7564 mem_tracker.cc:140] MemTracker: soft 
> memory limit is 56.776218 GB
> 02:53:15 I0712 02:53:14.257587  7562 mem_tracker.cc:138] MemTracker: hard 
> memory limit is 94.627029 GB
> 02:53:15 I0712 02:53:14.257714  7562 mem_tracker.cc:140] MemTracker: soft 
> memory limit is 56.776218 GB
> 02:53:15 I0712 02:53:14.266713  7566 mem_tracker.cc:138] MemTracker: hard 
> memory limit is 94.627029 GB
> 02:53:15 I0712 02:53:14.266863  7566 mem_tracker.cc:140] MemTracker: soft 
> memory limit is 56.776218 GB
> 02:53:15 I0712 02:53:14.302768  7566 tablet_server_main.cc:54] Initializing 
> tablet server...
> 02:53:15 I0712 02:53:14.303900  7566 hybrid_clock.cc:178] HybridClock 
> initialized. Resolution in nanos?: 1 Wait times tolerance adjustment: 1.0005 
> Current error: 969844
> 02:53:15 I0712 02:53:14.307047  7566 server_base.cc:164] Could not load 
> existing FS layout: Not found: 
> /home/jenkins-slave/workspace/kudu-3/build/test-tmp/tmp5JN8R3/ts/2/data/instance:
>  No such file or directory (error 2)
> 02:53:15 I0712 02:53:14.307086  7566 server_base.cc:165] Creating new FS 
> layout
> 02:53:15 I0712 02:53:14.310693  7562 tablet_server_main.cc:54] Initializing 
> tablet server...
> 02:53:15 I0712 02:53:14.311682  7562 hybrid_clock.cc:178] HybridClock 
> initialized. Resolution in nanos?: 1 Wait times tolerance adjustment: 1.0005 
> Current error: 969844
> 02:53:15 I0712 02:53:14.314699  7562 server_base.cc:164] Could not load 
> existing FS layout: Not found: 
> /home/jenkins-slave/workspace/kudu-3/build/test-tmp/tmp5JN8R3/ts/0/data/instance:
>  No such file or directory (error 2)
> 02:53:15 I0712 02:53:14.314730  7562 server_base.cc:165] Creating new FS 
> layout
> 02:53:15 I0712 02:53:14.318703  7564 tablet_server_main.cc:54] Initializing 
> tablet server...
> 02:53:15 I0712 02:53:14.319725  7564 hybrid_clock.cc:178] HybridClock 
> initialized. Resolution in nanos?: 1 Wait times tolerance adjustment: 1.0005 
> Current error: 969844
> 02:53:15 I0712 02:53:14.322813  7564 server_base.cc:164] Could not load 
> existing FS layout: Not found: 
> /home/jenkins-slave/workspace/kudu-3/build/test-tmp/tmp5JN8R3/ts/1/data/instance:
>  No such file or directory (error 2)
> 02:53:15 I0712 02:53:14.322846  7564 server_base.cc:165] Creating new FS 
> layout
> 02:53:15 I0712 02:53:14.357198  7564 fs_manager.cc:345] Generated new 
> instance metadata in path 
> /home/jenkins-slave/workspace/kudu-3/build/test-tmp/tmp5JN8R3/ts/1/data/instance:
> 02:53:15 uuid: "76029326f81443679b55fcc12dcb8431"
> 02:53:15 format_stamp: "Formatted at 2016-07-12 02:53:14 on 
> jenkins-slave-c3oe"
> 02:53:15 I0712 02:53:14.357671  7566 fs_manager.cc:345] Generated new 
> instance metadata in path 
> /home/jenkins-slave/workspace/kudu-3/build/test-tmp/tmp5JN8R3/ts/2/data/instance:
> 02:53:15 uuid: "82fdf4247a094384bfcbd0793f59941d"
> 02:53:15 format_stamp: "Formatted at 2016-07-12 02:53:14 on 
> jenkins-slave-c3oe"
> 02:53:15 I0712 02:53:14.378212  7562 fs_manager.cc:345] Generated new 
> instance metadata in path 
> /home/jenkins-slave/workspace/kudu-3/build/test-tmp/tmp5JN8R3/ts/0/data/instance:
> 02:53:15 uuid: "aae8d534a60648bfb24892221b73f659"
> 02:53:15 format_stamp: "Formatted at 2016-07-12 02:53:14 on 
> jenkins-slave-c3oe"
> 02:53:15 I0712 02:53:14.499289  7566 fs_manager.cc:242] Opened local 
> filesystem: 
> /home/jenkins-slave/workspace/kudu-3/build/test-tmp/tmp5JN8R3/ts/2/data
> 02:53:15 uuid: "82fdf4247a094384bfcbd0793f59941d"
> 02:53:15 format_stamp: "Formatted at 2016-07-12 02:53:14 on 
> jenkins-slave-c3oe"
> 02:53:15 I0712 02:53:14.531463  7562 fs_manager.cc:242] Opened local 
> filesystem: 
> /home/jenkins-slave/workspace/kudu-3/build/test-tmp/tmp5JN8R3/ts/0/data
> 02:53:15 uuid: "aae8d534a60648bfb24892221b73f659"
> 02:53:15 format_stamp: "Formatted at 2016-07-12 02:53:14 on 
> jenkins-slave-c3oe"
> 02:53:15 I0712 02:53:14.542300  7564 fs_manager.cc:242] Opened local 
> filesystem: 
> /home/jenkins-slave/workspace/kudu-3/build/test-tmp/tmp5JN8R3/ts/1/data
> 02:53:15 uuid: "76029326f81443679b55fcc12dcb8431"
> 02:53:15 format_stamp: "Formatted at 2016-07-12 02:53:14 on 
> jenkins-slave-c3oe"
> 02:53:15 I0712 02:53:14.576563  7566 tablet_server_main.cc:57] Starting 
> tablet server...
> 02:53:15 I0712 02:53:14.606667  7564 tablet_server_main.cc:57] Starting 
> tablet server...
> 02:53:15 I0712 02:53:14.624748  7562 tablet_server_main.cc:57] Starting 
> tablet server...
> 02:53:15 I0712 02:53:15.243650  7492 catalog_manager.cc:762] CreateTable from 
> {real_user=jenkins-slave, eff_user=} at 127.0.0.1:47450:
> 02:53:15 name: "example-table"
> 02:53:15 schema {
> 02:53:15   columns {
> 02:53:15     name: "key"
> 02:53:15     type: INT32
> 02:53:15     is_key: true
> 02:53:15     is_nullable: false
> 02:53:15     encoding: AUTO_ENCODING
> 02:53:15     compression: DEFAULT_COMPRESSION
> 02:53:15     cfile_block_size: 0
> 02:53:15   }
> 02:53:15   columns {
> 02:53:15     name: "int_val"
> 02:53:15     type: INT32
> 02:53:15     is_key: false
> 02:53:15     is_nullable: true
> 02:53:15     encoding: AUTO_ENCODING
> 02:53:15     compression: DEFAULT_COMPRESSION
> 02:53:15     cfile_block_size: 0
> 02:53:15   }
> 02:53:15   columns {
> 02:53:15     name: "string_val"
> 02:53:15     type: STRING
> 02:53:15     is_key: false
> 02:53:15     is_nullable: true
> 02:53:15     encoding: AUTO_ENCODING
> 02:53:15     compression: DEFAULT_COMPRESSION
> 02:53:15     cfile_block_size: 0
> 02:53:15   }
> 02:53:15 }
> 02:53:15 split_rows_range_bounds {
> 02:53:15 }
> 02:53:15 partition_schema {
> 02:53:15   range_schema {
> 02:53:15     columns {
> 02:53:15       name: "key"
> 02:53:15     }
> 02:53:15   }
> 02:53:15 }
> {noformat}
> Here the CreateTable() call has arrived to the master before the tservers 
> have finished initializing, and thus before they have heartbeated to the 
> master.
> We should modify the start_cluster() call to wait until the tservers are 
> initialized. Possibly this means adding --server_dump_info_path=... to the 
> command line arguments and waiting for the file to show up. Or maybe that's 
> insufficient and we should also query the master for the list of tservers. 
> Either way, we need to wait a little longer in case the tservers are slow to 
> start.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to