[ https://issues.apache.org/jira/browse/KUDU-2100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Grant Henke updated KUDU-2100:
------------------------------
Component/s: test
> Verify Java client's behavior for tserver and master fail-over scenario
> -----------------------------------------------------------------------
>
> Key: KUDU-2100
> URL: https://issues.apache.org/jira/browse/KUDU-2100
> Project: Kudu
> Issue Type: Test
> Components: test
> Reporter: Alexey Serbin
> Assignee: Edward Fancher
> Priority: Major
>
> This is to introduce a scenario where both the leader tserver and the leader
> master 'unexpectedly crash' during the run. The idea is to verify that the
> client automatically updates its metacache even if the leader master changes,
> and that it eventually manages to send the data to the destination server.
> Mike suggested the following test scenario:
> # Have a configuration with 3 master servers, 6 tablet servers, and a table
> consisting of 1 tablet with a replication factor of 3. Let's assume the tablet
> replicas are hosted by tablet servers TS1, TS2, and TS3.
> # Start the Kudu cluster.
> # Run the client to insert at least one row into the table.
> # Stop the client's activity, but keep the client object alive so that it is
> ready for the next steps.
> # Three times: permanently kill the tablet's leader replica, so that the tablet
> eventually migrates to and is hosted by tablet servers TS4, TS5, and TS6.
> # Kill the leader master (after the configuration change is committed).
> # Run the pre-warmed client to insert some data into the table again. In
> doing so, the client should refresh its metadata from the new leader master
> and be able to send the data to the right destination.
> # Count the number of rows in the table to make sure it matches the number of
> rows inserted.
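The client-side behavior the sequence above is meant to exercise can be sketched with a toy, in-memory model. This is a minimal illustration under stated assumptions, not the Kudu Java client API or its test harness: `ToyCluster`, `ToyClient`, `Kudu2100Scenario` and all their methods are hypothetical. The model assumes the tablet leader is always the first replica in a list, that each kill is followed by re-replication onto a spare tserver, and that location lookups are answered by whichever master is currently the leader. The pre-warmed client's cached leader (TS1) is stale after the moves, so its first write fails; it then drops the cache entry, asks the new leader master, and retries.

```java
import java.util.*;

// Toy, in-memory model of the KUDU-2100 scenario. ToyCluster, ToyClient and
// Kudu2100Scenario are hypothetical; this is NOT the Kudu Java client API.
class ToyCluster {
    // Three masters; the first entry is always the current leader.
    final List<String> masters = new ArrayList<>(List.of("M1", "M2", "M3"));
    String leaderMaster = "M1";
    // Tablet servers hosting the tablet's 3 replicas; index 0 is the leader.
    final List<String> replicas = new ArrayList<>(List.of("TS1", "TS2", "TS3"));
    // Spare tablet servers that receive new replicas after a kill.
    final Deque<String> spares = new ArrayDeque<>(List.of("TS4", "TS5", "TS6"));
    final Set<String> dead = new HashSet<>();
    int rows = 0;

    String tabletLeader() { return replicas.get(0); }

    // Permanently kill the tablet leader; the next replica becomes leader and
    // a spare tablet server receives a new replica (re-replication).
    void killTabletLeaderAndReplicate() {
        dead.add(replicas.remove(0));
        replicas.add(spares.pop());
    }

    // Kill the leader master; the next live master takes over leadership.
    void killLeaderMaster() {
        masters.remove(leaderMaster);
        leaderMaster = masters.get(0);
    }

    // Location lookup answered by the current leader master.
    String locateTabletLeader() { return tabletLeader(); }

    // A write succeeds only if it reaches the live tablet leader.
    boolean write(String server) {
        if (dead.contains(server) || !server.equals(tabletLeader())) return false;
        rows++;
        return true;
    }
}

class ToyClient {
    private final ToyCluster cluster;
    private String cachedLeader;   // the client's metacache entry for the tablet
    int refreshes = 0;             // how many times the master was consulted

    ToyClient(ToyCluster cluster) { this.cluster = cluster; }

    // Insert one row. On failure, drop the stale cache entry, ask the leader
    // master again and retry; return false once attempts run out so the caller
    // can report the failure instead of losing the row silently.
    boolean insert(int maxAttempts) {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            if (cachedLeader == null) {
                cachedLeader = cluster.locateTabletLeader();
                refreshes++;
            }
            if (cluster.write(cachedLeader)) return true;
            cachedLeader = null;
        }
        return false;
    }
}

class Kudu2100Scenario {
    // The proposed test sequence, run against the toy model; returns the row count.
    static int run() {
        ToyCluster cluster = new ToyCluster();       // 3 masters, 3 replicas + 3 spares
        ToyClient client = new ToyClient(cluster);   // pre-warmed by the first insert
        if (!client.insert(3)) throw new IllegalStateException("initial insert failed");
        for (int i = 0; i < 3; i++) cluster.killTabletLeaderAndReplicate();
        cluster.killLeaderMaster();                   // after the moves complete
        if (!client.insert(3)) throw new IllegalStateException("insert after fail-over failed");
        return cluster.rows;                          // expected: 2
    }
}
```

In a real test the same shape would be driven against an actual multi-master mini cluster, with the final row count read back by a scan rather than from a counter.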
> There was a discussion on when to kill the leader master: before or after
> moving the table to the new set of tablet servers. It seems the latter case
> (the sequence suggested above) allows covering a situation in which no master
> server recognizes itself as the leader. The client should retry in that case
> as well and eventually receive the tablet location info from the newly
> elected leader master. If possible, let's also implement the sequence for the
> former case as an additional test.
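The "no master recognizes itself as the leader" window described above can be illustrated with a small sketch of a location lookup that retries with capped exponential backoff. Everything here is a hypothetical toy, not the Kudu client's actual retry policy: `LeaderlessMasters` simply reports no leader for a fixed number of calls, and `LocationLookup.lookupWithRetry` records the delays it would wait instead of sleeping, to keep the sketch fast and deterministic.

```java
import java.util.*;

// Hypothetical toy: the masters answer "no leader" for a fixed number of
// calls (an election in progress), then report the tablet leader location.
final class LeaderlessMasters {
    private int callsUntilLeaderElected;
    private final String tabletLeader;

    LeaderlessMasters(int callsUntilLeaderElected, String tabletLeader) {
        this.callsUntilLeaderElected = callsUntilLeaderElected;
        this.tabletLeader = tabletLeader;
    }

    // Empty while no master considers itself the leader.
    Optional<String> locateTablet() {
        if (callsUntilLeaderElected > 0) {
            callsUntilLeaderElected--;
            return Optional.empty();
        }
        return Optional.of(tabletLeader);
    }
}

final class LocationLookup {
    // Retry the lookup with capped exponential backoff. Delays between
    // attempts are recorded in delaysOut rather than slept; a real client
    // would wait that long before the next attempt.
    static Optional<String> lookupWithRetry(LeaderlessMasters masters, int maxAttempts,
                                            long baseDelayMs, long maxDelayMs,
                                            List<Long> delaysOut) {
        long delay = baseDelayMs;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            Optional<String> location = masters.locateTablet();
            if (location.isPresent()) return location;
            if (attempt < maxAttempts) {          // no wait after the final attempt
                delaysOut.add(delay);
                delay = Math.min(delay * 2, maxDelayMs);
            }
        }
        return Optional.empty();                  // caller must surface this as an error
    }
}
```

With 3 leaderless calls, 5 attempts, a 10 ms base and a 40 ms cap, the lookup succeeds on the fourth attempt after waiting 10, 20 and 40 ms; with only 3 attempts it gives up and returns an empty result, which the caller has to report rather than ignore.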
> The general idea is to make sure that, during fail-over events, the Java client:
> * Automatically retries write and read operations on errors caused by a
> fail-over event.
> * Does not silently lose any data: if the client cannot send the data because
> of a timeout or because it ran out of retry attempts, it should report an error.
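The two requirements above can be expressed as an invariant: every submitted row ends up either acknowledged or in an explicit error list, with the reason (timeout or exhausted retries) attached. The sketch below is a hypothetical toy session, not the Kudu `KuduSession` API; the `IntPredicate` standing in for the transport, the per-attempt delay and the simulated clock are all assumptions made for illustration.

```java
import java.util.*;
import java.util.function.IntPredicate;

// Hypothetical toy session: each applied row is either acknowledged or
// recorded as a RowError with a reason; nothing is dropped silently.
final class ToySession {
    enum Failure { RETRIES_EXHAUSTED, TIMED_OUT }

    static final class RowError {
        final int key;
        final Failure reason;
        RowError(int key, Failure reason) { this.key = key; this.reason = reason; }
    }

    private final IntPredicate server;   // returns true if the write was applied
    private final int maxAttempts;
    private final long timeoutMs;
    private final long retryDelayMs;     // simulated time spent per failed attempt
    final List<Integer> acked = new ArrayList<>();
    final List<RowError> errors = new ArrayList<>();

    ToySession(IntPredicate server, int maxAttempts, long timeoutMs, long retryDelayMs) {
        this.server = server;
        this.maxAttempts = maxAttempts;
        this.timeoutMs = timeoutMs;
        this.retryDelayMs = retryDelayMs;
    }

    // Apply one row, retrying on failure. Returns only after the row has been
    // recorded in exactly one of `acked` or `errors`.
    void apply(int key) {
        long elapsedMs = 0;                            // simulated clock, no real sleeping
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (elapsedMs >= timeoutMs) {
                errors.add(new RowError(key, Failure.TIMED_OUT));
                return;
            }
            if (server.test(key)) {
                acked.add(key);
                return;
            }
            elapsedMs += retryDelayMs;
        }
        errors.add(new RowError(key, Failure.RETRIES_EXHAUSTED));
    }
}
```

A test can then assert that the number of acknowledged rows plus the number of reported errors equals the number of rows submitted, and that the acknowledged count matches what a scan of the table returns.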
--
This message was sent by Atlassian Jira
(v8.3.4#803005)