[ https://issues.apache.org/jira/browse/KUDU-2033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alexey Serbin updated KUDU-2033:
--------------------------------
    Description: 
For the Kudu Java client we have the {{TestLeaderFailover}} test, which verifies
how the client handles the tablet server fail-over scenario.  However, the test
covers only one fail-over event and mainly performs write operations while the
backend handles the 'unexpected crash' of the tablet server.

It would be nice to add more tests that cover the client's fail-over behavior:
  * Add a mixed workload scenario, i.e. combine inserts and scans during the
fail-over.  Running the scans would not only verify that the data eventually
reaches the destination, but also verify that the client automatically retries
the scan operations and eventually succeeds in reading the data from the
cluster.
  * Induce more fail-over events while running the scenario, i.e. pause and
then resume the tserver processes many more times and run the test longer.
This is to spot possible bugs during fail-over transitions and when multiple
fail-over events occur.
  * In the mixed workload scenarios, run scan operations in READ_AT_SNAPSHOT
mode with different replica selectors: LEADER_ONLY and CLOSEST_REPLICA.  That's
to cover the retry code paths for both cases (as of now, I can see only the
LEADER_ONLY path covered, but I might be mistaken).
  * Extra: add a multi-master scenario, where both the leader tserver and the
leader master 'unexpectedly crash' during the run.  The idea is to verify that
the client automatically updates its metacache even if the leader master
changes, and eventually manages to send the data to the destination server.
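To make the first two items more concrete, the overall shape of such a 'torture' loop can be sketched in plain Java. This is only a minimal, self-contained simulation under stated assumptions, not a Kudu test: {{SimulatedTable}}, {{LeaderUnavailableException}} and {{withRetries}} are hypothetical stand-ins for a real table, for the error the client sees while the leader tserver is paused, and for the client's built-in retry logic. It shows many injected fail-over events, inserts interleaved with scans, retries, and a final check that no acknowledged row went missing.

```java
import java.util.HashSet;
import java.util.Random;
import java.util.Set;
import java.util.function.Supplier;

/**
 * Self-contained simulation of a fail-over "torture" loop. SimulatedTable and
 * LeaderUnavailableException are hypothetical stand-ins, NOT Kudu client APIs:
 * they only model "the leader tserver is paused, so the next few operations
 * fail" to show the shape of the test.
 */
public class FailoverTortureSketch {

  /** Hypothetical error seen by the client while a fail-over is in progress. */
  static class LeaderUnavailableException extends RuntimeException {}

  static class SimulatedTable {
    private final Set<Integer> rows = new HashSet<>();
    private int failingOps = 0; // operations left to fail before a new leader is up

    /** Models pausing the leader tserver: the next 'ops' operations fail. */
    void injectFailover(int ops) { failingOps = ops; }

    private void checkLeader() {
      if (failingOps > 0) {
        failingOps--;
        throw new LeaderUnavailableException();
      }
    }

    void insert(int key) { checkLeader(); rows.add(key); }

    Set<Integer> scan() { checkLeader(); return new HashSet<>(rows); }
  }

  /** Stand-in for the client's automatic retries; rethrows once attempts run out. */
  static <T> T withRetries(Supplier<T> op, int maxAttempts) {
    if (maxAttempts < 1) {
      throw new IllegalArgumentException("maxAttempts must be >= 1");
    }
    LeaderUnavailableException last = null;
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        return op.get();
      } catch (LeaderUnavailableException e) {
        last = e; // a real client would also back off before retrying
      }
    }
    throw last; // out of retries: report the failure instead of dropping data
  }

  /** Mixed insert/scan workload with many injected fail-overs; returns the final row count. */
  static int runTorture(long seed, int numRows, int maxAttempts) {
    Random rnd = new Random(seed);
    SimulatedTable table = new SimulatedTable();
    for (int key = 0; key < numRows; key++) {
      if (rnd.nextInt(10) == 0) {
        table.injectFailover(1 + rnd.nextInt(3)); // roughly every 10th write
      }
      final int k = key;
      withRetries(() -> { table.insert(k); return null; }, maxAttempts);
      if (key % 10 == 0) {
        // Every scan must see all rows acknowledged so far.
        Set<Integer> seen = withRetries(table::scan, maxAttempts);
        if (seen.size() != key + 1) {
          throw new IllegalStateException(
              "scan saw " + seen.size() + " rows, expected " + (key + 1));
        }
      }
    }
    return withRetries(table::scan, maxAttempts).size();
  }

  public static void main(String[] args) {
    System.out.println("rows after torture run: " + runTorture(42L, 1000, 5));
  }
}
```

Keeping the sketch single-threaded and seeded makes it deterministic; a real torture test would presumably run writers, scanners and the fail-over injector concurrently against a mini cluster for much longer. The READ_AT_SNAPSHOT replica-selection and multi-master aspects are not modeled here.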

The general idea is to make sure that, during fail-over events, the Java client:
* Automatically retries write and read operations that fail due to a fail-over
event.
* Does not silently lose any data: if the client cannot send the data due to a
timeout or after running out of retry attempts, it reports the failure.
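The 'no silent data loss' requirement can be phrased as a bookkeeping invariant that such a test might assert at the end of a run. The sketch below is a hypothetical helper ({{NoSilentLossCheck}} is not part of Kudu): every attempted write must end up either acknowledged or reported as failed, every acknowledged row must appear in the final scan, and no unknown rows may appear. Rows reported as failed are deliberately not checked against the scan, on the assumption that a write which timed out on the client side may still have been applied on the server.

```java
import java.util.HashSet;
import java.util.Set;

/**
 * Hypothetical bookkeeping helper (not a Kudu API) for the "no silent data
 * loss" check: every attempted write must end up either acknowledged or
 * reported as failed, and every acknowledged row must be visible to a scan.
 */
public class NoSilentLossCheck {
  private final Set<Integer> attempted = new HashSet<>();
  private final Set<Integer> acked = new HashSet<>();
  private final Set<Integer> reportedFailed = new HashSet<>();

  public void recordAttempt(int key) { attempted.add(key); }
  public void recordAck(int key) { acked.add(key); }
  public void recordFailure(int key) { reportedFailed.add(key); }

  /**
   * Returns null if all invariants hold, otherwise a description of a
   * violation. Keys reported as failed are not checked against the scan:
   * a write that timed out on the client side may still have been applied.
   */
  public String verify(Set<Integer> scannedKeys) {
    for (int key : attempted) {
      boolean isAcked = acked.contains(key);
      boolean isFailed = reportedFailed.contains(key);
      if (!isAcked && !isFailed) {
        return "no outcome reported for key " + key; // silently dropped
      }
      if (isAcked && isFailed) {
        return "key " + key + " was both acknowledged and reported as failed";
      }
      if (isAcked && !scannedKeys.contains(key)) {
        return "acknowledged key " + key + " is missing from the scan";
      }
    }
    for (int key : scannedKeys) {
      if (!attempted.contains(key)) {
        return "unexpected key " + key + " in the scan";
      }
    }
    return null;
  }

  public static void main(String[] args) {
    NoSilentLossCheck check = new NoSilentLossCheck();
    for (int key = 0; key < 3; key++) {
      check.recordAttempt(key);
    }
    check.recordAck(0);
    check.recordAck(1);
    check.recordFailure(2); // e.g. timed out after exhausting retries
    Set<Integer> scanned = new HashSet<>();
    scanned.add(0);
    scanned.add(1);
    String violation = check.verify(scanned);
    System.out.println(violation == null ? "OK" : violation);
  }
}
```

In a real test, the write outcomes (acknowledgements and per-row errors reported by the client) would feed {{recordAck}} and {{recordFailure}}, and {{verify}} would be given the keys returned by a fresh scan after the fail-over events have settled.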
   

  was:
For the Kudu Java client we have the {{TestLeaderFailover}} test, which verifies
how the client handles the tablet server fail-over scenario.  However, the test
covers only one fail-over event and mainly performs write operations while the
backend handles the 'unexpected crash' of the tablet server.

It would be nice to add more tests that cover the client's fail-over behavior:
  * add a mixed workload scenario, i.e. combine inserts and scans during the
fail-over
  * induce more fail-over events while running the scenario, i.e. pause and
then resume the tserver processes many more times and run the test longer
  * add a multi-master scenario, where both the leader tserver and the leader
master 'unexpectedly crash' during the run
  * in the mixed workload scenarios, run scan operations in READ_AT_SNAPSHOT
mode to exercise RYW (Read-Your-Writes) behavior and add assertions to make
sure the RYW behavior is observed as expected
   


> Add a 'torture' scenario to verify Java client's behavior during fail-over 
> ---------------------------------------------------------------------------
>
>                 Key: KUDU-2033
>                 URL: https://issues.apache.org/jira/browse/KUDU-2033
>             Project: Kudu
>          Issue Type: Test
>          Components: client, java
>            Reporter: Alexey Serbin
>            Assignee: Edward Fancher
>              Labels: newbie, newbie++
>



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
