[ 
https://issues.apache.org/jira/browse/HBASE-9108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13746078#comment-13746078
 ] 

Enis Soztutar commented on HBASE-9108:
--------------------------------------

There are two concerns about this approach as I see it. First, naming a set of 
exceptions is very brittle. With new exceptions, or refactored ones, keeping 
that list in sync will become a burden on the test maintainer. Second, a 100% 
write guarantee is what we want from this test. The retry logic inside HBase 
already does what you mention (storing failed keys and retrying).

bq. when you want to run the same set of tests to get 100% write guarantee as 
well, over say a stronger & better hbase version, you just need to remove the 
configuration
I think that we already have a "stronger & better hbase version". We have seen 
CM actions that cause several minutes of downtime on our test setup, which in 
turn causes LoadTestTool to fail. But I think the right way to approach this 
is to configure the test env for better MTTR and limit the chaos caused by CM, 
together with adjusting the retries / timeouts accordingly. 
                
> LoadTestTool need to have a way to ignore keys which were failed during 
> write. 
> -------------------------------------------------------------------------------
>
>                 Key: HBASE-9108
>                 URL: https://issues.apache.org/jira/browse/HBASE-9108
>             Project: HBase
>          Issue Type: Improvement
>          Components: test
>    Affects Versions: 0.95.0, 0.95.1, 0.94.9, 0.94.10
>            Reporter: gautam
>            Assignee: gautam
>            Priority: Critical
>         Attachments: 9108.patch._trunk.5, 9108.patch._trunk.6, 
> HBASE-9108.patch._trunk.2, HBASE-9108.patch._trunk.3, 
> HBASE-9108.patch._trunk.4, HBASE-9108.patch._trunk.7, 
> HBASE-9108.patch._trunk.8
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> While running the chaosmonkey integration tests, we found that writes 
> sometimes fail when cluster components are restarted, stopped, or killed.
> The data key that was being put using the LoadTestTool is added to the 
> failed key set, and at the end of the test this failed key set is checked 
> for any entries to assert failures.
> While doing fail-over testing, it is expected that some of the keys may go 
> unwritten. The point here is to validate that whatever is successfully 
> written to hbase on an unstable cluster really goes in, and hence reads 
> should succeed 100% for the keys that went in successfully.
> Currently LoadTestTool strictly checks whether every key was written. 
> If any key is not written, it fails.
> I wanted to loosen this constraint by allowing users to pass in a set of 
> exceptions they expect when doing put/write operations over hbase. If one of 
> these expected exceptions is thrown while writing a key to hbase, the failed 
> key would be ignored, and hence won't be considered again for subsequent 
> writes or reads.
> This can be passed to the load test tool as a CSV list parameter, 
> -allowed_write_exceptions, or it can be passed through hbase-site.xml by 
> setting a value for "test.ignore.exceptions.during.write".
> Here is the usage:
> -allowed_write_exceptions 
> "java.io.EOFException,org.apache.hadoop.hbase.NotServingRegionException,org.apache.hadoop.hbase.client.NoServerForRegionException,org.apache.hadoop.hbase.ipc.ServerNotRunningYetException"
> Hence, the existing integration tests can also make use of this change by 
> setting it as a property in hbase-site.xml.
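
For illustration only, the matching step described in the issue could be 
sketched roughly as below. This is not the code from the attached patches: 
the class name AllowedWriteExceptions and the method isAllowed are 
hypothetical, and the sketch assumes matching is done on exact class names 
taken from the CSV value. Under that assumption a subclass or a renamed 
exception would not match, which is the kind of brittleness raised in the 
comment above.

```java
import java.io.EOFException;
import java.io.IOException;
import java.util.Arrays;
import java.util.Set;
import java.util.stream.Collectors;

/** Hypothetical helper; not the implementation from the HBASE-9108 patches. */
public class AllowedWriteExceptions {
    // Fully qualified class names parsed from the CSV value.
    private final Set<String> allowedClassNames;

    public AllowedWriteExceptions(String csv) {
        this.allowedClassNames = Arrays.stream(csv.split(","))
            .map(String::trim)
            .filter(name -> !name.isEmpty())
            .collect(Collectors.toSet());
    }

    /**
     * Returns true if the exception, or any exception in its cause chain,
     * has a class name that exactly matches one of the allowed names.
     */
    public boolean isAllowed(Throwable t) {
        for (Throwable cur = t; cur != null; cur = cur.getCause()) {
            if (allowedClassNames.contains(cur.getClass().getName())) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        AllowedWriteExceptions allowed = new AllowedWriteExceptions(
            "java.io.EOFException, org.apache.hadoop.hbase.NotServingRegionException");

        // A directly listed exception is allowed.
        System.out.println(allowed.isAllowed(new EOFException()));
        // A listed exception wrapped as the cause of another is also allowed.
        System.out.println(allowed.isAllowed(new IOException(new EOFException())));
        // An unlisted exception is not allowed, so the key would count as failed.
        System.out.println(allowed.isAllowed(new IllegalStateException()));
    }
}
```

In a load-test writer, a put that fails with an allowed exception would then 
drop the key from both the write and the read sets instead of adding it to 
the failed key set.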

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
