[ 
https://issues.apache.org/jira/browse/HBASE-9108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13746084#comment-13746084
 ] 

gautam commented on HBASE-9108:
-------------------------------

"The retry logic inside HBase already does what you mention (storing failed 
keys and retrying)."
But sometimes the retry logic itself fails with 
org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException. The key 
is then recorded as a failed key and the test fails. To avoid that you have 
to tune your environment, which is sometimes a small cluster and the only one 
available. Or, as you said, you have to fine-tune CM, and that tuning would 
then have to vary from one cluster setup to another to get a better MTTR.
At other times the key fails to be written because of one of:
java.io.EOFException,
org.apache.hadoop.hbase.NotServingRegionException,
org.apache.hadoop.hbase.client.NoServerForRegionException,
org.apache.hadoop.hbase.ipc.ServerNotRunningYetException,
org.apache.hadoop.hbase.ipc.HBaseClient$FailedServerException

A tester might want to skip the retries in these cases, skip the key, and 
proceed. He can configure the exceptions he wants to skip on write by 
passing them in as configuration. Since this won't be set by default in the 
hbase configuration XMLs, it is a known risk he takes on.
And sorry, I didn't mean to say that; I agree we already have a "stronger & 
better hbase version". My point was about future version upgrades: a tester 
might then want to go for 100% read+write, having moved to a bigger and 
better cluster setup with a better MTTR.
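
To make the "skip on configured exceptions" idea concrete, here is a minimal sketch of the matching step. It is not taken from the attached patches; the class name AllowedWriteExceptions and the method isAllowed are hypothetical, and it uses only the JDK. It walks the cause chain, since client exceptions often wrap the underlying failure.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.IdentityHashMap;
import java.util.Set;

// Hypothetical helper, not part of HBase: decides whether a write failure
// matches one of the exception class names the tester chose to tolerate.
public class AllowedWriteExceptions {
    private final Set<String> allowedClassNames;

    public AllowedWriteExceptions(Set<String> allowedClassNames) {
        this.allowedClassNames = new HashSet<>(allowedClassNames);
    }

    /** True if the throwable, or any exception in its cause chain, is allowed. */
    public boolean isAllowed(Throwable t) {
        // Identity-based set so a cyclic cause chain cannot loop forever.
        Set<Throwable> seen = Collections.newSetFromMap(new IdentityHashMap<>());
        for (Throwable cur = t; cur != null && seen.add(cur); cur = cur.getCause()) {
            if (allowedClassNames.contains(cur.getClass().getName())) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        AllowedWriteExceptions allowed = new AllowedWriteExceptions(
            new HashSet<>(Arrays.asList("java.io.EOFException")));
        // A wrapped EOFException matches; an unrelated exception does not.
        System.out.println(allowed.isAllowed(
            new RuntimeException(new java.io.EOFException())));
        System.out.println(allowed.isAllowed(new IllegalStateException()));
    }
}
```

Matching on exact class names keeps the configuration explicit, so a subclass of a listed exception is not skipped unless it is listed too. Whether that is the right trade-off is a judgment call.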



                
> LoadTestTool need to have a way to ignore keys which were failed during 
> write. 
> -------------------------------------------------------------------------------
>
>                 Key: HBASE-9108
>                 URL: https://issues.apache.org/jira/browse/HBASE-9108
>             Project: HBase
>          Issue Type: Improvement
>          Components: test
>    Affects Versions: 0.95.0, 0.95.1, 0.94.9, 0.94.10
>            Reporter: gautam
>            Assignee: gautam
>            Priority: Critical
>         Attachments: 9108.patch._trunk.5, 9108.patch._trunk.6, 
> HBASE-9108.patch._trunk.2, HBASE-9108.patch._trunk.3, 
> HBASE-9108.patch._trunk.4, HBASE-9108.patch._trunk.7, 
> HBASE-9108.patch._trunk.8
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> While running the chaosmonkey integration tests, it was found that writes 
> sometimes fail when cluster components are restarted, stopped, killed, etc.
> The data key that was being put using the LoadTestTool is added to the 
> failed key set, and at the end of the test this failed key set is checked 
> for any entries in order to assert failures.
> While doing fail-over testing, it is expected that some of the keys may go 
> unwritten. The point here is to validate that whatever is reported as 
> written to hbase on an unstable cluster really was written, and hence reads 
> should succeed 100% for the keys that went in successfully.
> Currently LoadTestTool strictly checks that every key was written. If any 
> key is not written, the test fails.
> I wanted to loosen this constraint by allowing users to pass in a set of 
> exceptions they expect when doing put/write operations on hbase. If one of 
> these expected exceptions is thrown while writing a key to hbase, the failed 
> key is ignored, and hence won't be considered again for subsequent writes 
> or reads.
> This set can be passed to the load test tool as a CSV list parameter, 
> -allowed_write_exceptions, or through hbase-site.xml by setting a value for 
> "test.ignore.exceptions.during.write".
> Here is the usage:
> -allowed_write_exceptions 
> "java.io.EOFException,org.apache.hadoop.hbase.NotServingRegionException,org.apache.hadoop.hbase.client.NoServerForRegionException,org.apache.hadoop.hbase.ipc.ServerNotRunningYetException"
> Hence the existing integration tests can also make use of this change by 
> setting the property in hbase-site.xml.
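
The bookkeeping described in the issue could look roughly like the sketch below. It is an illustration only, not the code from the attached patches: the class WriteFailureTracker and its methods are hypothetical names, and it matches on the top-level exception class only. It parses a CSV value such as the one given to -allowed_write_exceptions and keeps ignored keys separate from genuinely failed ones.

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical sketch, not the actual LoadTestTool code: tracks which keys
// failed for an unexpected reason and which were skipped on purpose.
public class WriteFailureTracker {
    private final Set<String> allowed;
    private final Set<Long> failedKeys = new LinkedHashSet<>();
    private final Set<Long> ignoredKeys = new LinkedHashSet<>();

    public WriteFailureTracker(String allowedCsv) {
        this.allowed = parseCsv(allowedCsv);
    }

    /** Splits a comma-separated list of class names, dropping blanks. */
    public static Set<String> parseCsv(String csv) {
        Set<String> names = new LinkedHashSet<>();
        if (csv == null) {
            return names;
        }
        for (String part : csv.split(",")) {
            String name = part.trim();
            if (!name.isEmpty()) {
                names.add(name);
            }
        }
        return names;
    }

    /** Records a write failure; allowed exceptions only mark the key as ignored. */
    public void recordFailure(long key, Throwable t) {
        if (allowed.contains(t.getClass().getName())) {
            ignoredKeys.add(key);
        } else {
            failedKeys.add(key);
        }
    }

    /** Ignored keys are excluded from later read verification. */
    public boolean shouldVerifyOnRead(long key) {
        return !ignoredKeys.contains(key);
    }

    public Set<Long> getFailedKeys() {
        return Collections.unmodifiableSet(failedKeys);
    }

    public Set<Long> getIgnoredKeys() {
        return Collections.unmodifiableSet(ignoredKeys);
    }
}
```

With this split, the end-of-test assertion would look only at the failed set, while the reader would skip any key in the ignored set.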

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
