[ 
https://issues.apache.org/jira/browse/HBASE-12782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14274693#comment-14274693
 ] 

stack commented on HBASE-12782:
-------------------------------

bq. yeah, debugging ITBLL has proven to be very hard. What I had done 
previously was to keep all the files and WAL's and do custom search on top of 
that.

Let me try and make some tools.  The failure only seems to come at scale, which 
makes it a pain to debug.

In my weekend messing around, I was hoping that a pointed replication of the 
set of failures seen during a 'suspicious' section of client retries would 
narrow the debug surface, especially if I was able to do it in a unit test.  
What I found was that even a high-fidelity reproduction of the exceptions 
thrown, with retries in extremis, was not enough to provoke dataloss in a unit 
test environment.  Taking my unit test and redoing it as an IT test to get real 
cluster timings into the mix, again, no cigar, not unless the numbers were 
large (100M+) -- but then I was back in the big original ITBLL space trying to 
trace the ghost of missing rows.

Let me do the WAL search tool.
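
Roughly what I have in mind for the search, as a sketch only -- it leans on the 
existing WALPrettyPrinter (HLogPrettyPrinter on older builds) rather than 
anything new, and it assumes the row filter option (-w) behaves as I remember; 
the regionserver dir, WAL file name, and row key below are placeholders, not 
from a real run:

# placeholders only: dump one WAL and filter for a key Verify reports missing
HADOOP_CLASSPATH="/home/stack/conf_hbase:`/home/stack/hbase/bin/hbase classpath`" \
  ./hadoop/bin/hadoop --config ~/conf_hadoop \
  org.apache.hadoop.hbase.wal.WALPrettyPrinter \
  -w <missing-row-key> hdfs:///hbase/WALs/<regionserver>/<wal-file>

The tool would wrap something like the above so it walks every WAL (and 
archived WAL) in the cluster and reports which files, if any, carry the keys 
that Verify says are missing.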

> ITBLL fails for me if generator does anything but 5M per maptask
> ----------------------------------------------------------------
>
>                 Key: HBASE-12782
>                 URL: https://issues.apache.org/jira/browse/HBASE-12782
>             Project: HBase
>          Issue Type: Bug
>          Components: integration tests
>    Affects Versions: 1.0.0
>            Reporter: stack
>            Priority: Critical
>             Fix For: 1.0.0
>
>         Attachments: 12782.unit.test.and.it.test.txt, 
> 12782.unit.test.writing.txt
>
>
> Anyone else seeing this?  If I do an ITBLL with generator doing 5M rows per 
> maptask, all is good -- verify passes. I've been running 5 servers and had 
> one slot per server.  So below works:
> HADOOP_CLASSPATH="/home/stack/conf_hbase:`/home/stack/hbase/bin/hbase classpath`" \
>   ./hadoop/bin/hadoop --config ~/conf_hadoop \
>   org.apache.hadoop.hbase.test.IntegrationTestBigLinkedList \
>   --monkey serverKilling Generator 5 5000000 g1.tmp
> or if I double the map tasks, it works:
> HADOOP_CLASSPATH="/home/stack/conf_hbase:`/home/stack/hbase/bin/hbase classpath`" \
>   ./hadoop/bin/hadoop --config ~/conf_hadoop \
>   org.apache.hadoop.hbase.test.IntegrationTestBigLinkedList \
>   --monkey serverKilling Generator 10 5000000 g2.tmp
> ...but if I change the 5M to 50M or 25M, Verify fails.
> Looking into it.
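> For reference, the Verify step I run against the Generator output looks 
> roughly like the below -- a sketch only, assuming Verify still takes just an 
> output dir and a reducer count; the output dir name and reducer count here 
> are made up:
> # verify1.tmp and the reducer count of 10 are placeholders
> HADOOP_CLASSPATH="/home/stack/conf_hbase:`/home/stack/hbase/bin/hbase classpath`" \
>   ./hadoop/bin/hadoop --config ~/conf_hadoop \
>   org.apache.hadoop.hbase.test.IntegrationTestBigLinkedList \
>   Verify verify1.tmp 10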



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
