[ 
https://issues.apache.org/jira/browse/HBASE-5754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13271036#comment-13271036
 ] 

Enis Soztutar commented on HBASE-5754:
--------------------------------------

In one of my 0.92.x tests on a 10-node cluster with 250M inserts, I did manage 
to get the verify step to fail: 
{code}
12/05/08 11:11:18 INFO mapred.JobClient:   goraci.Verify$Counts
12/05/08 11:11:18 INFO mapred.JobClient:     UNDEFINED=972506
12/05/08 11:11:18 INFO mapred.JobClient:     REFERENCED=248051318
12/05/08 11:11:18 INFO mapred.JobClient:     UNREFERENCED=972506
12/05/08 11:11:18 INFO mapred.JobClient:   Map-Reduce Framework
12/05/08 11:11:18 INFO mapred.JobClient:     Map input records=249023824
{code}

Notice that map input records is about 1M less than 250M, which indicates that 
the input format did not provide all records in the table. The missing rows all 
belong to a single region. I reran the test after a couple of hours, and it 
passed. But the failed test created 244 maps instead of 246, which is the 
current region count, so I suspect there is something wrong in the split 
calculation or in the supposedly transactional behavior of split/balance 
operations in the meta table. I am still inspecting the code and the logs, but 
any pointers are welcome. 
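
For reference, the arithmetic behind the numbers above can be checked with a small sketch. The counter values and map counts are copied from this comment; the assumption that "250M inserts" means exactly 250,000,000 rows is mine, and the names are only illustrative, not part of goraci or HBase.

```python
# Counter values copied from the Verify job output quoted above.
UNDEFINED = 972_506
REFERENCED = 248_051_318
UNREFERENCED = 972_506
MAP_INPUT_RECORDS = 249_023_824

# Assumption: "250M inserts" means exactly 250,000,000 rows were written.
EXPECTED_ROWS = 250_000_000

# Map and region counts quoted in the comment. The region count is the
# "current" one, so it may differ from the count at the time of the failed run.
MAPS_IN_FAILED_RUN = 244
CURRENT_REGION_COUNT = 246


def summarize():
    """Return the derived figures as a dict (hypothetical helper)."""
    return {
        # Rows the input format never handed to a mapper.
        "rows_not_scanned": EXPECTED_ROWS - MAP_INPUT_RECORDS,
        # Whether REFERENCED + UNREFERENCED adds up to the map input count.
        "scanned_rows_accounted_for":
            REFERENCED + UNREFERENCED == MAP_INPUT_RECORDS,
        # How many fewer maps ran than there are regions now.
        "maps_short_of_region_count":
            CURRENT_REGION_COUNT - MAPS_IN_FAILED_RUN,
    }


if __name__ == "__main__":
    for name, value in summarize().items():
        print(f"{name}: {value}")
```

Under that assumption, 976,176 rows never reached a mapper, while the rows that were scanned are fully accounted for by the REFERENCED and UNREFERENCED counters. That is consistent with the input splits skipping part of the table rather than the verify logic miscounting.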
                
> data lost with gora continuous ingest test (goraci)
> ---------------------------------------------------
>
>                 Key: HBASE-5754
>                 URL: https://issues.apache.org/jira/browse/HBASE-5754
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.92.1
>         Environment: 10 node test cluster
>            Reporter: Eric Newton
>            Assignee: stack
>
> Keith Turner re-wrote the accumulo continuous ingest test using gora, which 
> has both hbase and accumulo back-ends.
> I put a billion entries into HBase, and ran the Verify map/reduce job.  The 
> verification failed because about 21K entries were missing.  The goraci 
> [README|https://github.com/keith-turner/goraci] explains the test, and how it 
> detects missing data.
> I re-ran the test with 100 million entries, and it verified successfully.  
> Both of the times I tested using a billion entries, the verification failed.
> If I run the verification step twice, the results are consistent, so the 
> problem is probably not in the verify step.
> Here are the versions of the various packages:
> ||package||version||
> |hadoop|0.20.205.0|
> |hbase|0.92.1|
> |gora|http://svn.apache.org/repos/asf/gora/trunk r1311277|
> |goraci|https://github.com/ericnewton/goraci  tagged 2012-04-08|
> The change I made to goraci was to configure it for hbase and to allow it to 
> build properly.
