[ https://issues.apache.org/jira/browse/HBASE-5754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13252491#comment-13252491 ]

Keith Turner commented on HBASE-5754:
-------------------------------------

The counts for the 1B run seem odd to me, but maybe that's just an artifact of
how many map tasks you ran for the generator and how much data each task
generated. If a map task does not generate a multiple of 25,000,000 nodes,
then it will leave some nodes unreferenced. It generates a circular linked
list every 25M.

{noformat}
12/04/12 03:54:11 INFO mapred.JobClient:     REFERENCED=564459547
12/04/12 03:54:11 INFO mapred.JobClient:     UNREFERENCED=1040000000
{noformat}
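A toy Python model may help show why unreferenced nodes matter to the verify step. It is a sketch under simplifying assumptions, not goraci's code: each node points at the node generated just before it, a loop of `wrap` nodes is closed by pointing its first node at its last, and a trailing partial loop is never closed. The helpers `generate` and `verify` and the `UNDEFINED` (dangling pointer) count are hypothetical names used only for this illustration.

```python
from collections import Counter


def generate(num_nodes, wrap):
    """Toy generator (hypothetical, not goraci's implementation).

    Returns {node_id: id_of_node_it_points_at or None}. Within a loop each
    node points at the node generated just before it; once a loop holds
    `wrap` nodes, its first node is pointed at its last node to close the
    circle. A trailing partial loop is left open.
    """
    prev = {}
    for start in range(0, num_nodes, wrap):
        end = min(start + wrap, num_nodes)
        prev[start] = None  # nothing to point at until the loop closes
        for node in range(start + 1, end):
            prev[node] = node - 1
        if end - start == wrap:
            prev[start] = end - 1  # close the circular list
    return prev


def verify(prev):
    """Count REFERENCED / UNREFERENCED nodes and dangling (UNDEFINED) pointers."""
    targets = {t for t in prev.values() if t is not None}
    counts = Counter(REFERENCED=0, UNREFERENCED=0, UNDEFINED=0)
    for node in prev:
        counts["REFERENCED" if node in targets else "UNREFERENCED"] += 1
    # A pointer to a node that is no longer stored is detectable data loss.
    counts["UNDEFINED"] = len(targets - set(prev))
    return counts


if __name__ == "__main__":
    print(verify(generate(100, 25)))  # every node sits in a closed loop
    print(verify(generate(110, 25)))  # 10 nodes in an open loop; one unreferenced

    lost_referenced = generate(100, 25)
    del lost_referenced[10]           # drop a node inside a closed loop
    print(verify(lost_referenced))    # the loss shows up as UNDEFINED

    lost_unreferenced = generate(110, 25)
    del lost_unreferenced[109]        # drop the unreferenced tail of the open loop
    print(verify(lost_unreferenced))  # no UNDEFINED: the loss is invisible
```

In this model, deleting a node inside a closed loop leaves a pointer to a missing node, while deleting the unreferenced tail of a partial loop only shifts the REFERENCED/UNREFERENCED counts, so nothing flags the loss.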

If you were to run 10 map tasks each generating 100M, then this should generate
1B with all nodes referenced. Minimizing the number of unreferenced nodes is
ideal, because the test cannot detect the loss of unreferenced nodes. I should
probably add this info to the README.
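The sizing rule above comes down to a little arithmetic. A minimal Python sketch, assuming each map task closes a loop every 25,000,000 nodes and that only the remainder of each task's count lands in a loop that never closes; `per_task_count` and `nodes_in_open_loops` are hypothetical helpers, not part of goraci:

```python
WRAP = 25_000_000  # nodes per circular list, per the comment


def nodes_in_open_loops(per_task_counts, wrap=WRAP):
    """Nodes that fall in a trailing loop a task never closes."""
    return sum(count % wrap for count in per_task_counts)


def per_task_count(total, tasks, wrap=WRAP):
    """Split `total` evenly over `tasks` so every task closes all of its loops."""
    if total % tasks != 0 or (total // tasks) % wrap != 0:
        raise ValueError(
            f"{total} nodes over {tasks} tasks is not a multiple of {wrap} per task"
        )
    return total // tasks


if __name__ == "__main__":
    n = per_task_count(1_000_000_000, 10)  # the 10 x 100M plan from the comment
    print(n, nodes_in_open_loops([n] * 10))
    print(nodes_in_open_loops([333_333_333] * 3))  # hypothetical uneven split
```

The 10 x 100M plan passes with nothing left in open loops. Splitting 1B over 3 tasks is rejected, and a hypothetical 3 x 333,333,333 run would leave 24,999,999 nodes in loops that never close; how many of those end up UNREFERENCED depends on generator details not covered here.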

                
> data lost with gora continuous ingest test (goraci)
> ---------------------------------------------------
>
>                 Key: HBASE-5754
>                 URL: https://issues.apache.org/jira/browse/HBASE-5754
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.92.1
>         Environment: 10 node test cluster
>            Reporter: Eric Newton
>            Assignee: stack
>
> Keith Turner rewrote the Accumulo continuous ingest test using Gora, which
> has both HBase and Accumulo back-ends.
> I put a billion entries into HBase, and ran the Verify map/reduce job.  The 
> verification failed because about 21K entries were missing.  The goraci 
> [README|https://github.com/keith-turner/goraci] explains the test, and how it 
> detects missing data.
> I re-ran the test with 100 million entries, and it verified successfully.
> Both times I tested with a billion entries, the verification failed.
> If I run the verification step twice, the results are consistent, so the
> problem is probably not in the verify step.
> Here are the versions of the various packages:
> ||package||version||
> |hadoop|0.20.205.0|
> |hbase|0.92.1|
> |gora|http://svn.apache.org/repos/asf/gora/trunk r1311277|
> |goraci|https://github.com/ericnewton/goraci  tagged 2012-04-08|
> The change I made to goraci was to configure it for hbase and to allow it to 
> build properly.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
