[ https://issues.apache.org/jira/browse/HBASE-5754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13252178#comment-13252178 ]

stack commented on HBASE-5754:
------------------------------

I managed to get things to work after a few detours and after figuring out the 
tool (I'm a little slow).  Here is the output of a verify run after uploading 1B 
rows using the Generator tool (I have a five-node cluster):

{code}
12/04/12 03:53:54 INFO mapred.JobClient:  map 100% reduce 99%
12/04/12 03:54:06 INFO mapred.JobClient:  map 100% reduce 100%
12/04/12 03:54:11 INFO mapred.JobClient: Job complete: job_201204092039_0040
12/04/12 03:54:11 INFO mapred.JobClient: Counters: 31
12/04/12 03:54:11 INFO mapred.JobClient:   Job Counters
12/04/12 03:54:11 INFO mapred.JobClient:     Launched reduce tasks=103
12/04/12 03:54:11 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=43090396
12/04/12 03:54:11 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
12/04/12 03:54:11 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
12/04/12 03:54:11 INFO mapred.JobClient:     Rack-local map tasks=75
12/04/12 03:54:11 INFO mapred.JobClient:     Launched map tasks=256
12/04/12 03:54:11 INFO mapred.JobClient:     Data-local map tasks=181
12/04/12 03:54:11 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=8911236
12/04/12 03:54:11 INFO mapred.JobClient:   goraci.Verify$Counts
12/04/12 03:54:11 INFO mapred.JobClient:     REFERENCED=564459547
12/04/12 03:54:11 INFO mapred.JobClient:     UNREFERENCED=1040000000
12/04/12 03:54:11 INFO mapred.JobClient:   File Output Format Counters
12/04/12 03:54:11 INFO mapred.JobClient:     Bytes Written=0
12/04/12 03:54:11 INFO mapred.JobClient:   FileSystemCounters
12/04/12 03:54:11 INFO mapred.JobClient:     FILE_BYTES_READ=80913119406
12/04/12 03:54:11 INFO mapred.JobClient:     HDFS_BYTES_READ=156449
12/04/12 03:54:11 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=107202716633
12/04/12 03:54:11 INFO mapred.JobClient:   File Input Format Counters
12/04/12 03:54:11 INFO mapred.JobClient:     Bytes Read=0
12/04/12 03:54:11 INFO mapred.JobClient:   Map-Reduce Framework
12/04/12 03:54:11 INFO mapred.JobClient:     Map output materialized bytes=28369514665
12/04/12 03:54:11 INFO mapred.JobClient:     Map input records=1604459547
12/04/12 03:54:11 INFO mapred.JobClient:     Reduce shuffle bytes=28259732158
12/04/12 03:54:11 INFO mapred.JobClient:     Spilled Records=8195463443
12/04/12 03:54:11 INFO mapred.JobClient:     Map output bytes=24031522877
12/04/12 03:54:11 INFO mapred.JobClient:     CPU time spent (ms)=20730410
12/04/12 03:54:11 INFO mapred.JobClient:     Total committed heap usage (bytes)=150411739136
12/04/12 03:54:11 INFO mapred.JobClient:     Combine input records=0
12/04/12 03:54:11 INFO mapred.JobClient:     SPLIT_RAW_BYTES=156449
12/04/12 03:54:11 INFO mapred.JobClient:     Reduce input records=2168919094
12/04/12 03:54:11 INFO mapred.JobClient:     Reduce input groups=1604459547
12/04/12 03:54:11 INFO mapred.JobClient:     Combine output records=0
12/04/12 03:54:11 INFO mapred.JobClient:     Physical memory (bytes) snapshot=144318976000
12/04/12 03:54:11 INFO mapred.JobClient:     Reduce output records=0
12/04/12 03:54:11 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=522892115968
12/04/12 03:54:11 INFO mapred.JobClient:     Map output records=2168919094
{code}
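As a quick sanity check on the counters above (my own arithmetic, not part of the tool's output): REFERENCED plus UNREFERENCED equals the Map input records, so every row scanned was classified exactly once; and if each input row also emits one back-reference for its predecessor, the Map output records line up too:

```java
// Hedged sanity check of the job counters printed above; the "one defined
// record plus one back-reference per row" interpretation is my assumption
// based on the goraci README, not something the log itself states.
public class CounterCheck {
    public static void main(String[] args) {
        long referenced     = 564_459_547L;   // goraci.Verify$Counts REFERENCED
        long unreferenced   = 1_040_000_000L; // goraci.Verify$Counts UNREFERENCED
        long mapInput       = 1_604_459_547L; // Map input records
        long mapOutput      = 2_168_919_094L; // Map output records

        // Every scanned row is classified exactly once.
        System.out.println(referenced + unreferenced == mapInput);   // prints true

        // Each row emits itself plus (when present) a reference to its
        // predecessor; the extra emissions match the REFERENCED count.
        System.out.println(mapInput + referenced == mapOutput);      // prints true
    }
}
```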

Going by the README, it would seem we're basically working.  Should I close 
this issue, or would you like me to look at something else?  Thanks.
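For anyone reading along, the README's detection scheme can be sketched roughly like this (a minimal standalone sketch of the classification logic, not the actual goraci source, which runs this as a map/reduce job over the table): every generated node stores a pointer to the previous node in its linked list; verification counts each key as REFERENCED or UNREFERENCED, and any pointed-to key that was never actually written shows up as UNDEFINED, i.e. lost data:

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Sketch of the goraci Verify classification (my reading of the README):
// each node maps key -> previous-node pointer (-1L meaning "no predecessor").
public class VerifySketch {
    public long referenced, unreferenced, undefined; // mirrors Verify$Counts

    public void verify(Map<Long, Long> nodes) {
        Set<Long> defined = nodes.keySet();
        Set<Long> referencedKeys = new HashSet<>();
        for (long prev : nodes.values())
            if (prev != -1L) referencedKeys.add(prev);

        for (long key : defined)
            if (referencedKeys.contains(key)) referenced++;
            else unreferenced++;   // nothing points here yet: expected, not an error
        for (long key : referencedKeys)
            if (!defined.contains(key)) undefined++;  // pointed to but never written: lost data
    }

    public static void main(String[] args) {
        Map<Long, Long> nodes = new HashMap<>();
        nodes.put(1L, -1L); nodes.put(2L, 1L); nodes.put(3L, 2L);
        nodes.put(5L, 4L);  // node 4 was "lost": referenced but never written
        VerifySketch v = new VerifySketch();
        v.verify(nodes);
        System.out.println(v.referenced + " " + v.unreferenced + " " + v.undefined);
        // prints "2 2 1": the undefined count is what signals data loss
    }
}
```

A nonzero UNDEFINED is the failure signal; the run above shows only REFERENCED and UNREFERENCED, which is why it looks healthy.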

I like this goraci tool.  Will play some more with it.
                
> data lost with gora continuous ingest test (goraci)
> ---------------------------------------------------
>
>                 Key: HBASE-5754
>                 URL: https://issues.apache.org/jira/browse/HBASE-5754
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.92.1
>         Environment: 10 node test cluster
>            Reporter: Eric Newton
>            Assignee: stack
>
> Keith Turner rewrote the Accumulo continuous ingest test using Gora, which 
> has both HBase and Accumulo back-ends.
> I put a billion entries into HBase, and ran the Verify map/reduce job.  The 
> verification failed because about 21K entries were missing.  The goraci 
> [README|https://github.com/keith-turner/goraci] explains the test, and how it 
> detects missing data.
> I re-ran the test with 100 million entries, and it verified successfully.  
> Both of the times I tested using a billion entries, the verification failed.
> If I run the verification step twice, the results are consistent, so the 
> problem is probably not in the verify step.
> Here are the versions of the various packages:
> ||package||version||
> |hadoop|0.20.205.0|
> |hbase|0.92.1|
> |gora|http://svn.apache.org/repos/asf/gora/trunk r1311277|
> |goraci|https://github.com/ericnewton/goraci  tagged 2012-04-08|
> The changes I made to goraci were to configure it for HBase and to allow it 
> to build properly.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
