[ https://issues.apache.org/jira/browse/HBASE-5754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13252178#comment-13252178 ]
stack commented on HBASE-5754:
------------------------------
I managed to get things to work after a few detours and some time figuring out
the tool (I'm a little slow). Here is the output of a Verify run after uploading
1B rows with the Generator tool (I have a five-node cluster):
{code}
12/04/12 03:53:54 INFO mapred.JobClient: map 100% reduce 99%
12/04/12 03:54:06 INFO mapred.JobClient: map 100% reduce 100%
12/04/12 03:54:11 INFO mapred.JobClient: Job complete: job_201204092039_0040
12/04/12 03:54:11 INFO mapred.JobClient: Counters: 31
12/04/12 03:54:11 INFO mapred.JobClient: Job Counters
12/04/12 03:54:11 INFO mapred.JobClient: Launched reduce tasks=103
12/04/12 03:54:11 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=43090396
12/04/12 03:54:11 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
12/04/12 03:54:11 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
12/04/12 03:54:11 INFO mapred.JobClient: Rack-local map tasks=75
12/04/12 03:54:11 INFO mapred.JobClient: Launched map tasks=256
12/04/12 03:54:11 INFO mapred.JobClient: Data-local map tasks=181
12/04/12 03:54:11 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=8911236
12/04/12 03:54:11 INFO mapred.JobClient: goraci.Verify$Counts
12/04/12 03:54:11 INFO mapred.JobClient: REFERENCED=564459547
12/04/12 03:54:11 INFO mapred.JobClient: UNREFERENCED=1040000000
12/04/12 03:54:11 INFO mapred.JobClient: File Output Format Counters
12/04/12 03:54:11 INFO mapred.JobClient: Bytes Written=0
12/04/12 03:54:11 INFO mapred.JobClient: FileSystemCounters
12/04/12 03:54:11 INFO mapred.JobClient: FILE_BYTES_READ=80913119406
12/04/12 03:54:11 INFO mapred.JobClient: HDFS_BYTES_READ=156449
12/04/12 03:54:11 INFO mapred.JobClient: FILE_BYTES_WRITTEN=107202716633
12/04/12 03:54:11 INFO mapred.JobClient: File Input Format Counters
12/04/12 03:54:11 INFO mapred.JobClient: Bytes Read=0
12/04/12 03:54:11 INFO mapred.JobClient: Map-Reduce Framework
12/04/12 03:54:11 INFO mapred.JobClient: Map output materialized bytes=28369514665
12/04/12 03:54:11 INFO mapred.JobClient: Map input records=1604459547
12/04/12 03:54:11 INFO mapred.JobClient: Reduce shuffle bytes=28259732158
12/04/12 03:54:11 INFO mapred.JobClient: Spilled Records=8195463443
12/04/12 03:54:11 INFO mapred.JobClient: Map output bytes=24031522877
12/04/12 03:54:11 INFO mapred.JobClient: CPU time spent (ms)=20730410
12/04/12 03:54:11 INFO mapred.JobClient: Total committed heap usage (bytes)=150411739136
12/04/12 03:54:11 INFO mapred.JobClient: Combine input records=0
12/04/12 03:54:11 INFO mapred.JobClient: SPLIT_RAW_BYTES=156449
12/04/12 03:54:11 INFO mapred.JobClient: Reduce input records=2168919094
12/04/12 03:54:11 INFO mapred.JobClient: Reduce input groups=1604459547
12/04/12 03:54:11 INFO mapred.JobClient: Combine output records=0
12/04/12 03:54:11 INFO mapred.JobClient: Physical memory (bytes) snapshot=144318976000
12/04/12 03:54:11 INFO mapred.JobClient: Reduce output records=0
12/04/12 03:54:11 INFO mapred.JobClient: Virtual memory (bytes) snapshot=522892115968
12/04/12 03:54:11 INFO mapred.JobClient: Map output records=2168919094
{code}
Going by the README, it would seem we're basically working. Should I close
this issue or would you like me to look at something else? Thanks.
I like this goraci tool. Will play some more with it.
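For anyone trying to read those Counts: as I understand the README, the map side of Verify emits a "defined" record for every row it reads plus a "reference" record for the row it points back at, and the reduce side tallies them per key. Below is a simplified sketch of that reduce-side check, not the actual goraci.Verify code; the LongWritable keys, the DEF sentinel, and the class names are mine for illustration, and only the REFERENCED/UNREFERENCED/UNDEFINED counter names match what the tool reports.
{code}
// Simplified sketch of the reduce-side check described in the goraci README.
// NOT the real goraci.Verify implementation; keys are shortened to LongWritable
// and the DEF sentinel is an assumption made for illustration.
import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.mapreduce.Reducer;

public class VerifySketch {

  public enum Counts { REFERENCED, UNREFERENCED, UNDEFINED }

  // Assumed convention: the map phase emits (nodeId, DEF) when it reads a row,
  // and (prevId, nodeId) for the node that row points back to.
  private static final long DEF = -1L;

  public static class VerifySketchReducer
      extends Reducer<LongWritable, LongWritable, LongWritable, LongWritable> {

    @Override
    protected void reduce(LongWritable key, Iterable<LongWritable> values, Context ctx)
        throws IOException, InterruptedException {
      boolean defined = false;  // did we actually read this row from the table?
      long refs = 0;            // how many other rows point at it?

      for (LongWritable v : values) {
        if (v.get() == DEF) {
          defined = true;
        } else {
          refs++;
        }
      }

      if (!defined && refs > 0) {
        // Something points at this key but the row itself is gone: data loss.
        ctx.getCounter(Counts.UNDEFINED).increment(1);
      } else if (defined && refs == 0) {
        // Row exists but nothing refers to it (expected for the tail of a list
        // whose references were never flushed).
        ctx.getCounter(Counts.UNREFERENCED).increment(1);
      } else if (defined) {
        // Row exists and something points at it: the healthy case.
        ctx.getCounter(Counts.REFERENCED).increment(1);
      }
    }
  }
}
{code}
On that reading, a non-zero UNDEFINED count is the data-loss signal Eric reported below; the run above only shows REFERENCED and UNREFERENCED, which is why I take it as passing.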
> data lost with gora continuous ingest test (goraci)
> ---------------------------------------------------
>
> Key: HBASE-5754
> URL: https://issues.apache.org/jira/browse/HBASE-5754
> Project: HBase
> Issue Type: Bug
> Affects Versions: 0.92.1
> Environment: 10 node test cluster
> Reporter: Eric Newton
> Assignee: stack
>
> Keith Turner rewrote the Accumulo continuous ingest test using Gora, which
> has both HBase and Accumulo back-ends.
> I put a billion entries into HBase, and ran the Verify map/reduce job. The
> verification failed because about 21K entries were missing. The goraci
> [README|https://github.com/keith-turner/goraci] explains the test, and how it
> detects missing data.
> I re-ran the test with 100 million entries, and it verified successfully.
> Both times I tested with a billion entries, the verification failed.
> If I run the verification step twice, the results are consistent, so the
> problem is probably not in the verify step.
> Here are the versions of the various packages:
> ||package||version||
> |hadoop|0.20.205.0|
> |hbase|0.92.1|
> |gora|http://svn.apache.org/repos/asf/gora/trunk r1311277|
> |goraci|https://github.com/ericnewton/goraci tagged 2012-04-08|
> The change I made to goraci was to configure it for HBase and to allow it to
> build properly.