[
https://issues.apache.org/jira/browse/HBASE-9759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13794859#comment-13794859
]
stack commented on HBASE-9759:
------------------------------
+1 on trying the patch. How does it prevent collisions (I did not review
closely)?
If you do a select on row 0, does it have more versions than other rows?
What is to prevent our clashing randomly on another row? Is it because our
random generation is within a fixed range per iteration?
> IntegrationTestBulkLoad random number collision
> -----------------------------------------------
>
> Key: HBASE-9759
> URL: https://issues.apache.org/jira/browse/HBASE-9759
> Project: HBase
> Issue Type: Bug
> Components: test
> Reporter: Enis Soztutar
> Assignee: Enis Soztutar
> Fix For: 0.98.0, 0.96.1
>
> Attachments: hbase-9759_v1.patch
>
>
> ITBL failed recently in our test harness. Inspecting the failure made me
> believe that the only reason that particular failure might have happened is
> that there is a collision in the random longs generated by the test.
> The test creates 50 mappers by default, and each mapper writes 500K random
> rows starting with row = 0. By default there are 5 iterations.
> The check job outputs these counters:
> {code}
> 2013-10-13 07:48:01,134 Map input records=124999751
> 2013-10-13 07:48:01,134 Map output records=124999999
> {code}
> The number of input records seems fine because
> {code}
> 124999751 = 1 + 5 * (0.5M - 1) * 50
> {code}
> where 5 = num iterations, 0.5M = num rows per mapper, 50 = num mappers, and
> the 1 is for row = 0, which every chain writes to.
> Output records should be 125M; however, we see one cell missing. Since the
> map input records match the expected number of distinct rows, I suspect that
> row = 0 had a collision.
> In one of the generate jobs, we can see that the reducer output count does
> not match the reducer input count. Given that we are using KVSortReducer,
> this confirms that there is a collision in KeyValues received by this task.
> {code}
> 2013-10-13 06:48:12,738 Reduce input records=75000000
> 2013-10-13 06:48:12,738 Reduce output records=74999997
> {code}
> The count is off by 3 because we are writing 3 columns per row.
> My only theory for explaining this is that we had a collision in chainIds, or
> that one of the chains reused row = 0 as the next row.
> This is similar to HBASE-8700; however, in this case the probability is much,
> much lower.
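The off-by-3 arithmetic can be illustrated with a toy model (a Python stand-in, not HBase code; the 25M-row figure is simply the reduce input divided by 3 columns):

```python
# Toy model: a sort reducer that deduplicates KeyValues emits each
# distinct cell once, so one colliding row key (same 3 qualifiers)
# collapses its 3 cells into the already-present ones.
columns = 3
reduce_input = 75_000_000          # KeyValues into this generate job
rows = reduce_input // columns     # 25M distinct row keys expected

colliding_rows = 1                 # one row key generated twice
reduce_output = (rows - colliding_rows) * columns
print(reduce_input, reduce_output)  # 75000000 74999997
```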
--
This message was sent by Atlassian JIRA
(v6.1#6144)