[
https://issues.apache.org/jira/browse/HBASE-17849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
ramkrishna.s.vasudevan updated HBASE-17849:
-------------------------------------------
Summary: PE tool random read is not totally random (was: PE tool
randomness is not totally random)
Updated title as this JIRA targets randomReads and randomSeekScan alone.
> PE tool random read is not totally random
> -----------------------------------------
>
> Key: HBASE-17849
> URL: https://issues.apache.org/jira/browse/HBASE-17849
> Project: HBase
> Issue Type: Bug
> Components: test
> Affects Versions: 2.0.0
> Reporter: ramkrishna.s.vasudevan
> Assignee: ramkrishna.s.vasudevan
> Fix For: 2.0.0
>
> Attachments: HBASE-17849.patch, HBASE-17849.patch
>
>
> Recently we were using the PE tool for some bucket cache related
> performance tests. One thing we noticed is that the random read is not
> totally random.
> Suppose we load 200G of data using the --size param and then use
> --rows=500000 to do the randomRead. The assumption was that it would randomly
> generate 500000 row keys from across the 200G of data to do the reads.
> But it so happens that the PE tool generates random rows only from the set
> of row keys that fall within the first 500000 rows.
> This was quite evident when we tried to use HBASE-15314 in our testing.
> Suppose we split a bucket cache of size 200G into 2 files of 100G each: the
> randomReads with --rows=500000 always land in the first file and never in
> the 2nd file. Better to make PE purely random.
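
The behaviour described above can be illustrated with a small sketch. This is
Python written for illustration only, not the PE tool's actual Java code. The
function names are hypothetical, TOTAL_ROWS is an assumed figure for the number
of rows in the 200G load, and "second half of the key space" is only a rough
stand-in for the second bucket cache file.

```python
import random

TOTAL_ROWS = 200_000_000  # assumption: rows in the 200G load (illustrative figure)
READ_ROWS = 500_000       # --rows=500000, as in the report


def keys_bounded_by_read_count(n_reads, rng):
    """Reported behaviour: keys are drawn only from [0, n_reads)."""
    return [rng.randrange(n_reads) for _ in range(n_reads)]


def keys_over_full_range(n_reads, total_rows, rng):
    """Expected behaviour: keys are drawn from the whole loaded key space."""
    return [rng.randrange(total_rows) for _ in range(n_reads)]


def fraction_in_second_half(keys, total_rows):
    """Share of keys in the upper half of the key space."""
    half = total_rows // 2
    return sum(1 for k in keys if k >= half) / len(keys)


rng = random.Random(17849)  # fixed seed so the run is repeatable
bounded = keys_bounded_by_read_count(READ_ROWS, rng)
spread = keys_over_full_range(READ_ROWS, TOTAL_ROWS, rng)

print("bounded, share in upper half:", fraction_in_second_half(bounded, TOTAL_ROWS))
print("spread,  share in upper half:", fraction_in_second_half(spread, TOTAL_ROWS))
```

With the bounded generator no key can ever reach the upper half of the key
space, which matches the observation that reads never touch the 2nd file. The
full-range generator spreads the same number of reads across both halves.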
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)