[ 
https://issues.apache.org/jira/browse/HBASE-17917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15976277#comment-15976277
 ] 

Duo Zhang commented on HBASE-17917:
-----------------------------------

OK, the problem is the new code in PE...

I put 10M rows (10G in size) into a single region, flushed, then compacted it 
into one file. The test commands are:

{noformat}
./bin/hbase pe --rows=10000000 --cacheBlocks=false --caching=30 --scanReadType=pread/stream --nomapred scan 1
./bin/hbase pe --rows=1000000 --cacheBlocks=false --caching=30 --scanReadType=pread/stream --nomapred scan 10
{noformat}

The results match what [~stack] said.

For the one-thread test, stream takes about 180s and pread about 210s.
For the 10-thread test, stream takes about 68s and pread about 28s.

Setting readahead to 0 or not does not have much impact on the results. But a 
strange thing is that pread + asyncPrefetch is much slower than plain pread, 
at about 360s.

So here, I want to revive an old idea: use pread by default, and switch to 
stream (by opening a new reader) if we read from the scanner multiple times. 
Now that HBASE-17914 gives us the ability to open multiple readers on the 
same StoreFile, I think this logic is much easier to implement.
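
A rough sketch of the switching idea, only to make the proposal concrete. 
This is not HBase code: SwitchingScanner, Reader, readerForNextRead and the 
switchThreshold parameter are all hypothetical names, and counting reads is 
just one possible trigger (bytes read would be another). The scanner starts 
on the shared pread reader and opens its own stream reader once it has been 
read from more than switchThreshold times.

```java
import java.util.function.Supplier;

/**
 * Hypothetical sketch, not HBase code: start on a shared pread reader and
 * switch to a dedicated stream reader once the scanner has been read from
 * more than switchThreshold times.
 */
public class SwitchingScanner {

    /** Minimal stand-in for a store file reader; not an HBase interface. */
    public interface Reader {
        String mode();
    }

    private final Supplier<Reader> streamReaderFactory;
    private final int switchThreshold; // assumed tunable, not a real config key
    private Reader current;
    private boolean switched = false;
    private int reads = 0;

    public SwitchingScanner(Reader preadReader,
                            Supplier<Reader> streamReaderFactory,
                            int switchThreshold) {
        this.current = preadReader;
        this.streamReaderFactory = streamReaderFactory;
        this.switchThreshold = switchThreshold;
    }

    /** Call once per scanner read; returns the reader that read should use. */
    public Reader readerForNextRead() {
        reads++;
        if (!switched && reads > switchThreshold) {
            // A long scan: pay the cost of opening a new reader exactly once.
            current = streamReaderFactory.get();
            switched = true;
        }
        return current;
    }

    public boolean hasSwitched() {
        return switched;
    }
}
```

With switchThreshold=2, the first two reads use pread and the third one 
opens the stream reader, so short scans never pay for opening a new reader.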

Also, we can do some refactoring to reduce the work done when opening an 
HFileReader.

> Use pread by default for all user scan
> --------------------------------------
>
>                 Key: HBASE-17917
>                 URL: https://issues.apache.org/jira/browse/HBASE-17917
>             Project: HBase
>          Issue Type: Sub-task
>          Components: scan
>            Reporter: Duo Zhang
>
> As said in the parent issue. We need some benchmark here first.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
