[
https://issues.apache.org/jira/browse/HBASE-12411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14217447#comment-14217447
]
Lars Hofhansl edited comment on HBASE-12411 at 11/19/14 6:13 AM:
-----------------------------------------------------------------
Update: I messed up the test... Numbers updated now.
Table fully compacted, 3 regions, i.e. 3 HFiles. 1.4 GB in 32M rows, FAST_DIFF
encoded. Not using the block cache.
Some numbers:
||scanners per HFile||pread forced||OS cache||time (s)||
|1|yes|yes|30|
|1|yes|no|33|
|1|no|yes|20|
|1|no|no|22|
|5|yes|yes|10.0|
|5|yes|no|11.0|
|5|no|yes|13|
|5|no|no|13.3|
|30|yes|yes|9.5|
|30|yes|no|12.4|
|30|no|yes|12.9|
|30|no|no|13.1|
In the 1 scanner per region case I get much better performance *without*
p-reads (i.e. with seek+read, presumably due to pre-fetching), and I see no
visible load on the data node process. In all other cases the p-reads involved
put considerable load on the DN.
p-read only helps when many scanners are used against the same HFile.
HBase/HDFS does not seem to make good use of the OS cache.
I will test with HDFS-6735.
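
For readers unfamiliar with the two read modes compared above, here is a minimal sketch of the difference. It is only an analogy on a local file using `java.nio.channels.FileChannel`, not HBase or HDFS code; the class and method names (`PreadVsSeekRead`, `seekAndRead`, `pread`) are made up for illustration. In HDFS the corresponding calls would be `FSDataInputStream.seek(long)` followed by `read(...)` versus the positional `read(long position, byte[] buffer, int offset, int length)`.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical illustration class; not part of HBase or HDFS.
final class PreadVsSeekRead {

    // seek + read: moves the channel's shared position, so callers sharing
    // one channel have to serialize (here by synchronizing on the channel).
    static int seekAndRead(FileChannel ch, long offset, ByteBuffer dst) throws IOException {
        synchronized (ch) {
            ch.position(offset);
            return ch.read(dst);
        }
    }

    // Positional read (pread): the offset is passed explicitly and the
    // channel's own position is left untouched, so no shared state changes.
    static int pread(FileChannel ch, long offset, ByteBuffer dst) throws IOException {
        return ch.read(dst, offset);
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("pread-demo", ".bin");
        try {
            Files.write(tmp, "0123456789".getBytes(StandardCharsets.US_ASCII));
            try (FileChannel ch = FileChannel.open(tmp, StandardOpenOption.READ)) {
                ByteBuffer a = ByteBuffer.allocate(3);
                pread(ch, 4, a);
                System.out.println("position after pread:     " + ch.position());

                ByteBuffer b = ByteBuffer.allocate(3);
                seekAndRead(ch, 4, b);
                System.out.println("position after seek+read: " + ch.position());
            }
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```

Both calls return the same bytes. The difference is that seek+read mutates state shared by everyone holding the stream, which is why concurrent scanners over one reader either need preads or have to wait for each other, while a single sequential reader can benefit from read-ahead with seek+read.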
> Optionally enable p-reads and private readers for compactions
> -------------------------------------------------------------
>
> Key: HBASE-12411
> URL: https://issues.apache.org/jira/browse/HBASE-12411
> Project: HBase
> Issue Type: Improvement
> Components: Performance
> Reporter: Lars Hofhansl
> Assignee: Lars Hofhansl
> Fix For: 2.0.0, 0.98.9, 0.99.2
>
> Attachments: 12411-v2.txt, 12411-v3.txt, 12411-v4.txt, 12411.txt
>
>
> In light of HDFS-6735 we might want to consider avoiding seek + read
> completely and performing only preads.
> For example, a compaction can currently lock out every other scanner over the
> file that the compaction is reading.
> At the very least we can introduce an option to avoid seek + read, so we can
> test this in various scenarios.
> This will definitely be of great importance for projects like Phoenix, which
> parallelize queries intra-region (and hence readers will, with high
> likelihood, be used concurrently by multiple scanners).