[
https://issues.apache.org/jira/browse/HBASE-12411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14203846#comment-14203846
]
Lars Hofhansl commented on HBASE-12411:
---------------------------------------
Small scans do involve more work if they are in fact not small (i.e. they span
multiple RPCs, as each RPC has to set up the scanner again, including all the
seeking that entails).
The more I think about it, the more I think that pread only is the right choice
right now, as long as we have only one reader per HFile. When more than one
scanner happens to read from an HFile, the prefetching is likely not going to
help, and the right scanner would need to be the lucky one again and again. Only
when we can guarantee that a single scanner will be scanning an HFile (a
DFSInputStream to be specific), and that this scanner will be scanning enough to
benefit from the prefetching, does seek + read make sense.
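To make the distinction concrete, here is a minimal sketch using plain java.nio
on a local file. It is an analogy only, not the HDFS DFSInputStream or HBase
code: the class and method names (PreadSketch, seekAndRead, pread) are
hypothetical, FileChannel.position(long) followed by read(ByteBuffer) stands in
for seek + read, and FileChannel.read(ByteBuffer, long) stands in for pread.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical illustration; not taken from HBase or HDFS.
final class PreadSketch {

    // seek + read: moves the channel's shared position, so callers sharing
    // one channel have to take turns (here via a lock on the channel).
    static byte[] seekAndRead(FileChannel ch, long offset, int len) throws IOException {
        ByteBuffer buf = ByteBuffer.allocate(len);
        synchronized (ch) {
            ch.position(offset);
            while (buf.hasRemaining()) {
                if (ch.read(buf) < 0) {
                    break; // end of file
                }
            }
        }
        return toArray(buf);
    }

    // pread: reads at an absolute offset and leaves the shared position
    // untouched, so no lock is taken here.
    static byte[] pread(FileChannel ch, long offset, int len) throws IOException {
        ByteBuffer buf = ByteBuffer.allocate(len);
        long pos = offset;
        while (buf.hasRemaining()) {
            int n = ch.read(buf, pos);
            if (n < 0) {
                break; // end of file
            }
            pos += n;
        }
        return toArray(buf);
    }

    private static byte[] toArray(ByteBuffer buf) {
        buf.flip();
        byte[] out = new byte[buf.remaining()];
        buf.get(out);
        return out;
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("pread-sketch", ".bin");
        try {
            Files.write(file, "0123456789abcdef".getBytes(StandardCharsets.US_ASCII));
            try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
                String a = new String(pread(ch, 10, 4), StandardCharsets.US_ASCII);
                System.out.println(a + " position=" + ch.position());
                String b = new String(seekAndRead(ch, 4, 3), StandardCharsets.US_ASCII);
                System.out.println(b + " position=" + ch.position());
            }
        } finally {
            Files.deleteIfExists(file);
        }
    }
}
```

In this sketch the positional read leaves the channel position where it was,
while the seek + read variant advances it and therefore serializes callers.
That is the trade-off discussed above: the stateful path only pays off when a
single reader keeps the stream to itself long enough to profit from sequential
prefetching.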
I'll do some perf testing and then post a patch.
> Avoid seek + read completely?
> -----------------------------
>
> Key: HBASE-12411
> URL: https://issues.apache.org/jira/browse/HBASE-12411
> Project: HBase
> Issue Type: Brainstorming
> Components: Performance
> Reporter: Lars Hofhansl
>
> In the light of HDFS-6735 we might want to consider refraining from seek +
> read completely and only perform preads.
> For example, a compaction can currently lock out every other scanner over the
> file that it is reading.
> At the very least we can introduce an option to avoid seek + read, so we can
> allow testing this in various scenarios.
> This will definitely be of great importance for projects like Phoenix, which
> parallelize queries intra-region (and hence readers will be used concurrently
> by multiple scanners with high likelihood).