[
https://issues.apache.org/jira/browse/HBASE-12411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14214136#comment-14214136
]
Lars Hofhansl commented on HBASE-12411:
---------------------------------------
bq. Is that all it takes?
Yeah, I was a bit surprised myself. I had tried all kinds of approaches before that
managed a bunch of independent readers and kept track of whether/when they
needed to be closed, etc. All brittle and complicated stuff that I did not
like.
Then I stepped back and realized that we only need to work on private
StoreFiles.
This works because we're not resetting the scanner stack for compaction
scanners when *other* compactions finish, so compactions already run
independently, but share the readers of the HFiles they read as input.
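A minimal JDK-only sketch of why a private reader helps, assuming a reader that serializes seek + read behind a lock. The `Reader` class and the variable names are hypothetical stand-ins for illustration, not HBase's StoreFile or HFile reader classes. Two readers opened on the same file have independent locks and file positions, so a long read through the private one never blocks callers of the shared one.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class PrivateReaderSketch {

    /** Hypothetical stand-in for a file reader whose seek + read is serialized by a lock. */
    static final class Reader implements AutoCloseable {
        private final RandomAccessFile file;

        Reader(Path path) throws IOException {
            this.file = new RandomAccessFile(path.toFile(), "r");
        }

        /** Seek + read moves this reader's file position, so it holds this reader's lock. */
        synchronized byte[] seekRead(long offset, int len) throws IOException {
            byte[] buf = new byte[len];
            file.seek(offset);
            file.readFully(buf);
            return buf;
        }

        synchronized long position() throws IOException {
            return file.getFilePointer();
        }

        @Override
        public void close() throws IOException {
            file.close();
        }
    }

    public static void main(String[] args) throws IOException {
        Path path = Files.createTempFile("private-reader", ".bin");
        try {
            Files.write(path, "0123456789".getBytes(StandardCharsets.US_ASCII));
            // "shared" plays the reader used by user scans; the compaction opens its own.
            try (Reader shared = new Reader(path);
                 Reader compactionPrivate = new Reader(path)) {
                compactionPrivate.seekRead(4, 6);
                // The shared reader's position and lock were never touched.
                System.out.println("shared position:  " + shared.position());
                System.out.println("private position: " + compactionPrivate.position());
            }
        } finally {
            Files.deleteIfExists(path);
        }
    }
}
```

The cost of this design is one extra open file handle per compaction input, which is the trade-off a config switch would let people measure.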
I'll add a config switch, default off, and we can put this into trunk only,
maybe...?
Getting conclusive numbers will be a bit tricky. We need to run compactions on
real clusters so we see the network bandwidth consumption and then measure
scans/gets that go on in parallel. The larger and fewer the HFiles the more
pronounced the win will be.
If/when HDFS-6735 is fixed we may no longer need this.
I also need to test with all preads on real disks - most of my development
machines now have SSDs.
If all reads were preads, compactions would no longer lock out concurrent scanners
on the same HFiles.
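To make the seek + read versus pread distinction concrete, here is a small sketch using plain JDK file APIs as an analogy. It is not HBase or HDFS code, and the class and method names are made up for illustration. `RandomAccessFile.seek` followed by a read mutates the handle's position, so concurrent callers have to serialize on it. `FileChannel.read(ByteBuffer, long)` takes the offset as an argument and leaves the channel position alone, so no lock around the handle is needed.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class PreadSketch {

    /** Seek + read: changes the shared file position, so callers take turns on the handle. */
    static byte[] seekRead(RandomAccessFile file, long offset, int len) throws IOException {
        byte[] buf = new byte[len];
        synchronized (file) { // a long-running reader here holds up every other caller
            file.seek(offset);
            file.readFully(buf);
        }
        return buf;
    }

    /** Positional read: the offset is passed in and the channel position is not modified. */
    static byte[] pread(FileChannel channel, long offset, int len) throws IOException {
        ByteBuffer buf = ByteBuffer.allocate(len);
        while (buf.hasRemaining()) {
            int n = channel.read(buf, offset + buf.position());
            if (n < 0) {
                throw new EOFException("hit end of file before reading " + len + " bytes");
            }
        }
        return buf.array();
    }

    public static void main(String[] args) throws IOException {
        Path path = Files.createTempFile("pread-sketch", ".bin");
        try {
            Files.write(path, "0123456789".getBytes(StandardCharsets.US_ASCII));
            try (RandomAccessFile raf = new RandomAccessFile(path.toFile(), "r");
                 FileChannel ch = FileChannel.open(path, StandardOpenOption.READ)) {
                System.out.println(new String(seekRead(raf, 2, 3), StandardCharsets.US_ASCII));
                System.out.println(new String(pread(ch, 5, 4), StandardCharsets.US_ASCII));
                System.out.println("channel position after pread: " + ch.position());
            }
        } finally {
            Files.deleteIfExists(path);
        }
    }
}
```

The sketch only shows the locking difference. Whether all-pread is actually faster on spinning disks, where sequential read-ahead matters, is exactly what still needs measuring.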
> Avoid seek + read completely?
> -----------------------------
>
> Key: HBASE-12411
> URL: https://issues.apache.org/jira/browse/HBASE-12411
> Project: HBase
> Issue Type: Brainstorming
> Components: Performance
> Reporter: Lars Hofhansl
> Attachments: 12411.txt
>
>
> In the light of HDFS-6735 we might want to consider refraining from seek +
> read completely and only perform preads.
> For example, currently a compaction can lock out every other scanner on the
> file that the compaction is currently reading.
> At the very least we can introduce an option to avoid seek + read, so we can
> allow testing this in various scenarios.
> This will definitely be of great importance for projects like Phoenix, which
> parallelize queries intra-region (and hence readers will be used concurrently by
> multiple scanners with high likelihood).