[ 
https://issues.apache.org/jira/browse/HBASE-12411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14214136#comment-14214136
 ] 

Lars Hofhansl commented on HBASE-12411:
---------------------------------------

bq. Is that all it takes?

Yeah, I was a bit surprised myself. I had all kinds of approaches before that 
managed a bunch of independent readers and kept track of whether/when they 
needed to be closed, etc. All brittle and complicated stuff that I did not 
like.
Then I stepped back and realized that we only need to work on private 
StoreFiles.

This works because we're not resetting the scanner stack for compaction 
scanners when *other* compactions finish, so compactions already run 
independently, but share the readers of the HFiles they read as input.

I'll put a config switch, default off, and we can put this into trunk only 
maybe...?
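
To make the idea concrete, here is a minimal sketch of such a switch, not the actual patch. The key name example.compaction.private.readers and the Reader class are hypothetical stand-ins invented for illustration; java.util.Properties takes the place of the real configuration object.

```java
import java.util.Properties;

public class PrivateReaderSketch {
    // Hypothetical key, invented for this sketch; not a real HBase setting.
    static final String PRIVATE_READERS_KEY = "example.compaction.private.readers";

    /** Stand-in for an open file reader; not an HBase class. */
    static final class Reader {
        final String file;
        Reader(String file) { this.file = file; }
    }

    /** Picks the reader a compaction should use for one input file. */
    static Reader readerForCompaction(Properties conf, Reader shared) {
        // Default is "false", i.e. the switch is off unless explicitly enabled.
        boolean usePrivate =
            Boolean.parseBoolean(conf.getProperty(PRIVATE_READERS_KEY, "false"));
        // When enabled, open a fresh reader on the same file so the
        // compaction does not share a reader with user scans.
        return usePrivate ? new Reader(shared.file) : shared;
    }

    public static void main(String[] args) {
        Reader shared = new Reader("hfile-1");
        Properties conf = new Properties();
        System.out.println(readerForCompaction(conf, shared) == shared);

        conf.setProperty(PRIVATE_READERS_KEY, "true");
        System.out.println(readerForCompaction(conf, shared) == shared);
    }
}
```

With the switch off the compaction keeps using the shared reader, so existing behavior is unchanged; with it on, each compaction gets its own reader on the same file.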

Getting conclusive numbers will be a bit tricky. We need to run compactions on 
real clusters so we see the network bandwidth consumption and then measure 
scans/gets that go on in parallel. The larger and fewer the HFiles the more 
pronounced the win will be.

If/when HDFS-6735 is fixed we may no longer need this.

I also need to test with all preads on real disks - most of my development 
machines now have SSDs.
If all reads were preads, compaction would no longer lock out concurrent 
scanners in the same HFiles.
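
For readers less familiar with the distinction, the sketch below contrasts the two read styles using plain java.nio on a local file. It is an analogy only, not HBase or HDFS code: the class and method names are made up for illustration, and the synchronized block stands in for whatever locking a shared stream needs when seek + read changes its position.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Arrays;

public class PreadSketch {

    /** seek + read: moves the shared file position, so callers must serialize. */
    static byte[] seekAndRead(RandomAccessFile file, long offset, int len)
            throws IOException {
        byte[] buf = new byte[len];
        synchronized (file) {   // every other user of this handle waits here
            file.seek(offset);
            file.readFully(buf);
        }
        return buf;
    }

    /** pread: the offset is passed explicitly; the channel position is untouched. */
    static byte[] pread(FileChannel channel, long offset, int len)
            throws IOException {
        ByteBuffer buf = ByteBuffer.allocate(len);
        while (buf.hasRemaining()) {
            int n = channel.read(buf, offset + buf.position());
            if (n < 0) {
                throw new EOFException("EOF before " + len + " bytes were read");
            }
        }
        return buf.array();
    }

    public static void main(String[] args) throws IOException {
        Path path = Files.createTempFile("pread-sketch", ".bin");
        try {
            Files.write(path, new byte[] {0, 1, 2, 3, 4, 5, 6, 7, 8, 9});
            try (RandomAccessFile raf = new RandomAccessFile(path.toFile(), "r");
                 FileChannel ch = FileChannel.open(path, StandardOpenOption.READ)) {
                System.out.println(Arrays.toString(seekAndRead(raf, 4, 3)));
                System.out.println(Arrays.toString(pread(ch, 4, 3)));
                // Positional reads leave the channel's own position alone.
                System.out.println(ch.position());
            }
        } finally {
            Files.deleteIfExists(path);
        }
    }
}
```

Because the positional read carries its own offset, several scanners can issue reads on the same handle without taking turns, which is the property that matters for the compaction case above.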


> Avoid seek + read completely?
> -----------------------------
>
>                 Key: HBASE-12411
>                 URL: https://issues.apache.org/jira/browse/HBASE-12411
>             Project: HBase
>          Issue Type: Brainstorming
>          Components: Performance
>            Reporter: Lars Hofhansl
>         Attachments: 12411.txt
>
>
> In the light of HDFS-6735 we might want to consider refraining from seek + 
> read completely and only perform preads.
> For example, currently a compaction can lock out every other scanner over the 
> file which the compaction is currently reading.
> At the very least we can introduce an option to avoid seek + read, so we can 
> allow testing this in various scenarios.
> This will definitely be of great importance for projects like Phoenix which 
> parallelize queries intra region (and hence readers will be used concurrently 
> by multiple scanners with high likelihood.)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
