[
https://issues.apache.org/jira/browse/HBASE-17910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15968823#comment-15968823
]
Duo Zhang commented on HBASE-17910:
-----------------------------------
I found a problem: opening an HFileReader reads a lot of data, such as the
trailer and the index. This may hurt performance, so for now I think it is
only safe to use in compaction. I have therefore opened HBASE-17914 to land
the current code first.
I will also run some tests to see whether we can use pread for every user
scan. If it turns out that pread is not slower than streaming read in most
cases, then we could use pread for all user scans by default, unless the user
sets the ReadType to STREAM manually. In that case I think it is OK to open
new readers, since it is requested by the user directly and the user knows
the possible downside.
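
To make the pread vs. streaming read distinction concrete, here is a minimal
sketch using plain java.nio rather than the HBase or HDFS APIs. The class and
method names (ReadModes, pread, streamRead) are made up for illustration. A
positional read takes an explicit offset and leaves the channel position
alone, while a streaming read consumes from the current position and advances
it.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Arrays;

// Illustration only: hypothetical names, plain java.nio, no HBase/HDFS classes.
public class ReadModes {

    /** Positional read ("pread"): reads at an explicit offset; channel position is unchanged. */
    static byte[] pread(FileChannel ch, long offset, int len) throws IOException {
        ByteBuffer buf = ByteBuffer.allocate(len);
        long pos = offset;
        while (buf.hasRemaining()) {
            int n = ch.read(buf, pos); // does not move ch.position()
            if (n < 0) {
                break; // end of file
            }
            pos += n;
        }
        return Arrays.copyOf(buf.array(), buf.position());
    }

    /** Streaming read: reads from the current channel position and advances it. */
    static byte[] streamRead(FileChannel ch, int len) throws IOException {
        ByteBuffer buf = ByteBuffer.allocate(len);
        while (buf.hasRemaining()) {
            if (ch.read(buf) < 0) {
                break; // end of file
            }
        }
        return Arrays.copyOf(buf.array(), buf.position());
    }

    public static void main(String[] args) throws IOException {
        Path p = Files.createTempFile("readmodes", ".bin");
        try {
            Files.write(p, "0123456789".getBytes(StandardCharsets.US_ASCII));
            try (FileChannel ch = FileChannel.open(p, StandardOpenOption.READ)) {
                String a = new String(pread(ch, 4, 3), StandardCharsets.US_ASCII);
                System.out.println("pread at 4: " + a + ", position=" + ch.position());
                String b = new String(streamRead(ch, 3), StandardCharsets.US_ASCII);
                System.out.println("stream read: " + b + ", position=" + ch.position());
            }
        } finally {
            Files.deleteIfExists(p);
        }
    }
}
```

Because a positional read carries no shared cursor, several scanners can issue
it against one stream at the same time, whereas a streaming read depends on
the stream's position.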
Of course, this data (trailer, index, etc.) can be shared between different
readers. I will open other issues to address that.
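
As a rough sketch of what sharing could look like (this is not the actual
HBase code; FileHandle, FileMetadata and Reader are hypothetical names, and
the offsets are dummy values), the file-level object can load the trailer and
index once and hand the same immutable object to every reader it creates:

```java
import java.util.concurrent.atomic.AtomicInteger;

// Illustration only: all class names and values here are hypothetical.
public class SharedMetadataSketch {

    /** Stand-in for the trailer and index; immutable, so it is safe to share. */
    static final class FileMetadata {
        final long trailerOffset;
        private final long[] indexOffsets;

        FileMetadata(long trailerOffset, long[] indexOffsets) {
            this.trailerOffset = trailerOffset;
            this.indexOffsets = indexOffsets.clone();
        }

        int indexEntries() {
            return indexOffsets.length;
        }
    }

    /** Stand-in for a single store file that can create many readers. */
    static final class FileHandle {
        private final AtomicInteger loads = new AtomicInteger();
        private FileMetadata metadata; // lazily loaded, guarded by "this"

        synchronized FileMetadata metadata() {
            if (metadata == null) {
                loads.incrementAndGet(); // the expensive read happens only here
                metadata = new FileMetadata(1024L, new long[] {0L, 256L, 512L});
            }
            return metadata;
        }

        /** Each reader gets its own read mode but reuses the shared metadata. */
        Reader openReader(boolean streaming) {
            return new Reader(metadata(), streaming);
        }

        int metadataLoads() {
            return loads.get();
        }
    }

    static final class Reader {
        final FileMetadata metadata;
        final boolean streaming;

        Reader(FileMetadata metadata, boolean streaming) {
            this.metadata = metadata;
            this.streaming = streaming;
        }
    }

    public static void main(String[] args) {
        FileHandle file = new FileHandle();
        Reader preadReader = file.openReader(false);
        Reader streamReader = file.openReader(true);
        System.out.println("same metadata: " + (preadReader.metadata == streamReader.metadata));
        System.out.println("metadata loads: " + file.metadataLoads());
    }
}
```

With this shape, opening an extra reader for a streaming scan costs only a new
stream, not another read of the trailer and index.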
Thanks.
> Use separated StoreFileReader for streaming read
> ------------------------------------------------
>
> Key: HBASE-17910
> URL: https://issues.apache.org/jira/browse/HBASE-17910
> Project: HBase
> Issue Type: Improvement
> Reporter: Duo Zhang
>
> We already support using private readers for compaction, by creating a new
> StoreFile copy. I think a better way is to allow creating multiple readers
> from a single StoreFile instance, so that we can avoid the ugly cloning, and
> the readers can also be used for streaming scans, not only for compaction.
> The reason we want to do this is that we found read amplification when
> using short-circuit read. {{BlockReaderLocal}} uses an internal buffer to
> read data first; the buffer size is based on the configured buffer size and
> the readahead option in CachingStrategy. For a normal pread request we should
> just bypass the buffer, which can be achieved by setting readahead to 0. But
> for streaming read I think the buffer is still somewhat useful? So we need to
> use different FSDataInputStreams for pread and streaming read.
> One more thing: we can also remove the streamLock if streaming read always
> uses its own reader.
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)