[
https://issues.apache.org/jira/browse/HBASE-6874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13488463#comment-13488463
]
Karthik Ranganathan commented on HBASE-6874:
--------------------------------------------
Thought about the N scanners; it's a complicated change - you would have to
change the entire scan protocol. The next calls in a scan are not numbered, so
you could get out of sync when prefetching N, especially once exceptions are
involved. There is also the basic issue right now that scans do retries, which
is wrong. Also, reasoning about it another way, if your in-memory scan
throughput is higher than the rate at which you can read from disk, you're
probably good. I found that there are other unrelated bottlenecks preventing
this from being the case. Of course, if the filtering is very heavy then this
will break down... you probably want to implement prefetching based on the
number of filtered rows, which should not be too hard.
I have a patch that I have tested with, but it's waiting on HBASE-6770 - that
is going to refactor scans quite a bit. Will put a patch out once that is done.
> Implement prefetching for scanners
> ----------------------------------
>
> Key: HBASE-6874
> URL: https://issues.apache.org/jira/browse/HBASE-6874
> Project: HBase
> Issue Type: Sub-task
> Reporter: Karthik Ranganathan
> Assignee: Karthik Ranganathan
>
> I did some quick experiments by scanning data that should be completely in
> memory and found that adding prefetching increases the throughput by about
> 50%, from 26 MB/s to 39 MB/s.
> The idea is to perform the next call in a background thread and keep the
> result ready. When the scanner's next call comes in, return the precomputed
> result and issue another background read.
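
The background-read idea described above could be sketched roughly as follows.
This is only an illustration and not the patch mentioned in the comment:
BatchSource and PrefetchingScanner are hypothetical names, not HBase classes,
and the sketch assumes a source that hands back batches of rows and signals
the end of the scan with an empty batch. It keeps exactly one read in flight,
which sidesteps the ordering problem raised for N outstanding reads.

```java
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical stand-in for a scanner: returns the next batch, or an empty list when done. */
interface BatchSource {
    List<String> nextBatch() throws Exception;
}

/** Wraps a BatchSource and keeps one batch being read in a background thread. */
final class PrefetchingScanner implements AutoCloseable {
    // A single worker thread means reads are issued strictly one after another.
    private final ExecutorService worker = Executors.newSingleThreadExecutor();
    private final Callable<List<String>> read;
    private Future<List<String>> pending;

    PrefetchingScanner(BatchSource source) {
        this.read = source::nextBatch;
        this.pending = worker.submit(read); // start the first read right away
    }

    /** Returns the prefetched batch and, unless the scan is finished, schedules the next read. */
    List<String> next() throws InterruptedException, ExecutionException {
        List<String> batch = pending.get(); // blocks only if the background read is still running
        if (!batch.isEmpty()) {
            pending = worker.submit(read);  // overlap the next read with the caller's processing
        }
        return batch;
    }

    @Override
    public void close() {
        worker.shutdownNow();
    }
}
```

A caller would loop on next() until it returns an empty batch. A failure in the
background read surfaces as an ExecutionException on the following next() call,
so in this sketch the caller sees the error one call later than it would
without prefetching.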