[
https://issues.apache.org/jira/browse/HBASE-10676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13920853#comment-13920853
]
Jean-Marc Spaggiari commented on HBASE-10676:
---------------------------------------------
Nice, any impact on the other operations? Like get?
I can run it on PE for a day if you want.
> Removing ThreadLocal of PrefetchedHeader in HFileBlock.FSReaderV2 make higher
> performance of scan
> ------------------------------------------------------------------------------------------------
>
> Key: HBASE-10676
> URL: https://issues.apache.org/jira/browse/HBASE-10676
> Project: HBase
> Issue Type: Improvement
> Affects Versions: 0.98.0
> Reporter: zhaojianbo
> Attachments: HBASE-10676-0.98-branch.patch
>
>
> The prefetchedHeader variable in HFileBlock.FSReaderV2 is meant to avoid a
> backward seek operation, as the code comment says:
> {quote}
> we will not incur a backward seek operation if we have already read this
> block's header as part of the previous read's look-ahead. And we also want to
> skip reading the header again if it has already been read.
> {quote}
> But that is not what happens. In the 0.98 code, prefetchedHeader is a
> ThreadLocal for one storefile reader, and during the RegionScanner
> lifecycle, different RPC handlers serve the scan requests of the same
> scanner. Even if the handler of the previous scan call prefetched the next
> block's header, a different handler serving the current scan call still
> triggers a backward seek operation. The process is like this:
> # RS handler1 serves the scan call, reads block1, and prefetches the header of
> block2.
> # RS handler2 serves the same scanner's next scan call. Because handler2
> does not know that handler1 already prefetched the header of block2, it
> triggers a backward seek, reads block2, and prefetches the header of block3.
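The two steps above can be illustrated with a small, self-contained sketch. It is not the HBase code: `PrefetchSketch`, `Header`, `HeaderCache`, `PerThreadCache`, `SharedCache`, and the 64 KB `BLOCK_SIZE` are all hypothetical names and values chosen for illustration. The sketch only models which thread can see the prefetched header and counts the reads that would not find it.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicReference;

public class PrefetchSketch {
    /** Hypothetical stand-in for a prefetched header: only the offset of the block it belongs to. */
    static final class Header {
        final long offset;
        Header(long offset) { this.offset = offset; }
    }

    interface HeaderCache {
        Header get();
        void set(Header h);
    }

    /** Models the ThreadLocal arrangement: each handler thread sees only its own slot. */
    static final class PerThreadCache implements HeaderCache {
        private final ThreadLocal<Header> slot = new ThreadLocal<>();
        public Header get() { return slot.get(); }
        public void set(Header h) { slot.set(h); }
    }

    /** Models a single slot shared by every handler that uses the same reader. */
    static final class SharedCache implements HeaderCache {
        private final AtomicReference<Header> slot = new AtomicReference<>();
        public Header get() { return slot.get(); }
        public void set(Header h) { slot.set(h); }
    }

    static final long BLOCK_SIZE = 64 * 1024; // assumed block size, illustration only

    /**
     * Reads `blocks` consecutive blocks, handing each read to the next handler
     * thread in round-robin order. Returns how many reads did not find the
     * header of their block in the cache.
     */
    static int countHeaderMisses(HeaderCache cache, int blocks, int handlers) throws Exception {
        ExecutorService[] pool = new ExecutorService[handlers];
        for (int i = 0; i < handlers; i++) {
            pool[i] = Executors.newSingleThreadExecutor();
        }
        int misses = 0;
        try {
            for (int b = 0; b < blocks; b++) {
                final long offset = b * BLOCK_SIZE;
                boolean missed = pool[b % handlers].submit(() -> {
                    Header h = cache.get();
                    boolean miss = (h == null || h.offset != offset);
                    // After reading this block, remember the header of the next one.
                    cache.set(new Header(offset + BLOCK_SIZE));
                    return miss;
                }).get();
                if (missed) {
                    misses++;
                }
            }
        } finally {
            for (ExecutorService e : pool) {
                e.shutdown();
            }
        }
        return misses;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("per-thread, 2 handlers: " + countHeaderMisses(new PerThreadCache(), 6, 2));
        System.out.println("shared,     2 handlers: " + countHeaderMisses(new SharedCache(), 6, 2));
    }
}
```

In this model the very first read always counts as a miss because nothing has been prefetched yet. With two alternating handlers, the per-thread cache misses on all 6 reads, while the shared cache misses only on the first.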
> So the reads are not sequential. I therefore think the ThreadLocal is useless
> and should be removed. I made the change and evaluated the performance of one
> client, two clients, and four clients scanning the same region with one
> storefile. The test environment is:
> # An HDFS cluster with a namenode, a secondary namenode, and a datanode on one
> machine
> # An HBase cluster with a ZooKeeper node, a master, and a regionserver on the same machine
> # The clients are also on the same machine.
> So all the data is local. The storefile is about 22.7GB from our online data,
> 18995949 KVs. Caching is set to 1000.
> With the improvement, the total client scan time decreases by 21% in the
> one-client case and by 11% in the two-client case. The four-client case is
> almost unchanged. The detailed test data is as follows:
> ||case||client||time(ms)||
> | original | 1 | 306222 |
> | new | 1 | 241313 |
> | original | 2 | 416390 |
> | new | 2 | 369064 |
> | original | 4 | 555986 |
> | new | 4 | 562152 |
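
The percentages quoted in the description can be rechecked from the table with a few lines of arithmetic. This is only a sketch: the class and method names (`ScanTimeDelta`, `reductionPercent`) are made up for illustration, and the timings are copied from the table above.

```java
public class ScanTimeDelta {
    /** Percentage reduction in scan time; a negative value means the patched run was slower. */
    static double reductionPercent(long originalMs, long patchedMs) {
        return 100.0 * (originalMs - patchedMs) / originalMs;
    }

    public static void main(String[] args) {
        // {clients, original time (ms), new time (ms)}, taken from the table in the issue
        long[][] runs = {
            {1, 306222, 241313},
            {2, 416390, 369064},
            {4, 555986, 562152},
        };
        for (long[] r : runs) {
            System.out.printf("%d client(s): %.1f%%%n", r[0], reductionPercent(r[1], r[2]));
        }
    }
}
```

This gives roughly 21.2% and 11.4% for one and two clients, and about -1.1% for four clients, i.e. the patched run was marginally slower there.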