[
https://issues.apache.org/jira/browse/HBASE-10676?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
stack updated HBASE-10676:
--------------------------
Resolution: Duplicate
Status: Resolved (was: Patch Available)
Resolving as duplicate/subsumed by HBASE-17072 which purges the ThreadLocal.
Thank you for your hard work in here [~zhaojianbo]... Sorry it took us a while
to get around to this.
> Removing ThreadLocal of PrefetchedHeader in HFileBlock.FSReaderV2 makes higher
> performance of scan
> ------------------------------------------------------------------------------------------------
>
> Key: HBASE-10676
> URL: https://issues.apache.org/jira/browse/HBASE-10676
> Project: HBase
> Issue Type: Improvement
> Affects Versions: 0.99.0
> Reporter: zhaojianbo
> Assignee: zhaojianbo
> Attachments: HBASE-10676-0.98-branch-AtomicReferenceV2.patch,
> HBASE-10676-0.98-branchV2.patch
>
>
> The PrefetchedHeader variable in HFileBlock.FSReaderV2 is meant to avoid a
> backward seek operation, as the code comment says:
> {quote}
> we will not incur a backward seek operation if we have already read this
> block's header as part of the previous read's look-ahead. And we also want to
> skip reading the header again if it has already been read.
> {quote}
> But that is not what happens. In the 0.98 code, prefetchedHeader is a
> ThreadLocal per storefile reader, and over a RegionScanner's lifecycle
> different RPC handlers serve the scan requests of the same scanner. Even if
> the handler of the previous scan call prefetched the next block's header, the
> handler of the current scan call cannot see it and still triggers a backward
> seek operation. The process is like this:
> # rs handler1 serves the scan call, reads block1 and prefetches the header of
> block2
> # rs handler2 serves the same scanner's next scan call. Because rs handler2
> doesn't know that rs handler1 already prefetched the header of block2, it
> triggers a backward seek, reads block2, and prefetches the header of block3.
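The two-handler sequence above can be illustrated with a minimal Java sketch.
This is not HBase code: HeaderCacheDemo, PrefetchedHeaderSketch, HEADER_SIZE and
the block offset are hypothetical names and values chosen for illustration. It
only shows that a value stored in a ThreadLocal by one thread is invisible to a
second thread, while an AtomicReference held by the reader is visible to both.

```java
import java.util.concurrent.atomic.AtomicReference;

class HeaderCacheDemo {
    // Assumed header length, for illustration only.
    static final int HEADER_SIZE = 33;

    /** Hypothetical stand-in for the prefetched header state. */
    static final class PrefetchedHeaderSketch {
        final long offset;   // offset of the block this header belongs to
        final byte[] header; // the header bytes read ahead
        PrefetchedHeaderSketch(long offset, byte[] header) {
            this.offset = offset;
            this.header = header;
        }
    }

    // Variant 1: each handler thread sees only its own prefetched header.
    private final ThreadLocal<PrefetchedHeaderSketch> perThread = new ThreadLocal<>();
    // Variant 2: one prefetched header shared by all handler threads.
    private final AtomicReference<PrefetchedHeaderSketch> shared = new AtomicReference<>();

    void prefetch(long nextBlockOffset) {
        PrefetchedHeaderSketch p =
            new PrefetchedHeaderSketch(nextBlockOffset, new byte[HEADER_SIZE]);
        perThread.set(p);
        shared.set(p);
    }

    boolean threadLocalHit(long offset) {
        PrefetchedHeaderSketch p = perThread.get();
        return p != null && p.offset == offset;
    }

    boolean sharedHit(long offset) {
        PrefetchedHeaderSketch p = shared.get();
        return p != null && p.offset == offset;
    }

    private static void runAndJoin(Runnable task) {
        Thread t = new Thread(task);
        t.start();
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    /** Returns {threadLocalHit, sharedHit} as observed by the second handler. */
    static boolean[] simulateTwoHandlers() {
        HeaderCacheDemo reader = new HeaderCacheDemo();
        long block2Offset = 65536L; // arbitrary offset for the example
        boolean[] hits = new boolean[2];
        // handler1 reads block1 and prefetches the header of block2.
        runAndJoin(() -> reader.prefetch(block2Offset));
        // handler2 serves the next scan call and looks for block2's header.
        runAndJoin(() -> {
            hits[0] = reader.threadLocalHit(block2Offset);
            hits[1] = reader.sharedHit(block2Offset);
        });
        return hits;
    }

    public static void main(String[] args) {
        boolean[] hits = simulateTwoHandlers();
        System.out.println("ThreadLocal hit on handler2: " + hits[0]);
        System.out.println("Shared reference hit on handler2: " + hits[1]);
    }
}
```

In this sketch the second thread misses with the ThreadLocal and hits with the
shared reference, which matches the behavior the description attributes to
rs handler2.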
> So the read is not sequential. I therefore think that the threadlocal is
> useless and should be abandoned. I did the work and evaluated the performance
> of one client, two clients and four clients scanning the same region with one
> storefile. The test environment is:
> # An HDFS cluster with a namenode, a secondary namenode and a datanode on one
> machine
> # An HBase cluster with a zk, a master and a regionserver on the same machine
> # The clients are also on the same machine.
> So all the data is local. The storefile is about 22.7GB from our online data,
> 18995949 kvs. Caching is set to 1000, and setCacheBlocks(false) is used.
> With the improvement, the client's total scan time decreases by 21% in the
> one-client case and by 11% in the two-client case. The four-client case is
> almost the same. The detailed test data is the following:
> ||case||client||time(ms)||
> | original | 1 | 306222 |
> | new | 1 | 241313 |
> | original | 2 | 416390 |
> | new | 2 | 369064 |
> | original | 4 | 555986 |
> | new | 4 | 562152 |
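The percentages quoted above follow directly from the table. As a quick check,
here is a small Java sketch that recomputes them; ScanTimeDelta and
percentDecrease are hypothetical names used only for this example, and the
timings are copied from the table.

```java
class ScanTimeDelta {
    /** Percentage decrease from originalMs to newMs; negative means slower. */
    static double percentDecrease(long originalMs, long newMs) {
        return 100.0 * (originalMs - newMs) / originalMs;
    }

    public static void main(String[] args) {
        // {clients, original time (ms), new time (ms)} from the table above.
        long[][] runs = {
            {1, 306222, 241313},
            {2, 416390, 369064},
            {4, 555986, 562152},
        };
        for (long[] r : runs) {
            System.out.printf("%d client(s): %.1f%%%n",
                r[0], percentDecrease(r[1], r[2]));
        }
    }
}
```

This gives roughly 21.2% and 11.4% for one and two clients, and about -1.1%
for four clients, that is, the four-client run was marginally slower.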
> With some modifications (see the comments below), the newest result is:
> ||client||original time(ms)||new with synchronized time(ms)||new with AtomicReference time(ms)||
> | 1 | 306222 | 239510 | 241243 |
> | 2 | 416390 | 365367 | 368952 |
> | 4 | 555986 | 540642 | 545715 |
> | 8 | 854029 | 852137 | 850401 |