[
https://issues.apache.org/jira/browse/HBASE-13448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14557153#comment-14557153
]
Lars Hofhansl commented on HBASE-13448:
---------------------------------------
My test is quite specific in that the entire scan happens on the region server,
because all Cells are filtered there. I do this in order to find out how much
overhead the server has. It's possible that if the Cells were not filtered,
more calls to getRowLength would happen.
I have not specifically tracked GC activity. I ran the test many times in a
loop, first warming up the region server a few times, then running it a few
times in order to capture some GC activity in the run times.
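For reference, the warm-up-then-measure loop described above could be sketched roughly like this. The class and method names (WarmupTimer, time) are made up for illustration and are not part of HBase or of the actual test; the Runnable stands in for whatever issues the scan.

```java
/** Hypothetical helper: run a workload a few times untimed, then time it a few times. */
final class WarmupTimer {
    private WarmupTimer() {}

    /**
     * Runs {@code scan} {@code warmupRuns} times without timing (to let the JIT
     * and caches settle), then {@code timedRuns} more times, returning the
     * elapsed nanoseconds of each timed run.
     */
    static long[] time(Runnable scan, int warmupRuns, int timedRuns) {
        for (int i = 0; i < warmupRuns; i++) {
            scan.run(); // result discarded; warm-up only
        }
        long[] nanos = new long[timedRuns];
        for (int i = 0; i < timedRuns; i++) {
            long start = System.nanoTime();
            scan.run();
            // Any GC pauses that occur during the run are included in this number.
            nanos[i] = System.nanoTime() - start;
        }
        return nanos;
    }
}
```

Keeping every timed run (rather than only an average) makes it easier to see which runs were hit by GC.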
My main comment stands: Just because we call getRowLength a bunch, or a
profiler says it's inefficient, doesn't mean it's bad. Only a real test can
bear that out. For this case it's best (I think) to test with just a single
region server to keep network variance out of the picture (and this is a region
server local optimization anyway).
I don't know how to explain the numbers, yet. It is possible that reading the
length from a member leads to less efficient cache line utilization compared to
decoding it from the byte[] each time... That would heavily depend on the
specific call sequence.
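To make the two alternatives concrete, here is a minimal sketch of "decode from the byte[] each time" versus "read from a member". This is not the code from the patch. The class names are hypothetical, and the layout (4-byte key length, 4-byte value length, then a 2-byte row length) is a simplified assumption modeled on the KeyValue format.

```java
import java.nio.ByteBuffer;

/** Hypothetical cell that decodes the row length from the backing array on every call. */
final class DecodingCell {
    // Assumed layout: 4-byte key length, 4-byte value length, then 2-byte row length.
    static final int ROW_LENGTH_OFFSET = 8;
    private final byte[] bytes;
    private final int offset;

    DecodingCell(byte[] bytes, int offset) {
        this.bytes = bytes;
        this.offset = offset;
    }

    short getRowLength() {
        int p = offset + ROW_LENGTH_OFFSET;
        // Big-endian short, read straight from bytes that are likely already in cache.
        return (short) (((bytes[p] & 0xff) << 8) | (bytes[p + 1] & 0xff));
    }
}

/** Hypothetical cell that decodes the row length once and keeps it in a field. */
final class CachingCell {
    private final byte[] bytes;
    private final int offset;
    private final short rowLength; // extra field: a larger object, possibly another cache line

    CachingCell(byte[] bytes, int offset) {
        this.bytes = bytes;
        this.offset = offset;
        this.rowLength = new DecodingCell(bytes, offset).getRowLength();
    }

    short getRowLength() {
        return rowLength;
    }
}

/** Hypothetical helper that builds a buffer in the assumed layout (row only, no family etc.). */
final class CellSketch {
    private CellSketch() {}

    static byte[] encode(byte[] row) {
        ByteBuffer buf = ByteBuffer.allocate(4 + 4 + 2 + row.length);
        buf.putInt(2 + row.length);       // key length (row part only in this sketch)
        buf.putInt(0);                    // value length
        buf.putShort((short) row.length); // row length
        buf.put(row);
        return buf.array();
    }
}
```

Both variants return the same value; which one is faster depends on the call sequence and on what else is in cache, which is why only a measurement can settle it.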
Lemme try with only caching the row key.
> New Cell implementation with cached component offsets/lengths
> -------------------------------------------------------------
>
> Key: HBASE-13448
> URL: https://issues.apache.org/jira/browse/HBASE-13448
> Project: HBase
> Issue Type: Sub-task
> Components: Scanners
> Reporter: Anoop Sam John
> Assignee: Anoop Sam John
> Fix For: 2.0.0
>
> Attachments: 13448-0.98.txt, HBASE-13448.patch, HBASE-13448_V2.patch,
> HBASE-13448_V3.patch, gc.png, hits.png
>
>
> This can be an extension to KeyValue and can be instantiated and used in the
> read path.