Hi:

In HBASE-21657, I simplified the code path of estimatedSerializedSize() &
estimatedSerializedSizeOfCell() by moving the general getSerializedSize()
and heapSize() from ExtendedCell to the Cell interface. It's an incompatible
change in some cases: if upstream users have implemented their own Cells
(rare, but it can happen), their code will fail to compile.
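
To make the compatibility concern concrete, here is a minimal sketch. It
uses simplified stand-in types: SketchCell and UpstreamCell are hypothetical
names, not the real HBase interfaces (which declare many more methods), and
the size arithmetic is illustrative only. The point is just that once the two
size methods are declared on the base interface, a custom implementation
has to provide them before it compiles again:

```java
// Hypothetical, simplified stand-in for the Cell interface after the change.
// The real interface has many more methods; only the relevant ones are shown.
interface SketchCell {
  byte[] getRowArray();

  byte[] getValueArray();

  // Declared on the base interface after the change, so every implementer
  // must now provide them.
  int getSerializedSize();

  long heapSize();
}

// A user-defined cell (hypothetical). Without the last two methods this class
// would fail to compile against the changed interface.
final class UpstreamCell implements SketchCell {
  private final byte[] row;
  private final byte[] value;

  UpstreamCell(byte[] row, byte[] value) {
    this.row = row;
    this.value = value;
  }

  @Override
  public byte[] getRowArray() {
    return row;
  }

  @Override
  public byte[] getValueArray() {
    return value;
  }

  @Override
  public int getSerializedSize() {
    // Illustrative layout only: two 4-byte length prefixes plus the payloads.
    return 2 * Integer.BYTES + row.length + value.length;
  }

  @Override
  public long heapSize() {
    // Rough estimate, assuming a 16-byte header per object/array.
    return 16L + (16L + row.length) + (16L + value.length);
  }
}
```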

We gain almost 40% throughput improvement in the 100% scan case for branch-2
(cacheHitRatio~100%) [1], which is a good thing. But I'm not sure whether
the patch should go to branch-2.1. In [2], stack says branch-2.0 won't need
this Cell interface change (agreed; maybe the following changes can be
included, I will file an issue for that), but I'm not quite sure about
branch-1. Discussion is welcome (smile).

Anyway, the patch can be included in branch-2/master because we haven't made
a release yet.

BTW, the patch also includes some other improvements:
1. In 99% of cases our cells have no tags, so HFileScannerImpl now just
   returns a NoTagsByteBufferKeyValue when there are no tags. This saves
   a lot of CPU time when sending no-tags cells over RPC, because we can
   just return the length instead of computing the serialized size from
   the offset/length of each field (row/cf/cq..).
2. Move each subclass's getSerializedSize implementation from ExtendedCell
   into its own class, so we no longer need to call ExtendedCell's
   getSerializedSize() first and then forward to the subclass's
   getSerializedSize(withTags).
3. Pre-size the result ArrayList to avoid frequent list growth during a
   big scan; we now estimate the array size as min(scan.rows, 512).
   This also helps a lot.
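
For readers who have not looked at the patch, items 1 and 3 can be sketched
roughly as below. This is not the code from HBASE-21657: BufferCell,
NoTagsBufferCell and ScanResults are hypothetical names, the byte layout is
only modeled loosely on a KeyValue-style encoding, and the guard against a
negative row count is an assumption added so the sketch is safe to run.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical cell that derives its serialized size from per-field lengths.
class BufferCell {
  protected final int rowLen, famLen, qualLen, valueLen, tagsLen;

  BufferCell(int rowLen, int famLen, int qualLen, int valueLen, int tagsLen) {
    this.rowLen = rowLen;
    this.famLen = famLen;
    this.qualLen = qualLen;
    this.valueLen = valueLen;
    this.tagsLen = tagsLen;
  }

  // General path: walk every field (illustrative layout, not authoritative).
  int getSerializedSize(boolean withTags) {
    // 2-byte row length, row, 1-byte family length, family, qualifier,
    // 8-byte timestamp, 1-byte type.
    int keyLen = 2 + rowLen + 1 + famLen + qualLen + 8 + 1;
    // 4-byte key length + 4-byte value length + key + value.
    int size = 4 + 4 + keyLen + valueLen;
    if (withTags && tagsLen > 0) {
      size += 2 + tagsLen; // 2-byte tags length + tags
    }
    return size;
  }
}

// Hypothetical no-tags variant: the total length is fixed once, so the size
// query is a plain field read regardless of withTags.
final class NoTagsBufferCell extends BufferCell {
  private final int length;

  NoTagsBufferCell(int rowLen, int famLen, int qualLen, int valueLen) {
    super(rowLen, famLen, qualLen, valueLen, 0);
    // A real buffer-backed cell would already know its length; here it is
    // computed once at construction time.
    this.length = super.getSerializedSize(false);
  }

  @Override
  int getSerializedSize(boolean withTags) {
    return length; // fast path: no per-field arithmetic
  }
}

// Hypothetical helper for pre-sizing the scan result list.
final class ScanResults {
  static final int MAX_INITIAL_CAPACITY = 512;

  static int initialCapacity(int scanRows) {
    // min(scan.rows, 512); clamped at 0 because ArrayList rejects a
    // negative capacity (the clamp is an assumption of this sketch).
    return Math.max(0, Math.min(scanRows, MAX_INITIAL_CAPACITY));
  }

  static <T> List<T> newResultList(int scanRows) {
    return new ArrayList<>(initialCapacity(scanRows));
  }
}
```

The cap keeps a scan that asks for a huge number of rows from allocating a
huge backing array up front, while small scans still get a list that never
needs to grow.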

Thanks.

[1] https://issues.apache.org/jira/browse/HBASE-21657?focusedCommentId=16735455&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16735455
[2] https://issues.apache.org/jira/browse/HBASE-21657?focusedCommentId=16742330&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16742330
