[ https://issues.apache.org/jira/browse/HBASE-13448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14566404#comment-14566404 ]
Anoop Sam John commented on HBASE-13448:
----------------------------------------

@larsh thanks for the comments.

I was trying to explain why we won't see any improvement in this test, especially on 0.98. Sorry if I was not clear.

The test has 1 CF with a single file in it. Under the StoreScanner's KVHeap there is always only a single file, so no comparison happens and there are no calls to getXXXOffset/Length there. There are get calls in StoreScanner (at most 2), and ScanQueryMatcher (SQM) also needs the component offsets/lengths. But SQM does not call the getters on KeyValue to obtain them; it calculates them itself by parsing the KV buffer (see the code at the end of this comment). SQM then skips these cells, so there are no further get calls on them. In effect there are 2 get calls on rowLength and just one on each of the others. That explains why there is no advantage.

In a real case where Cells are not skipped (and especially in trunk), these calls happen many times, mainly on rowLength. When ExplicitColumnTracker is in use, the qualifier offset/length getters are also called many times. For the other component lengths/offsets, the keyLength is parsed frequently. The table in the comments above shows how many times each call happens on a single Cell. Those numbers are for cells that are written back to the client side, so they pass through all layers. But in that test, too, I had only 1 CF and one HFile. With more CFs and files, comparisons will happen in 2 KVHeaps, so the number of calls will be higher. (We no longer pass byte[], offset, length into the Comparators; we pass the Cell alone.)

So in trunk we should see an advantage. If you can give us your test, I will run it on trunk.
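To make the difference between re-parsing and caching concrete, here is a minimal standalone sketch. It is not HBase code: the class and method names (`CellOffsetSketch`, `ParsingCell`, `CachedCell`, `encode`) are hypothetical, and the byte layout is a simplified assumption modelled on the KeyValue format that the SQM excerpt at the end of this comment parses (key length, value length, row length, row, family length, family, qualifier, timestamp, type, value).

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/** Illustrative only: a simplified KeyValue-style layout, not HBase's actual classes. */
final class CellOffsetSketch {
    static final int ROW_OFFSET = 8;          // int keyLength + int valueLength
    static final int TIMESTAMP_TYPE_SIZE = 9; // long timestamp + byte type

    /** Serializes one cell into the assumed layout. */
    static byte[] encode(String row, String family, String qualifier,
                         long ts, byte type, String value) {
        byte[] r = row.getBytes(StandardCharsets.UTF_8);
        byte[] f = family.getBytes(StandardCharsets.UTF_8);
        byte[] q = qualifier.getBytes(StandardCharsets.UTF_8);
        byte[] v = value.getBytes(StandardCharsets.UTF_8);
        int keyLength = 2 + r.length + 1 + f.length + q.length + TIMESTAMP_TYPE_SIZE;
        ByteBuffer buf = ByteBuffer.allocate(ROW_OFFSET + keyLength + v.length);
        buf.putInt(keyLength).putInt(v.length);
        buf.putShort((short) r.length).put(r);
        buf.put((byte) f.length).put(f);
        buf.put(q);
        buf.putLong(ts).put(type);
        buf.put(v);
        return buf.array();
    }

    /** Every getter decodes the length prefixes again. */
    static final class ParsingCell {
        final byte[] bytes;
        int parses; // how many times a length prefix was decoded

        ParsingCell(byte[] bytes) { this.bytes = bytes; }

        short getRowLength() {
            parses++;
            return ByteBuffer.wrap(bytes).getShort(ROW_OFFSET);
        }

        int getQualifierOffset() {
            int famLenOffset = ROW_OFFSET + 2 + getRowLength();
            parses++;
            return famLenOffset + 1 + bytes[famLenOffset];
        }

        int getQualifierLength() {
            parses++;
            int keyLength = ByteBuffer.wrap(bytes).getInt(0);
            return keyLength - (getQualifierOffset() - ROW_OFFSET) - TIMESTAMP_TYPE_SIZE;
        }
    }

    /** Offsets and lengths are decoded once, in the constructor. */
    static final class CachedCell {
        final byte[] bytes;
        final short rowLength;
        final int qualifierOffset;
        final int qualifierLength;

        CachedCell(byte[] bytes) {
            this.bytes = bytes;
            ByteBuffer buf = ByteBuffer.wrap(bytes);
            int keyLength = buf.getInt(0);
            this.rowLength = buf.getShort(ROW_OFFSET);
            int famLenOffset = ROW_OFFSET + 2 + rowLength;
            this.qualifierOffset = famLenOffset + 1 + bytes[famLenOffset];
            this.qualifierLength =
                keyLength - (qualifierOffset - ROW_OFFSET) - TIMESTAMP_TYPE_SIZE;
        }
    }

    public static void main(String[] args) {
        byte[] kv = encode("row1", "cf", "qual", 1L, (byte) 4, "v");
        ParsingCell p = new ParsingCell(kv);
        CachedCell c = new CachedCell(kv);
        System.out.println("parsed qualifier length: " + p.getQualifierLength());
        System.out.println("cached qualifier length: " + c.qualifierLength);
        System.out.println("prefix decodes in the parsing variant: " + p.parses);
    }
}
```

With only one or two getter calls per cell, as in the single-file test, the up-front decode in the cached variant buys little. The saving should only show up when the same cell is queried many times, for example by comparators in several heaps.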
{code}
byte [] bytes = kv.getBuffer();
int offset = kv.getOffset();

int keyLength = Bytes.toInt(bytes, offset, Bytes.SIZEOF_INT);
offset += KeyValue.ROW_OFFSET;

int initialOffset = offset;

short rowLength = Bytes.toShort(bytes, offset, Bytes.SIZEOF_SHORT);
offset += Bytes.SIZEOF_SHORT;

int ret = this.rowComparator.compareRows(row, this.rowOffset, this.rowLength,
    bytes, offset, rowLength);
...
...
//Passing rowLength
offset += rowLength;

//Skipping family
byte familyLength = bytes [offset];
offset += familyLength + 1;

int qualLength = keyLength - (offset - initialOffset) - KeyValue.TIMESTAMP_TYPE_SIZE;

long timestamp = Bytes.toLong(bytes, initialOffset + keyLength - KeyValue.TIMESTAMP_TYPE_SIZE);
...
...
byte type = bytes[initialOffset + keyLength - 1];
...
MatchCode colChecker = columns.checkColumn(bytes, offset, qualLength, type);
if (colChecker == MatchCode.INCLUDE) {
  ReturnCode filterResponse = ReturnCode.SKIP;
  // STEP 2: Yes, the column is part of the requested columns. Check if filter is present
  if (filter != null) {
    // STEP 3: Filter the key value and return if it filters out
    filterResponse = filter.filterKeyValue(kv);
{code}

> New Cell implementation with cached component offsets/lengths
> -------------------------------------------------------------
>
>                 Key: HBASE-13448
>                 URL: https://issues.apache.org/jira/browse/HBASE-13448
>             Project: HBase
>          Issue Type: Sub-task
>          Components: Scanners
>            Reporter: Anoop Sam John
>            Assignee: Anoop Sam John
>             Fix For: 2.0.0
>
>         Attachments: 13448-0.98.txt, HBASE-13448.patch, HBASE-13448_V2.patch,
>                      HBASE-13448_V3.patch, gc.png, hits.png
>
>
> This can be extension to KeyValue and can be instantiated and used in read
> path.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)