anoopsjohn commented on pull request #3318:
URL: https://github.com/apache/hbase/pull/3318#issuecomment-852002682
So the latest version change is moving the check for possible shipped()
called out of the for loop. The cons here is that when we have wide rows with
so many cells in it, this will delay the process of release of blocks and so
release from cache. That is why it was kept in for loop.
But the bug is also a big concern.
Now seeing why and when we use this 'lastCleanCell'. This is to reset the
seqId on a cell. We set that to be 0 while we pass the cell to the writer (the
output writer for the new compacted file(s)). But we try reset that value on
the original cell (See details in HBASE-16931).
PrivateCellUtil.setSequenceId(lastCleanCell, lastCleanCellSeqId) is being
done inside the for loop before doing shipped call as well as outside the for
loop. If the call is happening only within the for() loop, there is no
chance of the corruption. But now it can so happen that the call can happen on
SAME cell object inside and outside of the loop.
So if we keep the code as is (not moving the shipped call out of for loop)
and do like
if (kvs != null && bytesWrittenProgressForShippedCall >
shippedCallSizeLimit) {
if (lastCleanCell != null) {
// HBASE-16931, set back sequence id to avoid affecting scan
order unexpectedly.
// ShipperListener will do a clone of the last cells it refer,
so need to set back
// sequence id before ShipperListener.beforeShipped
PrivateCellUtil.setSequenceId(lastCleanCell,
lastCleanCellSeqId);
**lastCleanCell = null; // The reset of the seqId for this
cell object happened already. Just nullify it**
}
We wont get to any issue. Correct? Seems that will be the best way?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]