[
https://issues.apache.org/jira/browse/HBASE-16931?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15601075#comment-15601075
]
ramkrishna.s.vasudevan commented on HBASE-16931:
------------------------------------------------
Checking this patch.
The fix looks OK, but I have one question. We call shipped() after each batch of
cells is written to avoid an OOME, because otherwise compaction would hold on to
all the blocks until it completes. But because we call shipped(), the blocks may
be released while the write flow still holds references such as 'lastCell', so
those could get corrupted once the block is released.
That is why we added the beforeShipped() call. Setting aside this bug in the read
path, won't we end up with the same problem in the write path too?
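To make the lifetime problem concrete, here is a minimal, self-contained Java sketch. It does not use HBase: `BlockCell`, `MiniWriter` and `lastRowAfterShip` are hypothetical names, a single `ByteBuffer` stands in for a cached block, and overwriting that buffer stands in for the block being released and reused after shipped(). The writer's `beforeShipped()` here only mimics the idea of copying `lastCell` out of the shared block; it is not the real HBase implementation.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical model of the shipped()/beforeShipped() lifetime problem; not HBase code.
final class ShippedSketch {

  // A cell that only points into a shared block buffer instead of owning its bytes.
  static final class BlockCell {
    private final ByteBuffer block;
    private final int offset;
    private final int length;

    BlockCell(ByteBuffer block, int offset, int length) {
      this.block = block;
      this.offset = offset;
      this.length = length;
    }

    // Reads whatever bytes are currently stored at this cell's position in the block.
    private byte[] bytes() {
      byte[] out = new byte[length];
      ByteBuffer view = block.duplicate();
      view.position(offset);
      view.get(out);
      return out;
    }

    String row() {
      return new String(bytes(), StandardCharsets.UTF_8);
    }

    // Copies the bytes into a private buffer so the cell no longer depends on the block.
    BlockCell deepCopy() {
      return new BlockCell(ByteBuffer.wrap(bytes()), 0, length);
    }
  }

  // A writer that remembers the last appended cell, like the 'lastCell' mentioned above.
  static final class MiniWriter {
    BlockCell lastCell;

    void append(BlockCell c) {
      lastCell = c;
    }

    // Mimics the idea behind beforeShipped(): detach retained state from shared blocks.
    void beforeShipped() {
      if (lastCell != null) {
        lastCell = lastCell.deepCopy();
      }
    }
  }

  // Appends one cell, "ships" the block (it is reused for other data), then reads lastCell.
  static String lastRowAfterShip(boolean callBeforeShipped) {
    ByteBuffer block = ByteBuffer.allocate(8);
    block.put("row-0001".getBytes(StandardCharsets.UTF_8));
    MiniWriter writer = new MiniWriter();
    writer.append(new BlockCell(block, 0, 8));
    if (callBeforeShipped) {
      writer.beforeShipped();
    }
    // The released block is reused and overwritten with unrelated data.
    block.clear();
    block.put("row-9999".getBytes(StandardCharsets.UTF_8));
    return writer.lastCell.row();
  }

  public static void main(String[] args) {
    System.out.println(lastRowAfterShip(false)); // prints "row-9999" (lastCell corrupted)
    System.out.println(lastRowAfterShip(true));  // prints "row-0001"
  }
}
```

Copying only the retained cell before shipping keeps the cost small, since everything else in the batch has already been written out.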
In the write path, the lastCell written just before cleanSeqId kicks in will
still have its seqId, but the next Cell will have seqId 0. I believe that is
going to be a problem in the Writer#checkKey() method.
One more question: immediately after append(), can't we set the lastSeqId back
again?
> Setting cell's seqId to zero in compaction flow might cause RS down.
> --------------------------------------------------------------------
>
> Key: HBASE-16931
> URL: https://issues.apache.org/jira/browse/HBASE-16931
> Project: HBase
> Issue Type: Bug
> Components: regionserver
> Affects Versions: 2.0.0
> Reporter: binlijin
> Assignee: binlijin
> Priority: Critical
> Attachments: HBASE-16931-master.patch
>
>
> Compactor#performCompaction
> do {
>   hasMore = scanner.next(cells, scannerContext);
>   // output to writer:
>   for (Cell c : cells) {
>     if (cleanSeqId && c.getSequenceId() <= smallestReadPoint) {
>       CellUtil.setSequenceId(c, 0);
>     }
>     writer.append(c);
>   }
>   cells.clear();
> } while (hasMore);
> scanner.next returns at most "hbase.hstore.compaction.kv.max" KVs per call, and
> the last cell is still referenced by StoreScanner.prevCell. So if cleanSeqId
> sets that cell's seqId to 0, the next scanner.next call may throw an exception
> from StoreScanner.checkScanOrder and bring the regionserver down.
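The failure described in the issue can be reproduced in a simplified model. This is a hypothetical sketch, not HBase code: `MiniCell`, `MiniScanner` and `compact` are illustrative names, the ordering (row ascending, timestamp descending, seqId descending) is an assumption about how same-key cells are ordered, and `batchSize` plays the role of "hbase.hstore.compaction.kv.max". The `restoreAfterAppend` flag tries the idea from the comment of setting the seqId back right after append(); the model does not cover the writer's own checkKey() checks.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical model of the compaction loop in the issue; not HBase code.
final class ScanOrderSketch {

  // Mutable cell: like CellUtil.setSequenceId, changing seqId mutates the shared object.
  static final class MiniCell {
    final String row;
    final long ts;
    long seqId;

    MiniCell(String row, long ts, long seqId) {
      this.row = row;
      this.ts = ts;
      this.seqId = seqId;
    }
  }

  // Assumed order: row ascending, timestamp descending, seqId descending.
  static final Comparator<MiniCell> ORDER =
      Comparator.comparing((MiniCell c) -> c.row)
          .thenComparing(Comparator.comparingLong((MiniCell c) -> c.ts).reversed())
          .thenComparing(Comparator.comparingLong((MiniCell c) -> c.seqId).reversed());

  // Returns at most batchSize cells per call and keeps a reference to the last one.
  static final class MiniScanner {
    private final List<MiniCell> source;
    private final int batchSize;
    private int pos = 0;
    private MiniCell prevCell;

    MiniScanner(List<MiniCell> source, int batchSize) {
      this.source = source;
      this.batchSize = batchSize;
    }

    boolean next(List<MiniCell> out) {
      for (int i = 0; i < batchSize && pos < source.size(); i++) {
        MiniCell c = source.get(pos++);
        // Plays the role of StoreScanner.checkScanOrder.
        if (prevCell != null && ORDER.compare(prevCell, c) > 0) {
          throw new IllegalStateException("out of order: prev seqId " + prevCell.seqId
              + ", current seqId " + c.seqId);
        }
        prevCell = c;
        out.add(c);
      }
      return pos < source.size();
    }
  }

  // Runs the do/while loop from the issue; returns false if the scanner throws.
  static boolean compact(int batchSize, boolean cleanSeqId, boolean restoreAfterAppend) {
    long smallestReadPoint = 20;
    List<MiniCell> input = Arrays.asList(
        new MiniCell("r1", 100, 10),
        new MiniCell("r1", 100, 8),
        new MiniCell("r2", 100, 12));
    MiniScanner scanner = new MiniScanner(input, batchSize);
    List<MiniCell> cells = new ArrayList<>();
    try {
      boolean hasMore;
      do {
        hasMore = scanner.next(cells);
        for (MiniCell c : cells) {
          long original = c.seqId;
          if (cleanSeqId && c.seqId <= smallestReadPoint) {
            c.seqId = 0; // also changes the scanner's prevCell, which is the same object
          }
          // writer.append(c) would serialize the cell here (not modeled).
          if (restoreAfterAppend) {
            c.seqId = original;
          }
        }
        cells.clear();
      } while (hasMore);
      return true;
    } catch (IllegalStateException e) {
      return false;
    }
  }

  public static void main(String[] args) {
    System.out.println(compact(1, false, false)); // true
    System.out.println(compact(1, true, false));  // false: prevCell now has seqId 0
    System.out.println(compact(1, true, true));   // true
    System.out.println(compact(3, true, false));  // true: no batch boundary inside "r1"
  }
}
```

In this model the exception only appears when a batch boundary falls between two cells with the same row and timestamp, which matches the point above that the last cell of each batch is still referenced by prevCell.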