[
https://issues.apache.org/jira/browse/HBASE-27649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17693689#comment-17693689
]
Hudson commented on HBASE-27649:
--------------------------------
Results for branch master
[build #783 on builds.a.o|https://ci-hbase.apache.org/job/HBase%20Nightly/job/master/783/]:
(x) *{color:red}-1 overall{color}*
----
details (if available):
(/) {color:green}+1 general checks{color}
-- For more information [see general report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/master/783/General_20Nightly_20Build_20Report/]
(/) {color:green}+1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/master/783/JDK8_20Nightly_20Build_20Report_20_28Hadoop3_29/]
(x) {color:red}-1 jdk11 hadoop3 checks{color}
-- For more information [see jdk11 report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/master/783/JDK11_20Nightly_20Build_20Report_20_28Hadoop3_29/]
(/) {color:green}+1 source release artifact{color}
-- See build output for details.
(/) {color:green}+1 client integration test{color}
> WALPlayer does not properly dedupe overridden cell versions
> -----------------------------------------------------------
>
> Key: HBASE-27649
> URL: https://issues.apache.org/jira/browse/HBASE-27649
> Project: HBase
> Issue Type: Bug
> Reporter: Bryan Beaudreault
> Assignee: Bryan Beaudreault
> Priority: Major
> Labels: patch-available
> Fix For: 2.6.0, 3.0.0-alpha-4, 2.5.4
>
>
> If you do 2 Puts to a cell with different values but the same timestamp, the
> latest one will win. This is because in the memstore we use a sequenceId as a
> tie breaker for duplicate timestamps. When the data is flushed to a
> StoreFile, the deduplication will occur and eventually the sequenceId will be
> dropped.
> Those 2 Puts would both have been written to the WAL. If you use WALPlayer to
> replay those WALs (as anyone could do, and as backup/restore does for
> incremental restores), it does not apply the same tie breaker. It's unclear
> which of the duplicate cells you will get, when you should always get the
> latest.
> Our WAL encoder doesn't include the sequenceIds in the WALEntry cells.
> Instead the WALKey has a getSequenceId() which contains the same sequenceId
> the cells used to have. In WALCellMapper we don't pass those along, nor in
> CellSerialization, and thus CellSortReducer is not able to use the sequenceId
> to dedupe.
> I think we just need to translate the WALKey.getSequenceId() into the output
> Cells in WALCellMapper, then update CellSerialization to include them as
> well. At that point CellSortReducer should work as expected, and we should
> get the correct cell values in the hfiles.
> One open question is whether we should clear out the sequenceId before
> flushing to the hfile. I don't think so?
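
The tie-breaking idea described above can be illustrated with a small standalone sketch. It is not the HBase code or the actual patch: SketchCell, withSequenceId and dedupe are hypothetical names, and the cell coordinate is collapsed into a single string for brevity. It only assumes what the description states, namely that each WAL entry carries a sequenceId that its cells do not, and that the highest sequenceId should win among cells with identical timestamps.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical, self-contained sketch; none of these types are HBase classes.
class SeqIdDedupeSketch {

  /** Simplified stand-in for a cell; coordinate = row/family:qualifier. */
  static final class SketchCell {
    final String coordinate;
    final long timestamp;
    final long sequenceId;
    final String value;

    SketchCell(String coordinate, long timestamp, long sequenceId, String value) {
      this.coordinate = coordinate;
      this.timestamp = timestamp;
      this.sequenceId = sequenceId;
      this.value = value;
    }

    /** Returns a copy stamped with the sequenceId of the enclosing WAL entry. */
    SketchCell withSequenceId(long seqId) {
      return new SketchCell(coordinate, timestamp, seqId, value);
    }
  }

  // Coordinate ascending, newest timestamp first, then highest sequenceId first.
  static final Comparator<SketchCell> ORDER = (a, b) -> {
    int c = a.coordinate.compareTo(b.coordinate);
    if (c != 0) {
      return c;
    }
    c = Long.compare(b.timestamp, a.timestamp);
    if (c != 0) {
      return c;
    }
    return Long.compare(b.sequenceId, a.sequenceId);
  };

  /** Keeps one cell per (coordinate, timestamp): the one with the highest sequenceId. */
  static List<SketchCell> dedupe(List<SketchCell> cells) {
    List<SketchCell> sorted = new ArrayList<>(cells);
    sorted.sort(ORDER);
    List<SketchCell> out = new ArrayList<>();
    SketchCell prev = null;
    for (SketchCell c : sorted) {
      boolean sameVersion = prev != null
          && prev.coordinate.equals(c.coordinate)
          && prev.timestamp == c.timestamp;
      if (!sameVersion) {
        out.add(c); // first cell of each version is the winner under ORDER
      }
      prev = c;
    }
    return out;
  }

  public static void main(String[] args) {
    // Two Puts to the same cell with the same timestamp, as read back from a WAL.
    // Assumption: the cells carry no sequenceId of their own (0 here); the
    // sequenceIds of their WAL entries are 10 and 11 respectively.
    SketchCell first = new SketchCell("r1/cf:q", 100L, 0L, "v1");
    SketchCell second = new SketchCell("r1/cf:q", 100L, 0L, "v2");

    // Stamp each cell with its entry's sequenceId before sorting and deduping.
    List<SketchCell> stamped =
        Arrays.asList(second.withSequenceId(11L), first.withSequenceId(10L));

    for (SketchCell c : dedupe(stamped)) {
      System.out.println(c.coordinate + " @" + c.timestamp + " = " + c.value);
    }
  }
}
```

With the sequenceId stamped onto the cells, the sketch keeps "v2" regardless of input order. If both cells were left at sequenceId 0, the survivor would depend only on the order in which they happened to arrive, which matches the ambiguity described in the issue.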