[ 
https://issues.apache.org/jira/browse/HBASE-27649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17693689#comment-17693689
 ] 

Hudson commented on HBASE-27649:
--------------------------------

Results for branch master
[build #783 on builds.a.o|https://ci-hbase.apache.org/job/HBase%20Nightly/job/master/783/]:
(x) *{color:red}-1 overall{color}*
----
details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/master/783/General_20Nightly_20Build_20Report/]




(/) {color:green}+1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) 
report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/master/783/JDK8_20Nightly_20Build_20Report_20_28Hadoop3_29/]


(x) {color:red}-1 jdk11 hadoop3 checks{color}
-- For more information [see jdk11 
report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/master/783/JDK11_20Nightly_20Build_20Report_20_28Hadoop3_29/]


(/) {color:green}+1 source release artifact{color}
-- See build output for details.


(/) {color:green}+1 client integration test{color}


> WALPlayer does not properly dedupe overridden cell versions
> -----------------------------------------------------------
>
>                 Key: HBASE-27649
>                 URL: https://issues.apache.org/jira/browse/HBASE-27649
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Bryan Beaudreault
>            Assignee: Bryan Beaudreault
>            Priority: Major
>              Labels: patch-available
>             Fix For: 2.6.0, 3.0.0-alpha-4, 2.5.4
>
>
> If you do 2 Puts to a cell with different values but the same timestamp, the 
> latest one will win. This is because in the memstore we use a sequenceId as a 
> tie breaker for duplicate timestamps. When the data is flushed to a 
> StoreFile, the deduplication will occur and eventually the sequenceId will be 
> dropped.
> Both Puts would have been added to the WAL, and if you use WALPlayer to 
> replay those WALs (as anyone could do, but also as backup/restore does for 
> incremental restores), the replay does not apply the same tie-break. Which 
> of the duplicate cells you get is undefined, when you should always get the 
> latest.
> Our WAL encoder doesn't include the sequenceIds in the WALEntry cells. 
> Instead the WALKey has a getSequenceId() which contains the same sequenceId 
> the cells used to have. In WALCellMapper we don't pass those along, nor in 
> CellSerialization, and thus CellSortReducer is not able to use the sequenceId 
> to dedupe.
> I think we just need to translate the WALKey.getSequenceId() into the output 
> Cells in WALCellMapper, then update CellSerialization to include them as 
> well. At that point CellSortReducer should work as expected, and we should 
> get the correct cell values in the hfiles.
> One open question is whether we should clear out the sequenceId before 
> flushing to the hfile. I don't think so?
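
The tie-break described above can be sketched in plain Java. This is only an illustration under stated assumptions, not the HBase implementation: SketchCell, compare and dedupe are hypothetical names invented here, and the real logic lives in HBase's own cell comparator and in CellSortReducer. The sketch orders cells by row, qualifier, timestamp (newest first) and then sequenceId (highest first), and keeps the first cell for each row/qualifier/timestamp.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class SeqIdDedupeSketch {
    /** Hypothetical stand-in for a cell; NOT an HBase class. */
    static final class SketchCell {
        final String row;
        final String qualifier;
        final long timestamp;
        final long sequenceId;
        final String value;

        SketchCell(String row, String qualifier, long timestamp, long sequenceId, String value) {
            this.row = row;
            this.qualifier = qualifier;
            this.timestamp = timestamp;
            this.sequenceId = sequenceId;
            this.value = value;
        }
    }

    /** Orders by row, qualifier, newest timestamp first, then highest sequenceId first. */
    static int compare(SketchCell a, SketchCell b) {
        int c = a.row.compareTo(b.row);
        if (c != 0) return c;
        c = a.qualifier.compareTo(b.qualifier);
        if (c != 0) return c;
        c = Long.compare(b.timestamp, a.timestamp);
        if (c != 0) return c;
        // Tie-break for identical timestamps: the later write sorts first.
        return Long.compare(b.sequenceId, a.sequenceId);
    }

    /** Keeps only the first cell (in sorted order) for each row/qualifier/timestamp. */
    static List<SketchCell> dedupe(List<SketchCell> cells) {
        List<SketchCell> sorted = new ArrayList<>(cells);
        sorted.sort(SeqIdDedupeSketch::compare);
        List<SketchCell> out = new ArrayList<>();
        SketchCell prev = null;
        for (SketchCell c : sorted) {
            boolean sameVersion = prev != null
                && prev.row.equals(c.row)
                && prev.qualifier.equals(c.qualifier)
                && prev.timestamp == c.timestamp;
            if (!sameVersion) {
                out.add(c);
            }
            prev = c;
        }
        return out;
    }

    public static void main(String[] args) {
        // Two Puts to the same cell with the same timestamp; sequenceId 11 was written last.
        List<SketchCell> cells = Arrays.asList(
            new SketchCell("r1", "q", 100L, 10L, "old"),
            new SketchCell("r1", "q", 100L, 11L, "new"));
        System.out.println(dedupe(cells).get(0).value);
    }
}
```

If both cells carried the same sequenceId (for example 0, because it was never copied from the WAL entry), the comparator in this sketch would treat them as equal and the survivor would depend only on input order, which is the ambiguity the description reports.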



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
