[jira] [Commented] (HBASE-10958) [dataloss] Bulk loading with seqids can prevent some log entries from being replayed

Hadoop QA (JIRA) Fri, 11 Apr 2014 19:58:26 -0700

    [ 
https://issues.apache.org/jira/browse/HBASE-10958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13967366#comment-13967366
 ]


Hadoop QA commented on HBASE-10958:
-----------------------------------

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12639893/HBASE-10958.patch
  against trunk revision .
  ATTACHMENT ID: 12639893

    {color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

    {color:green}+1 tests included{color}.  The patch appears to include 6 new 
or modified tests.

    {color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

    {color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

    {color:red}-1 findbugs{color}.  The patch appears to introduce 2 new 
Findbugs (version 1.3.9) warnings.

    {color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

    {color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100

  {color:green}+1 site{color}.  The mvn site goal succeeds with this patch.

     {color:red}-1 core tests{color}.  The patch failed these unit tests:
                       
org.apache.hadoop.hbase.security.access.TestAccessController

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/9267//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/9267//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/9267//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/9267//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/9267//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/9267//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/9267//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/9267//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/9267//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/9267//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/9267//console

This message is automatically generated.

> [dataloss] Bulk loading with seqids can prevent some log entries from being 
> replayed
> ------------------------------------------------------------------------------------
>
>                 Key: HBASE-10958
>                 URL: https://issues.apache.org/jira/browse/HBASE-10958
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.96.2, 0.98.1, 0.94.18
>            Reporter: Jean-Daniel Cryans
>            Assignee: Jean-Daniel Cryans
>            Priority: Blocker
>             Fix For: 0.99.0, 0.94.19, 0.98.2, 0.96.3
>
>         Attachments: HBASE-10958-less-intrusive-hack-0.96.patch, 
> HBASE-10958-quick-hack-0.96.patch, HBASE-10958.patch
>
>
> We found an issue with bulk loads causing data loss when assigning sequence 
> ids (HBASE-6630) that is triggered when replaying recovered edits. We're 
> nicknaming this issue *Blindspot*.
> The problem is that the sequence id given to a bulk loaded file is higher 
> than those of the edits in the region's memstore. When replaying recovered 
> edits, the rule to skip some of them is that they have to be _lower than the 
> highest sequence id_. In other words, the edits that have a sequence id lower 
> than the highest one in the store files *should* have also been flushed. This 
> is not the case with bulk loaded files since we now have an HFile with a 
> sequence id higher than unflushed edits.
> The log recovery code takes this into account by simply skipping the bulk 
> loaded files, but this "bulk loaded status" is *lost* on compaction. The 
> edits in the logs that have a sequence id lower than the bulk loaded file 
> that got compacted are put in a blind spot and are skipped during replay.
> Here's the easiest way to recreate this issue:
>  - Create an empty table
>  - Put one row in it (let's say it gets seqid 1)
>  - Bulk load one file (it gets seqid 2). I used ImporTsv and set 
> hbase.mapreduce.bulkload.assign.sequenceNumbers.
>  - Bulk load a second file the same way (it gets seqid 3).
>  - Major compact the table (the new file has seqid 3 and isn't considered 
> bulk loaded).
>  - Kill the region server that holds the table's region.
>  - Scan the table once the region is made available again. The first row, at 
> seqid 1, will be missing since the HFile with seqid 3 makes us believe that 
> everything that came before it was flushed.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (HBASE-10958) [dataloss] Bulk loading with seqids can prevent some log entries from being replayed

Reply via email to