[ https://issues.apache.org/jira/browse/HBASE-11772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14101714#comment-14101714 ]
Jerry He commented on HBASE-11772:
----------------------------------
This is a local 0.98 test run:
{code}
[INFO] Reactor Summary:
[INFO]
[INFO] HBase ............................................. SUCCESS [1.821s]
[INFO] HBase - Common .................................... SUCCESS [22.919s]
[INFO] HBase - Protocol .................................. SUCCESS [7.192s]
[INFO] HBase - Client .................................... SUCCESS [38.781s]
[INFO] HBase - Hadoop Compatibility ...................... SUCCESS [6.506s]
[INFO] HBase - Hadoop Two Compatibility .................. SUCCESS [1.540s]
[INFO] HBase - Prefix Tree ............................... SUCCESS [3.170s]
[INFO] HBase - Server .................................... SUCCESS [39:10.623s]
[INFO] HBase - Testing Util .............................. SUCCESS [1.776s]
[INFO] HBase - Thrift .................................... SUCCESS [1:42.993s]
[INFO] HBase - Shell ..................................... SUCCESS [1.091s]
[INFO] HBase - Integration Tests ......................... SUCCESS [1.329s]
[INFO] HBase - Examples .................................. SUCCESS [1.468s]
[INFO] HBase - Assembly .................................. SUCCESS [0.930s]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 42:22.873s
[INFO] Finished at: Mon Aug 18 18:03:59 PDT 2014
{code}
> Bulk load mvcc and seqId issues with native hfiles
> --------------------------------------------------
>
> Key: HBASE-11772
> URL: https://issues.apache.org/jira/browse/HBASE-11772
> Project: HBase
> Issue Type: Bug
> Affects Versions: 0.98.5
> Reporter: Jerry He
> Assignee: Jerry He
> Fix For: 0.98.6
>
> Attachments: HBASE-11772-0.98.patch
>
>
> There are mvcc and seqId issues when bulk loading native hfiles, meaning
> hfiles that were copied directly out of HBase rather than produced by an
> HFileOutputFormat job.
> There are differences between these two types of hfiles.
> Native hfiles may have a non-zero MAX_MEMSTORE_TS_KEY value and non-zero
> mvcc values in cells.
> Native hfiles also have MAX_SEQ_ID_KEY.
> Native hfiles do not have BULKLOAD_TIME_KEY.
> Here are the problems I observed when bulk loading native hfiles.
> 1. Cells in newly bulk loaded hfiles can be invisible to scans.
> This is easy to reproduce.
> Bulk load a native hfile whose cells have a larger mvcc value, e.g. 10.
> If the current readpoint when a scan starts is less than 10, the cells in
> the new hfile are skipped and thus become invisible.
> We don't reset the readpoint of a region after a bulk load.
> 2. The current StoreFile.isBulkLoadResult() is implemented as:
> {code}
> return metadataMap.containsKey(BULKLOAD_TIME_KEY);
> {code}
> which does not detect bulk loaded native hfiles, since they lack
> BULKLOAD_TIME_KEY.
> 3. Another observed problem is possible data loss during log recovery.
> It is similar to HBASE-10958 reported by [~jdcryans]. The steps to reproduce
> are borrowed from HBASE-10958.
> 1) Create an empty table
> 2) Put one row in it (let's say it gets seqid 1)
> 3) Bulk load one native hfile with a large seqId (e.g. 100). The native hfile
> can be obtained by copying it out of an existing table.
> 4) Kill the region server that holds the table's region.
> Scan the table once the region is made available again. The first row, at
> seqid 1, will be missing since the HFile with seqid 100 makes us believe that
> everything that came before it was flushed.
> Problem 3 is probably related to problem 2. We would be OK if we used the
> seqId appended during bulk load instead of the 100 stored inside the file.
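
To make problems 1 and 3 from the description concrete, here is a toy model in Java. It is only a sketch: BulkLoadToyModel, Cell, scan and isReplayed are hypothetical names and not HBase classes. It encodes just the two rules stated in the description, in simplified form: a scan skips cells whose mvcc value is above its readpoint, and log recovery skips edits that the largest seqId found in a store file claims were already flushed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy model only. None of these names exist in HBase. */
final class BulkLoadToyModel {

    /** A cell reduced to its row key and the mvcc value stored with it. */
    static final class Cell {
        final String row;
        final long mvcc;

        Cell(String row, long mvcc) {
            this.row = row;
            this.mvcc = mvcc;
        }
    }

    /** Problem 1: a scan returns only cells whose mvcc is at or below its readpoint. */
    static List<String> scan(List<Cell> cells, long readPoint) {
        List<String> visible = new ArrayList<>();
        for (Cell cell : cells) {
            if (cell.mvcc <= readPoint) {
                visible.add(cell.row);
            }
        }
        return visible;
    }

    /**
     * Problem 3: recovery replays a log edit only if its seqId is above the
     * largest seqId read from the store files, because anything at or below
     * that value is assumed to have been flushed already.
     */
    static boolean isReplayed(long editSeqId, long maxStoreFileSeqId) {
        return editSeqId > maxStoreFileSeqId;
    }

    public static void main(String[] args) {
        // A native hfile whose cell carries mvcc 10, as in problem 1.
        List<Cell> bulkLoaded = Arrays.asList(new Cell("row-a", 10));
        System.out.println(scan(bulkLoaded, 5));   // prints []
        System.out.println(scan(bulkLoaded, 10));  // prints [row-a]

        // The put at seqid 1 versus a bulk loaded hfile claiming seqId 100.
        System.out.println(isReplayed(1, 100));    // prints false
    }
}
```

In this model the put at seqid 1 is never replayed once a file claiming seqId 100 is present, which matches the missing row in the steps above.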
--
This message was sent by Atlassian JIRA
(v6.2#6252)