[
https://issues.apache.org/jira/browse/HBASE-6630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13448435#comment-13448435
]
Hadoop QA commented on HBASE-6630:
----------------------------------
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12543348/6590-seq-id-bulk-load.txt
against trunk revision .
+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 15 new or modified tests.
+1 hadoop2.0. The patch compiles against the hadoop 2.0 profile.
-1 javadoc. The javadoc tool appears to have generated 110 warning
messages.
-1 javac. The applied patch generated 5 javac compiler warnings (more than
the trunk's current 4 warnings).
-1 findbugs. The patch appears to introduce 8 new Findbugs (version 1.3.9)
warnings.
+1 release audit. The applied patch does not increase the total number of
release audit warnings.
-1 core tests. The patch failed these unit tests:
Test results:
https://builds.apache.org/job/PreCommit-HBASE-Build/2784//testReport/
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/2784//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/2784//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/2784//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/2784//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/2784//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Console output:
https://builds.apache.org/job/PreCommit-HBASE-Build/2784//console
This message is automatically generated.
> Port HBASE-6590 to trunk 0.94 : Assign sequence number to bulk loaded files
> ---------------------------------------------------------------------------
>
> Key: HBASE-6630
> URL: https://issues.apache.org/jira/browse/HBASE-6630
> Project: HBase
> Issue Type: Sub-task
> Affects Versions: 0.94.1
> Reporter: Amitanand Aiyer
> Assignee: Amitanand Aiyer
> Priority: Minor
> Attachments: 6590-seq-id-bulk-load.txt
>
>
> Currently bulk loaded files are not assigned a sequence number. Thus, they
> can only be used to import historical data, dating to the past. There are
> cases where we want to bulk load "current data"; but the bulk load mechanism
> does not support this, as the bulk loaded files are always sorted behind the
> non-bulkloaded hfiles. Assigning Sequence Id to bulk loaded files should
> solve this issue.
> StoreFiles within a store are sorted based on the sequenceId. SequenceId is a
> monotonically increasing number that accompanies every edit written to the
> WAL. For entries that update the same cell, we would like the later edit to
> win. This comparison is accomplished using memstoreTS at the KV level, and
> sequenceId at the StoreFile level (to order scanners in the KeyValueHeap).
> BulkLoaded files are generated outside of HBase/RegionServer, so they do not
> have a sequenceId written in the file. This causes HBase to lose track of the
> point in time when the BulkLoaded file was imported to HBase, resulting in
> behavior that *only* supports viewing bulkLoaded files as files back-filling
> data from the beginning of time.
> By assigning a sequence number to the file, we can allow the bulk loaded file
> to fit in where we want: either at the "current time" or the "beginning of
> time". The latter is the default, to maintain backward compatibility.
> Design approach:
> Store files keep track of the sequence Id in the trailer. Since we do not
> wish to edit/rewrite the bulk loaded file upon import, we will encode the
> assigned sequenceId into the fileName. The filename RegEx is updated
> accordingly. If a sequenceId is encoded in the filename, it will be used as
> the sequenceId for the file. If none is found, the sequenceId will be
> considered 0 (as per the default, backward-compatible behavior).
> To enable clients to request pre-existing behavior, the command line utility
> allows for 2 ways to import BulkLoaded Files: to assign or not assign a
> sequence Number.
> If a sequence Number is assigned, the file will be imported
> with the "current sequence Id".
> If the sequence Number is not assigned, it will be as if it was
> backfilling old data, from the beginning of time.
> Compaction behavior:
> With the current compaction algorithm, bulk loaded files – that backfill
> data, to the beginning of time – can cause a compaction storm, converting
> every minor compaction to a major compaction. To address this, these files
> are excluded from minor compaction, based on a config param (enabled for the
> messages use case).
> Since bulk loaded files that are not back-filling data do not cause this
> issue, they will not be ignored during minor compactions based on the config
> parameter. This is also required to ensure that there are no holes in the set
> of files selected for compaction – this is necessary to preserve the order of
> KV comparison before and after compaction.
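
The filename-based design described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the "_SeqId_<n>_" suffix, the class name, and the method names are hypothetical (the message does not give the actual RegEx used by the patch), and plain file-name strings stand in for StoreFile objects. A name with no encoded id is treated as sequenceId 0, so it sorts as if it were back-filling data from the beginning of time.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BulkLoadSeqIdSketch {
    // Hypothetical suffix convention, assumed for illustration only.
    // The real filename pattern is whatever the patch defines.
    private static final Pattern SEQ_ID_SUFFIX = Pattern.compile("_SeqId_(\\d+)_$");

    /** Returns the sequence id encoded in the file name, or 0 if none is found. */
    static long sequenceIdFromFileName(String fileName) {
        Matcher m = SEQ_ID_SUFFIX.matcher(fileName);
        return m.find() ? Long.parseLong(m.group(1)) : 0L;
    }

    /** Returns a copy of the names ordered from lowest to highest sequence id. */
    static List<String> sortBySequenceId(List<String> fileNames) {
        List<String> sorted = new ArrayList<>(fileNames);
        sorted.sort(Comparator.comparingLong(BulkLoadSeqIdSketch::sequenceIdFromFileName));
        return sorted;
    }

    public static void main(String[] args) {
        // Two bulk loaded files with an assigned id and one without.
        List<String> names = Arrays.asList("b_SeqId_7_", "a", "c_SeqId_3_");
        for (String name : sortBySequenceId(names)) {
            System.out.println(sequenceIdFromFileName(name) + "\t" + name);
        }
    }
}
```

Keeping the id in the name rather than in the file trailer means the bulk loaded file itself is never rewritten on import, which is the constraint the design approach states.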