[
https://issues.apache.org/jira/browse/HBASE-10615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13914173#comment-13914173
]
Matteo Bertozzi commented on HBASE-10615:
-----------------------------------------
Do you have an example of the command line that you want to use to bulk load
after copying hbase.rootdir?
It is still not clear to me how you want to bulk load the table/tables.
If you put all the files together and let LoadIncrementalHFiles split based on
key, then yes, you can probably skip the reference files.
The same goes for HFileLinks: if you bulk load the files in the archive, you
can skip the HFileLink.
But you have to manually figure out which files you need; otherwise you may
bulk load old data from the archive that shouldn't be in the table anymore.
If you can post the exact example that you have tried, it will be easier for me
to verify what I've said about the parent regions removed by the
CatalogJanitor.
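As an aside on skipping these files: the reference and HFileLink naming conventions make a name-based pre-filter easy to sketch. The snippet below is an illustration only, not HBase's actual check; the helper class and its regexes are made up for this sketch. It drops files whose names look like references ({{<hfile>.<parent-region-encoded-name>}}, as in the stack trace quoted below) or HFileLinks ({{<table>=<region>-<hfile>}}).

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Sketch only: filter a family dir listing by file-name shape before handing
// the remaining hfiles to LoadIncrementalHFiles. Patterns are illustrative,
// not copied from HBase source.
public class SkipReferenceFiles {
    // Reference file: "<hfile>.<parent-region-encoded-name>" (two hex names
    // joined by a dot), e.g.
    // aed3d01648384b31b29e5bad4cd80bec.d179ab341fc68e7612fcd74eaf7cafbd
    private static final Pattern REF_NAME =
        Pattern.compile("^[0-9a-f]+\\.[0-9a-f]+$");
    // HFileLink: "<table>=<region>-<hfile>"
    private static final Pattern LINK_NAME =
        Pattern.compile("^.+=.+-[0-9a-f]+$");

    static boolean looksLikeReferenceOrLink(String fileName) {
        return REF_NAME.matcher(fileName).matches()
            || LINK_NAME.matcher(fileName).matches();
    }

    public static void main(String[] args) {
        List<String> toLoad = Stream.of(
            "aed3d01648384b31b29e5bad4cd80bec",                                  // plain hfile
            "aed3d01648384b31b29e5bad4cd80bec.d179ab341fc68e7612fcd74eaf7cafbd", // reference
            "mytable=d179ab341fc68e7612fcd74eaf7cafbd-aed3d01648384b31b29e5bad4cd80bec" // link
        ).filter(n -> !looksLikeReferenceOrLink(n))
         .collect(Collectors.toList());
        System.out.println(toLoad); // only the plain hfile survives
    }
}
```

Note that this only avoids the CorruptHFileException; as the comment above says, it does not solve the harder problem of deciding which archive files still belong in the table.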
> Make LoadIncrementalHFiles skip reference files
> -----------------------------------------------
>
> Key: HBASE-10615
> URL: https://issues.apache.org/jira/browse/HBASE-10615
> Project: HBase
> Issue Type: Improvement
> Affects Versions: 0.96.0
> Reporter: Jerry He
> Assignee: Jerry He
> Priority: Minor
> Attachments: HBASE-10615-trunk.patch
>
>
> There are use cases where the source of hfiles for LoadIncrementalHFiles is a
> FileSystem copy-out/backup of an HBase table or of archive hfiles. For
> example:
> 1. A copy-out of hbase.rootdir, a table dir, a region dir (after disable), or
> the archive dir.
> 2. ExportSnapshot
> In these cases it is possible that there are reference files in the family
> dir.
> We have such use cases; when trying to load the files back into HBase, we get:
> {code}
> Caused by: org.apache.hadoop.hbase.io.hfile.CorruptHFileException: Problem reading HFile Trailer from file hdfs://HDFS-AMR/tmp/restoreTemp/117182adfe861c5d2b607da91d60aa8a/info/aed3d01648384b31b29e5bad4cd80bec.d179ab341fc68e7612fcd74eaf7cafbd
> 	at org.apache.hadoop.hbase.io.hfile.HFile.pickReaderVersion(HFile.java:570)
> 	at org.apache.hadoop.hbase.io.hfile.HFile.createReaderWithEncoding(HFile.java:594)
> 	at org.apache.hadoop.hbase.io.hfile.HFile.createReader(HFile.java:636)
> 	at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.groupOrSplit(LoadIncrementalHFiles.java:472)
> 	at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles$2.call(LoadIncrementalHFiles.java:393)
> 	at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles$2.call(LoadIncrementalHFiles.java:391)
> 	at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:314)
> 	at java.util.concurrent.FutureTask.run(FutureTask.java:149)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:897)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:919)
> 	at java.lang.Thread.run(Thread.java:738)
> Caused by: java.lang.IllegalArgumentException: Invalid HFile version: 16715777 (expected to be between 2 and 2)
> 	at org.apache.hadoop.hbase.io.hfile.HFile.checkFormatVersion(HFile.java:927)
> 	at org.apache.hadoop.hbase.io.hfile.FixedFileTrailer.readFromStream(FixedFileTrailer.java:426)
> 	at org.apache.hadoop.hbase.io.hfile.HFile.pickReaderVersion(HFile.java:568)
> {code}
> It is desirable and safe to skip these reference files, since they don't
> contain any real data for bulk load purposes.
>
--
This message was sent by Atlassian JIRA
(v6.1.5#6160)