Andrey Elenskiy created HBASE-24920:
---------------------------------------

             Summary: A tool to rewrite corrupted HFiles
                 Key: HBASE-24920
                 URL: https://issues.apache.org/jira/browse/HBASE-24920
             Project: HBase
          Issue Type: Brainstorming
          Components: hbase-operator-tools
            Reporter: Andrey Elenskiy


Typically I have been dealing with corrupted HFiles (due to loss of HDFS 
blocks) by just removing them. However, it has always seemed wasteful to throw 
away an entire HFile (which can be hundreds of gigabytes) just because one 
128MB HDFS block is missing.

I think there's a possibility for a tool that can rewrite an HFile by skipping 
corrupted blocks. 

There can be multiple types of issues with HDFS blocks, but any of them can be 
treated as if the block doesn't exist:
1. All the replicas can be lost.
2. The block can be corrupted due to some bug in HDFS (I recently ran into 
HDFS-15186 while experimenting with erasure coding).

At its simplest, the tool could be a local MapReduce job (mapper only) with a 
custom HFile input reader that can seek to the next DATABLK to skip corrupted 
HDFS blocks.
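
To illustrate the "seek to next DATABLK" idea, here is a rough sketch in 
Python rather than an actual HBase reader. It assumes data blocks begin with 
the 8-byte magic DATABLK* (as in HFile v2/v3) and that the unreadable byte 
ranges are already known, e.g. from fsck output. The function names and the 
toy buffer are hypothetical. A real tool would also have to validate the block 
header and checksum, since the magic bytes could occur inside cell data by 
chance.

```python
from typing import Iterable, Iterator, Tuple

# Assumption: HFile v2/v3 data blocks are prefixed with this 8-byte magic.
DATA_BLOCK_MAGIC = b"DATABLK*"


def find_next_data_block(buf: bytes, start: int) -> int:
    """Return the offset of the first data-block magic at or after `start`, or -1."""
    return buf.find(DATA_BLOCK_MAGIC, start)


def resume_offsets(
    buf: bytes, bad_ranges: Iterable[Tuple[int, int]]
) -> Iterator[Tuple[int, int]]:
    """For each unreadable byte range [lo, hi), yield (hi, resume_offset).

    resume_offset is where the next candidate data block starts after the
    unreadable range, or -1 if no further magic is found.
    """
    for _lo, hi in sorted(bad_ranges):
        yield hi, find_next_data_block(buf, hi)


# Toy "file": three fake blocks starting at offsets 0, 20 and 40.
toy_file = (
    DATA_BLOCK_MAGIC + b"a" * 12
    + DATA_BLOCK_MAGIC + b"b" * 12
    + DATA_BLOCK_MAGIC + b"c" * 4
)

# Pretend bytes [16, 32) sit on a lost HDFS block. The second fake block
# starts inside that range, so reading resumes at the third one.
for end_of_bad_range, resume_at in resume_offsets(toy_file, [(16, 32)]):
    print(end_of_bad_range, resume_at)
```

Any HFile block that overlaps the unreadable range is dropped in this scheme, 
including one that merely starts before the range and runs into it, so the 
rewritten file loses somewhat more than the missing HDFS block itself.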



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
