[ https://issues.apache.org/jira/browse/HBASE-24920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17181449#comment-17181449 ]
Michael Stack commented on HBASE-24920:
---------------------------------------

Do you think it would be an option on [http://hbase.apache.org/book.html#hfile_tool] [~timoha] ?

> A tool to rewrite corrupted HFiles
> ----------------------------------
>
>                 Key: HBASE-24920
>                 URL: https://issues.apache.org/jira/browse/HBASE-24920
>             Project: HBase
>          Issue Type: Brainstorming
>          Components: hbase-operator-tools
>            Reporter: Andrey Elenskiy
>            Priority: Major
>
> Typically I have been dealing with corrupted HFiles (due to loss of HDFS blocks) by just removing them. However, it always seemed wasteful to throw away an entire HFile (which can be hundreds of gigabytes) just because one HDFS block (128 MB) is missing.
> I think there is room for a tool that can rewrite an HFile by skipping corrupted blocks.
> There can be multiple kinds of problems with HDFS blocks, but any of them can be treated as if the block does not exist:
> 1. All replicas of the block have been lost.
> 2. The block is corrupted by a bug in HDFS (I recently ran into HDFS-15186 while experimenting with EC).
> At its simplest, the tool can be a local, map-only MapReduce job with a custom HFile reader input that can seek to the next DATABLK to skip corrupted HDFS blocks.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
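The resync step the reporter describes ("seek to next DATABLK") could be sketched as a scan for the 8-byte block magic that HFile v2 data blocks start with. This is a minimal, hypothetical illustration, not the actual hbase-operator-tools code: it assumes the corrupt region is already buffered in memory and that `nextBlockOffset` is a helper name invented here; a real tool would stream from HDFS and also validate the block header and checksum after resyncing.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class DatablkResync {
    // HFile v2 data blocks begin with this 8-byte magic record.
    static final byte[] MAGIC = "DATABLK*".getBytes(StandardCharsets.US_ASCII);

    // Hypothetical helper: return the offset of the next DATABLK* magic
    // at or after 'from', or -1 if none is found. A rewrite tool would
    // jump here after hitting a read error on a lost/corrupt HDFS block.
    static int nextBlockOffset(byte[] buf, int from) {
        for (int i = Math.max(from, 0); i + MAGIC.length <= buf.length; i++) {
            if (Arrays.equals(Arrays.copyOfRange(buf, i, i + MAGIC.length), MAGIC)) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Simulated stream: a good block, some unreadable bytes, then the
        // header of the next block that the tool should resume from.
        byte[] stream = "DATABLK*good-dataXXXXcorruptXXXXDATABLK*more-data"
                .getBytes(StandardCharsets.US_ASCII);
        // Start scanning past the first magic to find the *next* block.
        System.out.println(nextBlockOffset(stream, 1));
    }
}
```

The keys lost along with the skipped block are gone either way; the point of the resync is that every block after the damaged one is still recoverable.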