[
https://issues.apache.org/jira/browse/HBASE-24920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Andrew Kyle Purtell resolved HBASE-24920.
-----------------------------------------
Assignee: (was: Andrey Elenskiy)
Resolution: Feedback Received
> A tool to rewrite corrupted HFiles
> ----------------------------------
>
> Key: HBASE-24920
> URL: https://issues.apache.org/jira/browse/HBASE-24920
> Project: HBase
> Issue Type: Brainstorming
> Components: hbase-operator-tools
> Reporter: Andrey Elenskiy
> Priority: Major
>
> Typically I have been dealing with corrupted HFiles (due to lost HDFS
> blocks) by just removing them. However, it has always seemed wasteful to
> throw away the entire HFile (which can be hundreds of gigabytes) just
> because a single HDFS block (128 MB) is missing.
> I think there is room for a tool that can rewrite an HFile, skipping the
> corrupted blocks.
> There can be multiple kinds of problems with HDFS blocks, but any of them
> can be treated as if the block doesn't exist:
> 1. All of the replicas are lost.
> 2. The block is corrupted due to a bug in HDFS (I recently ran into
> HDFS-15186 while experimenting with erasure coding).
> At its simplest, the tool could be a local, map-only MapReduce job with a
> custom HFile reader input that seeks to the next DATABLK magic to skip
> past corrupted HDFS blocks.
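> As a rough sketch of that reader (the class below is made up for
> illustration and is not an existing HBase or hbase-operator-tools API),
> the core step could scan the raw byte stream for the next DATABLK* magic
> after a read failure and resume decoding blocks from there:
> {code:java}
> import java.io.IOException;
> import java.io.InputStream;
> import java.nio.charset.StandardCharsets;
>
> // Hypothetical helper (not an existing HBase class): scans forward in an
> // HFile byte stream for the next DATABLK* magic so decoding can resume
> // after a corrupt or missing HDFS block.
> public final class DataBlockSeeker {
>
>   // Magic prefix written at the start of every HFile data block.
>   private static final byte[] DATA_MAGIC =
>       "DATABLK*".getBytes(StandardCharsets.US_ASCII);
>
>   /**
>    * Reads forward from the current position and returns the relative
>    * offset of the start of the next DATABLK* magic, or -1 if EOF is
>    * reached first. A real tool would read in larger chunks and would
>    * first seek past the unreadable region reported by HDFS.
>    */
>   public static long seekToNextDataBlock(InputStream in) throws IOException {
>     long offset = 0;
>     int matched = 0;
>     int b;
>     while ((b = in.read()) != -1) {
>       if ((byte) b == DATA_MAGIC[matched]) {
>         matched++;
>         if (matched == DATA_MAGIC.length) {
>           return offset - DATA_MAGIC.length + 1; // start of the magic
>         }
>       } else {
>         // Restart the match; a plain reset is enough here because
>         // DATABLK* has no repeating prefix.
>         matched = ((byte) b == DATA_MAGIC[0]) ? 1 : 0;
>       }
>       offset++;
>     }
>     return -1; // no further data block found
>   }
>
>   private DataBlockSeeker() {
>   }
> }
> {code}
> A real implementation would also want to verify the recovered block's
> header and checksum before re-emitting its cells into the rewritten HFile.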
--
This message was sent by Atlassian Jira
(v8.20.7#820007)