[
https://issues.apache.org/jira/browse/HBASE-29108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Nick Dimiduk updated HBASE-29108:
---------------------------------
Status: Patch Available (was: Open)
> regionserver does not cleanup storefiles written to .tmp directory when
> failing to close the storefiles during compaction
> -------------------------------------------------------------------------------------------------------------------------
>
> Key: HBASE-29108
> URL: https://issues.apache.org/jira/browse/HBASE-29108
> Project: HBase
> Issue Type: Bug
> Components: regionserver
> Affects Versions: 2.5.10
> Reporter: Benoit Sigoure
> Priority: Major
> Labels: pull-request-available
>
> Background
> * When HBase performs a compaction, it writes the compaction result (one or
> more storefiles) to HDFS under
> {{/hbase/data/<namespace>/<table>/<region>/.tmp/<columnfamily>/<storefile>}}
> * Once the compaction succeeds, the storefile is renamed to
> {{/hbase/data/<namespace>/<table>/<region>/<columnfamily>/<storefile>}}
> (moved out of the .tmp directory to where storefiles are stored and
> read to serve client RPCs)
> * When compaction fails, in some cases cleanup is performed and the
> storefile under the {{.tmp}} directory is deleted. However, in other
> cases the storefile is left under the {{.tmp}} directory (e.g. when one of
> the datanodes where the storefile's last block was being written gets
> {{SIGKILL}}'ed)
> Problem
> * In certain cases, a storefile under {{.tmp}} will contain a block with two
> corrupt replicas and one good replica (e.g. two replicas can be corrupt due to
> reason {{GENSTAMP_MISMATCH}}, i.e. the generation stamp differs, and/or a file
> length lower than the good replica's). The namenode will detect this
> block corruption and report it in its metrics.
> * The corrupt blocks will remain corrupt and the good block replica will not
> be re-replicated to other datanodes to fix the corruption.
> * The storefile under {{.tmp}} remains "open" / under-construction by the
> regionserver.
> Impact
> * *No* visible impact on HBase clients (storefiles under {{.tmp}} are not
> read to return data to clients).
> * This can trip alerts and monitoring (the namenode reports corrupt blocks
> that do not fix themselves until regions are reopened or regionservers
> restart).
> * Decommissioning of datanodes can get blocked indefinitely (a block that
> contains a corrupt replica but is part of a file that is still open does not
> get re-replicated to other datanodes even if a good replica is available,
> thus the datanode that has the only good replica of a block cannot be
> decommissioned)
> Workaround
> * A region can be re-opened (e.g. by restarting a regionserver on which the
> region is open), which causes the region's {{.tmp}} directory to be deleted
> recursively once the region is opened again, removing all corrupt blocks and
> leftover storefiles.
>
> This bug report was written by Tomas Baltrunas at Arista.
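
The write-to-{{.tmp}}, rename-on-success, and clean-up-on-failure flow described in the report can be sketched on a local filesystem as follows. This is only an illustration of the pattern, not HBase's code and not the proposed patch: the names FailingWriter, compact, and sweep_tmp are hypothetical, and local files stand in for HDFS storefiles.

```python
import os
import shutil
import tempfile
from pathlib import Path


class FailingWriter:
    """Hypothetical stand-in for an HDFS output stream whose close() may fail."""

    def __init__(self, path, fail_on_close=False):
        self._fh = open(path, "wb")
        self._fail_on_close = fail_on_close

    def write(self, data):
        self._fh.write(data)

    def close(self):
        self._fh.close()
        if self._fail_on_close:
            raise OSError("simulated failure while closing the storefile")

    def abort(self):
        # Release the local handle without raising; safe to call twice.
        self._fh.close()


def compact(region_dir, family, name, data, fail_on_close=False):
    """Write a compaction result under .tmp, then rename it into place."""
    tmp_dir = region_dir / ".tmp" / family
    final_dir = region_dir / family
    tmp_dir.mkdir(parents=True, exist_ok=True)
    final_dir.mkdir(parents=True, exist_ok=True)

    tmp_path = tmp_dir / name
    writer = FailingWriter(tmp_path, fail_on_close)
    try:
        writer.write(data)
        writer.close()
    except OSError:
        # The cleanup step that, per the report, is missed on some failure
        # paths: remove the partial file instead of leaving it under .tmp.
        writer.abort()
        tmp_path.unlink(missing_ok=True)
        raise

    final_path = final_dir / name
    os.replace(tmp_path, final_path)  # commit: move out of .tmp
    return final_path


def sweep_tmp(region_dir):
    """Mimic the workaround: drop the region's .tmp directory recursively."""
    shutil.rmtree(region_dir / ".tmp", ignore_errors=True)


if __name__ == "__main__":
    region = Path(tempfile.mkdtemp()) / "region"
    print(compact(region, "cf", "storefile-1", b"cells"))
    try:
        compact(region, "cf", "storefile-2", b"cells", fail_on_close=True)
    except OSError as exc:
        print("compaction failed:", exc)
    print("left under .tmp:", list((region / ".tmp" / "cf").iterdir()))
    sweep_tmp(region)
```

In this sketch the failure handler deletes the temporary file before re-raising, so nothing is left behind for a later sweep; on HDFS the real fix may also need to deal with the file still being open for write, which a local file cannot model.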