[jira] [Updated] (HBASE-29108) regionserver does not cleanup storefiles written to .tmp directory when failing to close the storefiles during compaction

Nick Dimiduk (Jira) Mon, 17 Feb 2025 08:17:39 -0800


     [ 
https://issues.apache.org/jira/browse/HBASE-29108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Nick Dimiduk updated HBASE-29108:
---------------------------------
    Status: Patch Available  (was: Open)

> regionserver does not cleanup storefiles written to .tmp directory when 
> failing to close the storefiles during compaction
> -------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-29108
>                 URL: https://issues.apache.org/jira/browse/HBASE-29108
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver
>    Affects Versions: 2.5.10
>            Reporter: Benoit Sigoure
>            Priority: Major
>              Labels: pull-request-available
>
> Background:
>  * When HBase performs a compaction, it writes the compaction result (1 or 
> more storefiles) to HDFS under 
> {{/hbase/data/<namespace>/<table>/<region>/.tmp/<columnfamily>/<storefile>}}
>  * Once the compaction succeeds, the storefile is renamed to 
> {{/hbase/data/<namespace>/<table>/<region>/<columnfamily>/<storefile>}} 
> (moved out of the {{.tmp}} directory to where storefiles are stored and 
> read to serve client RPCs)
>  * When a compaction fails, in some cases the storefile under the {{.tmp}} 
> directory is cleaned up (deleted). In other cases, however, the storefile 
> is left under the {{.tmp}} directory (e.g. when one of the datanodes to 
> which the storefile's last block was being written gets {{SIGKILL}}'ed)
> Problem:
>  * In certain cases, a storefile under {{.tmp}} will contain 2 corrupt block 
> replicas and 1 good block replica (e.g. two replicas can be corrupt due to 
> reason {{GENSTAMP_MISMATCH}}: their generation stamp differs and/or their 
> length is lower than the good replica's). The namenode detects this block 
> corruption and reports it in its metrics.
>  * The corrupt replicas remain corrupt, and the good block replica is not 
> re-replicated to other datanodes to fix the corruption.
>  * The storefile under {{.tmp}} remains "open" / under-construction by the 
> regionserver.
> Impact:
>  * *No* visible impact on HBase clients (storefiles under {{.tmp}} are not 
> read to return data to clients).
>  * This can trip alerts and monitoring (the namenode reports corrupt blocks 
> that do not fix themselves until the regions are reopened or the 
> regionservers restart).
>  * Decommissioning of datanodes can get blocked indefinitely (a block that 
> has a corrupt replica but is part of a file that is still open does not 
> get re-replicated to other datanodes even if a good replica is available, 
> thus the datanode that holds the only good replica of a block cannot be 
> decommissioned).
> Workaround:
>  * A region can be re-opened (e.g. by restarting a regionserver on which the 
> region is open), which causes the region's {{.tmp}} directory to be deleted 
> recursively once the region is opened again, removing all corrupt blocks and 
> leftover storefiles.
>  
> This bug report was written by Tomas Baltrunas at Arista.
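
The write-under-.tmp, rename-on-success pattern from the Background section, plus the cleanup step this report says is sometimes skipped, can be sketched roughly as follows. This is not HBase's implementation: it uses the local filesystem through java.nio rather than HDFS, and the class name, method name, and CloseHook interface are hypothetical, introduced only so that a failure while closing the file can be simulated.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Illustration only: local-filesystem analogue of "write under .tmp, then rename". */
final class TmpThenRenameSketch {

    /** Hypothetical hook that lets a caller simulate a failure just before close. */
    interface CloseHook {
        void beforeClose() throws IOException;
    }

    /**
     * Writes data to regionDir/.tmp/family/name, then moves it to
     * regionDir/family/name. If writing or closing fails, the partial file
     * under .tmp is deleted before the exception is rethrown.
     */
    static Path writeThenCommit(Path regionDir, String family, String name,
                                byte[] data, CloseHook hook) throws IOException {
        Path tmpFile = regionDir.resolve(".tmp").resolve(family).resolve(name);
        Path finalFile = regionDir.resolve(family).resolve(name);
        Files.createDirectories(tmpFile.getParent());
        Files.createDirectories(finalFile.getParent());

        try (OutputStream out = Files.newOutputStream(tmpFile)) {
            out.write(data);
            hook.beforeClose(); // may throw to simulate a failed close
        } catch (IOException e) {
            // Cleanup path: do not leave the partial file behind under .tmp.
            try {
                Files.deleteIfExists(tmpFile);
            } catch (IOException cleanupFailure) {
                e.addSuppressed(cleanupFailure);
            }
            throw e;
        }

        // Success path: publish the file by moving it out of .tmp.
        return Files.move(tmpFile, finalFile, StandardCopyOption.ATOMIC_MOVE);
    }
}
```

Calling writeThenCommit with a hook that throws leaves nothing under the .tmp directory, which is the behaviour the report asks for. On HDFS a file whose close failed may additionally remain under construction, which a local-filesystem sketch cannot reproduce.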



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
