[
https://issues.apache.org/jira/browse/HIVE-13429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15236558#comment-15236558
]
Daniel Dai commented on HIVE-13429:
-----------------------------------
If hive.scratchdir.lock is true, Hive puts a lock file into the scratch dir to
indicate the scratch dir is in use. As long as the Hive process is alive, the
lock file is kept open, and cleardanglingscratchdir will detect it and not
treat the dir as a dangling scratch dir. Only when the Hive process dies
accidentally and leaves the scratch dir behind does the dir become a target
for cleardanglingscratchdir to remove.
For backward compatibility, cleardanglingscratchdir will not remove any dir
that contains no lock file at all (such a dir must have been generated by an
older version of Hive).
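For illustration, here is a minimal sketch of the holder side using the Hadoop
FileSystem API. The class name and the "inuse.lck" file name are assumptions
for this sketch, not necessarily Hive's actual identifiers:
{code:java}
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ScratchDirLockHolder {
  // Create the lock file and keep the stream open. Creating a file for
  // write takes an HDFS lease, and the DFS client keeps renewing that
  // lease as long as the process is alive and the stream stays open.
  public static FSDataOutputStream acquire(Path scratchDir, Configuration conf)
      throws IOException {
    FileSystem fs = scratchDir.getFileSystem(conf);
    return fs.create(new Path(scratchDir, "inuse.lck"));
  }
  // Close the returned stream only when dropping the scratch dir. If the
  // process dies first, the NameNode reclaims the lease after ~10 minutes.
}
{code}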
> Tool to remove dangling scratch dir
> -----------------------------------
>
> Key: HIVE-13429
> URL: https://issues.apache.org/jira/browse/HIVE-13429
> Project: Hive
> Issue Type: Improvement
> Reporter: Daniel Dai
> Assignee: Daniel Dai
> Labels: TODOC1.3, TODOC2.1
> Fix For: 1.3.0, 2.1.0
>
> Attachments: HIVE-13429.1.patch, HIVE-13429.2.patch,
> HIVE-13429.3.patch, HIVE-13429.4.patch, HIVE-13429.5.patch,
> HIVE-13429.branch-1.patch
>
>
> We have seen cases where users leave the scratch dir behind, eventually
> eating up HDFS storage. This can happen when a VM restarts and leaves Hive
> no chance to run its shutdown hook. This applies to both HiveCli and
> HiveServer2. Here we provide an external tool to clear dead scratch dirs
> as needed.
> We need a way to identify which scratch dir is in use. We rely on an HDFS
> write lock for that. Here is how the HDFS write lock works:
> 1. An HDFS client opens an HDFS file for write and only closes it at the
> time of shutdown.
> 2. A cleanup process can try to open the same HDFS file for write. If the
> client holding the file is still running, we get an exception; otherwise,
> we know the client is dead (see the probe sketch after this list).
> 3. If the HDFS client dies without closing the HDFS file, the NameNode
> reclaims the lease after 10 minutes, i.e., the HDFS file held by the dead
> client becomes writable again after 10 minutes.
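> A hedged sketch of step 2, assuming the Hadoop FileSystem API: appending
> to a file whose write lease is still held fails with
> AlreadyBeingCreatedException (surfaced through RemoteException), while an
> append on an expired lease succeeds:
> {code:java}
> import java.io.IOException;
>
> import org.apache.hadoop.fs.FileSystem;
> import org.apache.hadoop.fs.Path;
> import org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException;
> import org.apache.hadoop.ipc.RemoteException;
>
> public class LeaseProbe {
>   // Returns true if no live client holds the write lease on lockFile,
>   // i.e. the owning process is dead and its lease has expired.
>   public static boolean ownerIsDead(FileSystem fs, Path lockFile)
>       throws IOException {
>     try {
>       fs.append(lockFile).close();  // requests a write lease on the file
>       return true;                  // no exception: previous owner is gone
>     } catch (RemoteException e) {
>       if (AlreadyBeingCreatedException.class.getName()
>           .equals(e.getClassName())) {
>         return false;               // lease still held: owner is alive
>       }
>       throw e;                      // unrelated failure: surface it
>     }
>   }
> }
> {code}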
> So here is how we remove a dangling scratch directory in Hive:
> 1. HiveCli/HiveServer2 opens a lock file with a well-known name in the
> scratch directory and only closes it when it is about to drop the scratch
> directory.
> 2. A command line tool, cleardanglingscratchdir, checks every scratch
> directory and tries to open the lock file for write. If it gets no
> exception, the owner is dead and we can safely remove the scratch
> directory (an illustrative scan loop follows this list).
> 3. The 10 minute lease window means a HiveCli/HiveServer2 process may be
> dead while we still cannot reclaim its scratch directory for another
> 10 minutes. This should be tolerable.
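> Putting the pieces together, an illustrative scan loop over a scratch
> root, reusing the hypothetical LeaseProbe sketch above (the real tool's
> layout and options differ); per the backward-compatibility note, dirs
> without a lock file are skipped:
> {code:java}
> import java.io.IOException;
>
> import org.apache.hadoop.conf.Configuration;
> import org.apache.hadoop.fs.FileStatus;
> import org.apache.hadoop.fs.FileSystem;
> import org.apache.hadoop.fs.Path;
>
> public class ScratchDirSweep {
>   public static void sweep(Path scratchRoot, Configuration conf)
>       throws IOException {
>     FileSystem fs = scratchRoot.getFileSystem(conf);
>     for (FileStatus dir : fs.listStatus(scratchRoot)) {
>       Path lock = new Path(dir.getPath(), "inuse.lck");
>       if (!fs.exists(lock)) {
>         continue;  // no lock file: dir from an older Hive, leave it alone
>       }
>       if (LeaseProbe.ownerIsDead(fs, lock)) {
>         fs.delete(dir.getPath(), true);  // owner dead: remove recursively
>       }
>     }
>   }
> }
> {code}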
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)