[ https://issues.apache.org/jira/browse/IMPALA-13677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17922218#comment-17922218 ]
ASF subversion and git services commented on IMPALA-13677:
----------------------------------------------------------
Commit a159eb52f8d3efda5223dfa4f7a9eced5ce48d77 in impala's branch
refs/heads/master from Yida Wu
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=a159eb52f ]
IMPALA-13677: Support remote scratch directory cleanup at Impala daemon startup
This patch introduces a new feature for cleaning up remote scratch
files during Impala daemon startup, ensuring that potential leftover
files from abnormal shutdowns are removed.
To allow efficient cleanup, this patch also refines the remote
scratch directory hierarchy by adding a host-level directory,
changing it from:
<base_dir>/<backend_id>_<query_id>/<file_name>
to:
<base_dir>/<hostname>/<backend_id>_<query_id>/<file_name>
<base_dir> is <scratch_dir_config_path>/impala-scratch.
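The layout change can be illustrated with a short Python sketch. The helper names (base_dir, old_scratch_path, new_scratch_path) and the sample values are hypothetical; Impala builds these paths in its own backend code, so this only mirrors the directory structure described above.

```python
import posixpath


def base_dir(scratch_dir_config_path: str) -> str:
    # <base_dir> is <scratch_dir_config_path>/impala-scratch
    return posixpath.join(scratch_dir_config_path, "impala-scratch")


def old_scratch_path(config_path: str, backend_id: str, query_id: str,
                     file_name: str) -> str:
    # Layout before the patch: no host-level directory.
    return posixpath.join(base_dir(config_path),
                          f"{backend_id}_{query_id}", file_name)


def new_scratch_path(config_path: str, hostname: str, backend_id: str,
                     query_id: str, file_name: str) -> str:
    # Layout after the patch: everything a host writes sits under
    # <base_dir>/<hostname>, so one recursive delete covers it all.
    return posixpath.join(base_dir(config_path), hostname,
                          f"{backend_id}_{query_id}", file_name)
```

With the old layout, finding one host's leftovers would mean listing <base_dir> and matching backend IDs; with the new one, the host-level directory is the unit of cleanup.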
During startup, if the host-level directory exists, it will be
removed entirely.
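A minimal sketch of that startup step follows, using a local temporary directory as a stand-in for the remote filesystem. The function name cleanup_host_dir and the demo data are assumptions for illustration; the actual patch performs the removal against the remote filesystem from the Impala daemon.

```python
import shutil
import tempfile
from pathlib import Path


def cleanup_host_dir(base_dir: Path, hostname: str) -> bool:
    """Remove <base_dir>/<hostname> entirely if it exists.

    Returns True when a directory was removed, False when there was
    nothing to clean up.
    """
    host_dir = base_dir / hostname
    if not host_dir.is_dir():
        return False
    shutil.rmtree(host_dir)
    return True


def demo() -> tuple:
    # Simulate leftovers from two hosts, then "restart" host-a.
    with tempfile.TemporaryDirectory() as tmp:
        base = Path(tmp) / "impala-scratch"
        for host in ("host-a", "host-b"):
            leftover = base / host / "backend1_query1"
            leftover.mkdir(parents=True)
            (leftover / "spill-0.bin").write_bytes(b"leftover")
        removed = cleanup_host_dir(base, "host-a")
        # Only host-a's directory is gone; host-b's files are untouched.
        return removed, (base / "host-a").exists(), (base / "host-b").exists()
```

Because the delete is scoped to a single hostname, a restarting daemon does not touch scratch files that belong to daemons on other hosts.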
This design assumes one Impala daemon per host, and it also
assumes that multiple Impala clusters don't share the same
scratch_dir path on remote filesystem. Even if they share the
same prefix, each Impala cluster should have dedicated paths:
--scratch_dirs=hdfs://remote_dir/scratch/impala1
--scratch_dirs=hdfs://remote_dir/scratch/impala2
This patch also adds a flag, remote_scratch_cleanup_on_startup, that
controls whether the host-level directory is cleaned during Impala
daemon startup. The feature is enabled by default. If multiple daemons
on a host or multiple clusters share the same remote scratch_dir
path, set the flag to false to prevent unintended cleanup.
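To see when the flag would need to be turned off, a deployment check along the following lines could flag remote scratch paths used by more than one cluster. This is a hypothetical helper, not part of Impala; it treats each scratch_dirs value as a plain path and ignores any extra per-directory options.

```python
from collections import defaultdict
from typing import Dict, List


def shared_scratch_dirs(cluster_dirs: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Map each scratch path used by more than one cluster to its users."""
    users = defaultdict(set)
    for cluster, dirs in cluster_dirs.items():
        for path in dirs:
            # Normalize a trailing slash so equal paths compare equal.
            users[path.rstrip("/")].add(cluster)
    return {path: sorted(names) for path, names in users.items()
            if len(names) > 1}
```

An empty result corresponds to the dedicated-path setup recommended above. Any path that does show up is one where startup cleanup could delete another cluster's files, so remote_scratch_cleanup_on_startup would be set to false for those clusters.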
Tests:
Passed exhaustive tests.
Adds testcase test_scratch_dirs_remote_spill_leftover_files_removal.
Change-Id: Iadd49b7384d52bac5ddab4e86cd9f39dc2c88e1b
Reviewed-on: http://gerrit.cloudera.org:8080/22378
Reviewed-by: Abhishek Rawat
Tested-by: Impala Public Jenkins
> Cleanup of s3 scratch files on abnormal executor exit
> -----------------------------------------------------
>
> Key: IMPALA-13677
> URL: https://issues.apache.org/jira/browse/IMPALA-13677
> Project: IMPALA
> Issue Type: Improvement
> Components: Backend
> Reporter: Yida Wu
> Assignee: Yida Wu
> Priority: Major
>
> Currently, when an executor spills data to remote storage, the scratch files
> remain there if the executor exits abnormally or is terminated after the
> graceful shutdown deadline.
> Removing the files immediately may be difficult, and no concrete solution is
> available yet. One option is to add a thread in the coordinator that cleans
> up leftover scratch files in remote storage; other methods that ensure safe
> and complete removal could also be considered.
> Since S3 is the most common scenario, this task may focus specifically on
> handling leftover scratch files in S3.