[
https://issues.apache.org/jira/browse/HIVE-28086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17819984#comment-17819984
]
Butao Zhang commented on HIVE-28086:
------------------------------------
I have tested *hive.exec.stagingdir*, and I think it can be set dynamically.
*Loading data to table testdb.teststage from
hdfs://127.0.0.1:8028/tmp/hive/.hive-staging_hive_2024-02-23_17-05-18_098_5916756629075483734-4/-ext-10000*
{code:java}
0: jdbc:hive2://127.0.0.1:10000/default> create table teststage(id int);
0: jdbc:hive2://127.0.0.1:10000/default> set hive.exec.stagingdir=/tmp/hive/.hive-staging;
0: jdbc:hive2://127.0.0.1:10004/default> insert into teststage values(123);
...
INFO : Starting task [Stage-2:DEPENDENCY_COLLECTION] in serial mode
INFO : Starting task [Stage-0:MOVE] in serial mode
INFO : Loading data to table testdb.teststage from hdfs://127.0.0.1:8028/tmp/hive/.hive-staging_hive_2024-02-23_17-05-18_098_5916756629075483734-4/-ext-10000
INFO : Completed executing command(queryId=hive_20240223170518_68ee2a14-3268-4b34-bd01-ed0fe48a02ea); Time taken: 5.977 seconds
1 row affected (6.535 seconds)
{code}
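To make the check above repeatable, here is a minimal Python sketch that tests whether the path in a "Loading data to table" log line sits under the configured staging directory. The function name is hypothetical, and the assumption that the staging directory name is the configured value followed by "_" is inferred only from the log line above, not from the Hive source.

```python
from urllib.parse import urlparse


def staging_under_configured_dir(load_path: str, configured: str) -> bool:
    """Return True if load_path starts with the configured staging dir.

    Assumption (from the log above only): Hive appends "_<suffix>" to the
    configured value when it names the staging directory.
    """
    path = urlparse(load_path).path  # drop the scheme and host:port
    return path.startswith(configured + "_")


# Path copied from the MOVE task log line in the comment above.
log_path = (
    "hdfs://127.0.0.1:8028/tmp/hive/.hive-staging_hive_2024-02-23_17-05-18_098"
    "_5916756629075483734-4/-ext-10000"
)
print(staging_under_configured_dir(log_path, "/tmp/hive/.hive-staging"))
```

For the run shown above, the staging path falls under /tmp/hive/.hive-staging rather than under the table location, which is what the test was meant to demonstrate.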
> Clean up older version staging-dir when versioning is enabled at storage like
> s3
> --------------------------------------------------------------------------------
>
> Key: HIVE-28086
> URL: https://issues.apache.org/jira/browse/HIVE-28086
> Project: Hive
> Issue Type: Improvement
> Components: Hive
> Reporter: Taraka Rama Rao Lethavadla
> Priority: Major
>
> When running Hive on AWS S3 storage *with versioning* for managed /
> external table directories, a staging directory is created under the table
> location, as per {{hive.exec.stagingdir=.hive-staging}}.
> The directory is deleted after the job is completed.
> AWS S3 offers an option to enable versioning. When it is enabled, the Hive
> staging directories are "deleted" but a copy is kept. This requires manual
> cleanup, and over time the effort to remove these directories becomes too
> much work.
> For Spark jobs, this is easily worked around by setting:
> {noformat}
> spark.hadoop.hive.exec.stagingdir=s3a://<bucketname>/tmp/hive/.hive-staging
> {noformat}
> However, that is not an option in Hive because {{hive.exec.stagingdir}} is
> meant to be a path relative to the table directory.
> Options to solve:
> It would be helpful to allow the staging directory to be configurable like
> these properties:
> {noformat}
> hive.exec.scratchdir=/tmp/hive
> hive.exec.local.scratchdir=/tmp/hive
> tez.staging-dir=/tmp/${user.name}/staging
> {noformat}
> That will allow customers to configure a location without versioning and
> avoid this usability issue.
> or
> use the storage API (e.g. S3) to delete the old versions of the staging
> directory along with the actual directory.
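The second option in the description (deleting old versions through the storage API) could look roughly like the sketch below. It is not how Hive does or would do it. It assumes a boto3 S3 client is passed in, and the function name, bucket, and prefix are placeholders chosen for illustration. It relies only on the `list_object_versions` paginator and the `delete_object` call with a `VersionId`.

```python
def purge_versions(s3, bucket: str, prefix: str) -> int:
    """Permanently delete every object version and delete marker under prefix.

    `s3` is expected to behave like a boto3 S3 client (an assumption of this
    sketch). Returns the number of versions and markers removed.
    """
    deleted = 0
    paginator = s3.get_paginator("list_object_versions")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # "Versions" holds stored object versions; "DeleteMarkers" holds the
        # markers that a normal delete leaves behind in a versioned bucket.
        for entry in page.get("Versions", []) + page.get("DeleteMarkers", []):
            s3.delete_object(
                Bucket=bucket, Key=entry["Key"], VersionId=entry["VersionId"]
            )
            deleted += 1
    return deleted
```

A call might look like `purge_versions(boto3.client("s3"), "<bucketname>", "<table path>/.hive-staging")`. The prefix matches every key that starts with that string, and deleting by version ID cannot be undone, so it should be scoped carefully.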
--
This message was sent by Atlassian Jira
(v8.20.10#820010)