yigress opened a new pull request, #3781:
URL: https://github.com/apache/hive/pull/3781
### What changes were proposed in this pull request?
1. Add a Hive configuration property, `hive.use.scratchdir.for.staging`.
2. For native tables that are non-MM, non-direct-insert, and non-ACID, change the
dynamic partition staging directory layout from
`<dest_path>/<static_partition>/<staging_dir>/<dynamic_partition>`
to
`<dest_path>/<staging_dir>/<static_partition>/<dynamic_partition>`.
3. When `hive.use.scratchdir.for.staging=true`, change `FileSinkOperator`'s `dirName`
and `DynamicContext`'s `sourcePath` from
`<dest_path>/{hive.exec.stagingdir}`
to
`<hive.exec.scratchdir>`.
### Why are the changes needed?
In the S3 blob storage optimization, HIVE-15121 / HIVE-17620 changed the interim
job path to use `hive.exec.scratchdir` and the final job path to use
`hive.exec.stagingdir`. Whether to use the scratch directory as the staging
directory for S3 is still an open question in
https://issues.apache.org/jira/browse/HIVE-15215.
However, for blob storage where rename is slow and encryption is not used,
staging query results in the scratch directory and letting the MoveTask copy
them to blob storage can improve performance. This is especially true when
there is a FileMerge task.
This may also help in cross-filesystem setups, where a user wants to stage
query results on the local cluster filesystem and then move them to the
destination filesystem.
### Does this PR introduce _any_ user-facing change?
This adds a new Hive configuration property, `hive.use.scratchdir.for.staging`.
### How was this patch tested?