yigress opened a new pull request, #3781:
URL: https://github.com/apache/hive/pull/3781

   
   ### What changes were proposed in this pull request?
   
   1. Add a Hive configuration, hive.use.scratchdir.for.staging.
   
   2. For native tables (non-MM, non-direct-insert, non-ACID), change the 
dynamic partition staging directory layout from
   <dest_path>/<static_partition>/<staging_dir>/<dynamic_partition>
   to
   <dest_path>/<staging_dir>/<static_partition>/<dynamic_partition>
   
   3. When hive.use.scratchdir.for.staging=true, FileSinkOperator's dirName 
and DynamicContext's sourcePath change from
   <dest_path>/<hive.exec.stagingdir>
   to
   <hive.exec.scratchdir>
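
As a rough illustration of the layouts listed above, here is a minimal Java sketch that only composes path strings. It is not code from this patch: the class `StagingLayoutSketch`, its methods, and the sample paths are hypothetical, and real Hive staging and scratch paths carry additional generated suffixes and subdirectories that are omitted here.

```java
import java.util.StringJoiner;

// Hypothetical helper for illustration only; it is not part of Hive.
final class StagingLayoutSketch {

    // Joins the non-empty segments with '/'.
    static String join(String... segments) {
        StringJoiner joiner = new StringJoiner("/");
        for (String segment : segments) {
            if (segment != null && !segment.isEmpty()) {
                joiner.add(segment);
            }
        }
        return joiner.toString();
    }

    // Layout before the change: the staging dir sits under the static partition.
    static String oldLayout(String destPath, String staticPart,
                            String stagingDir, String dynamicPart) {
        return join(destPath, staticPart, stagingDir, dynamicPart);
    }

    // Layout after the change: the staging dir sits directly under the destination.
    static String newLayout(String destPath, String staticPart,
                            String stagingDir, String dynamicPart) {
        return join(destPath, stagingDir, staticPart, dynamicPart);
    }

    // Staging root as described in item 3: the scratch dir when the new flag
    // is true, otherwise <dest_path>/<hive.exec.stagingdir>.
    static String stagingRoot(boolean useScratchDirForStaging, String destPath,
                              String stagingDir, String scratchDir) {
        return useScratchDirForStaging ? scratchDir : join(destPath, stagingDir);
    }

    public static void main(String[] args) {
        // Sample values are assumptions chosen for readability.
        String dest = "s3a://bucket/warehouse/t";
        String staging = ".hive-staging";
        String scratch = "hdfs:///tmp/hive";

        System.out.println(oldLayout(dest, "ds=2022-11-17", staging, "hr=01"));
        System.out.println(newLayout(dest, "ds=2022-11-17", staging, "hr=01"));
        System.out.println(stagingRoot(false, dest, staging, scratch));
        System.out.println(stagingRoot(true, dest, staging, scratch));
    }
}
```

With the new layout, everything written by the job lives under a single staging root, which is what makes it possible to swap that root for the scratch directory in item 3.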
   
   
   
   ### Why are the changes needed?
   
   In the S3 blob storage optimization, HIVE-15121 / HIVE-17620 changed the 
interim job path to use hive.exec.scratchdir and the final job to use 
hive.exec.stagingdir. https://issues.apache.org/jira/browse/HIVE-15215 is still 
open on whether to use the scratch directory as the staging directory for S3.
   
   However, for blob storage where 'rename' is slow and there is no 
encryption, it can help performance to stage query results in the scratch 
directory and let the MoveTask copy them to blob storage. This is especially 
true when there is a FileMerge task.
   This may also help in cross-filesystem setups, where the user wants to 
stage query results on the local cluster filesystem and then move them to the 
destination filesystem.
   
   
   ### Does this PR introduce _any_ user-facing change?
   This adds a new Hive configuration, hive.use.scratchdir.for.staging.
   
   
   ### How was this patch tested?
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

