[
https://issues.apache.org/jira/browse/HIVE-26758?focusedWorklogId=831223&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-831223
]
ASF GitHub Bot logged work on HIVE-26758:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 05/Dec/22 23:23
Start Date: 05/Dec/22 23:23
Worklog Time Spent: 10m
Work Description: sunchao commented on code in PR #3831:
URL: https://github.com/apache/hive/pull/3831#discussion_r1040208683
##########
ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java:
##########
@@ -1971,13 +1971,19 @@ public static Path createMoveTask(Task<?> currTask, boolean chDir,
* 2. INSERT operation on full ACID table
*/
if (!isMmTable && !isDirectInsert) {
- // generate the temporary file
- // it must be on the same file system as the current destination
Context baseCtx = parseCtx.getContext();
- // Create the required temporary file in the HDFS location if the destination
- // path of the FileSinkOperator table is a blobstore path.
- Path tmpDir = baseCtx.getTempDirForFinalJobPath(fileSinkDesc.getDestPath());
+ // Choose location of required temporary file
+ Path tmpDir = null;
+ if (hconf.getBoolVar(ConfVars.HIVE_USE_SCRATCHDIR_FOR_STAGING)) {
+ tmpDir = baseCtx.getTempDirForInterimJobPath(fileSinkDesc.getDestPath());
+ } else {
+ tmpDir = baseCtx.getTempDirForFinalJobPath(fileSinkDesc.getDestPath());
+ }
+ DynamicPartitionCtx dpCtx = fileSinkDesc.getDynPartCtx();
+ if (dpCtx != null && dpCtx.getSPPath() != null) {
+ tmpDir = new Path(tmpDir, dpCtx.getSPPath());
Review Comment:
nit: 2 space indentation
##########
common/src/java/org/apache/hadoop/hive/conf/HiveConf.java:
##########
@@ -5629,6 +5629,10 @@ public static enum ConfVars {
"This is a performance optimization that forces the final FileSinkOperator to write to the blobstore.\n" +
"See HIVE-15121 for details."),
+ HIVE_USE_SCRATCHDIR_FOR_STAGING("hive.use.scratchdir.for.staging", false,
+ "Use ${hive.exec.scratchdir} for query results instead of ${hive.exec.stagingdir}.\n" +
+ "This stages query results in ${hive.exec.scratchdir} before move to final destination."),
Review Comment:
nit: move -> moving
##########
ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java:
##########
@@ -2608,8 +2608,8 @@ private Partition loadPartitionInternal(Path loadPath, Table tbl, Map<String, St
* See: HIVE-1707 and HIVE-2117 for background
*/
FileSystem oldPartPathFS = oldPartPath.getFileSystem(getConf());
- FileSystem loadPathFS = loadPath.getFileSystem(getConf());
- if (FileUtils.equalsFileSystem(oldPartPathFS,loadPathFS)) {
+ FileSystem tblPathFS = tblDataLocationPath.getFileSystem(getConf());
+ if (FileUtils.equalsFileSystem(oldPartPathFS,tblPathFS)) {
Review Comment:
nit: space before `tblPathFS`?
Issue Time Tracking
-------------------
Worklog Id: (was: 831223)
Time Spent: 3.5h (was: 3h 20m)
> Allow use scratchdir for staging final job
> ------------------------------------------
>
> Key: HIVE-26758
> URL: https://issues.apache.org/jira/browse/HIVE-26758
> Project: Hive
> Issue Type: New Feature
> Components: Query Planning
> Affects Versions: 4.0.0-alpha-2
> Reporter: Yi Zhang
> Assignee: Yi Zhang
> Priority: Minor
> Labels: pull-request-available
> Time Spent: 3.5h
> Remaining Estimate: 0h
>
> Query results are staged in a staging directory that is relative to the
> destination path: <destination_dir>/<staging_dir>/
> As part of the blobstorage optimization in HIVE-17620, the final job is set
> to use stagingdir. HIVE-15215 mentioned the possibility of using the scratch
> directory for staging when writing to S3, but that was a long time ago and
> there has been no activity since.
>
> This change allows the final job to use hive.exec.scratchdir, as the interim
> jobs do, controlled by the configuration
> hive.use.scratchdir.for.staging
> This is useful across filesystems: the user can stage on the local source
> filesystem instead of the remote filesystem.
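
The selection described above can be sketched in plain Java. This is only an illustration under stated assumptions, not the code from the PR: the class StagingDirSketch, the method chooseTmpDir, and all of its parameters are hypothetical, paths are modeled as plain strings, and the unique per-query suffixes that Hive adds to real temporary directories are left out.

```java
// Hypothetical sketch; none of these names exist in Hive.
final class StagingDirSketch {

    /**
     * Picks the temporary directory for the final job.
     *
     * @param useScratchDirForStaging stands in for hive.use.scratchdir.for.staging
     * @param scratchDir              stands in for hive.exec.scratchdir
     * @param destPath                destination path of the write
     * @param stagingDirName          stands in for hive.exec.stagingdir
     * @param staticPartitionPath     static partition sub-path, or null if none
     */
    static String chooseTmpDir(boolean useScratchDirForStaging,
                               String scratchDir,
                               String destPath,
                               String stagingDirName,
                               String staticPartitionPath) {
        // Flag on: stage in the scratch directory, as the interim jobs do.
        // Flag off: stage next to the destination, <destination_dir>/<staging_dir>.
        String tmpDir = useScratchDirForStaging
            ? scratchDir
            : destPath + "/" + stagingDirName;

        // Mirror the dynamic-partition case in the diff: append the
        // static partition path when one is present.
        if (staticPartitionPath != null) {
            tmpDir = tmpDir + "/" + staticPartitionPath;
        }
        return tmpDir;
    }

    public static void main(String[] args) {
        String dest = "s3a://bucket/warehouse/t";
        System.out.println(chooseTmpDir(true, "hdfs://nn/tmp/hive", dest, ".hive-staging", "ds=2022-12-05"));
        System.out.println(chooseTmpDir(false, "hdfs://nn/tmp/hive", dest, ".hive-staging", null));
    }
}
```

With the flag enabled, the staged output lands on the scratch filesystem rather than under the destination, so the final move may cross filesystems.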