[jira] [Work logged] (HIVE-26758) Allow use scratchdir for staging final job

ASF GitHub Bot (Jira) Mon, 05 Dec 2022 15:24:04 -0800


     [ 
https://issues.apache.org/jira/browse/HIVE-26758?focusedWorklogId=831223&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-831223
 ]


ASF GitHub Bot logged work on HIVE-26758:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 05/Dec/22 23:23
            Start Date: 05/Dec/22 23:23
    Worklog Time Spent: 10m 
      Work Description: sunchao commented on code in PR #3831:
URL: https://github.com/apache/hive/pull/3831#discussion_r1040208683


##########
ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java:
##########
@@ -1971,13 +1971,19 @@ public static Path createMoveTask(Task<?> currTask, boolean chDir,
        * 2. INSERT operation on full ACID table
        */
       if (!isMmTable && !isDirectInsert) {
-        // generate the temporary file
-        // it must be on the same file system as the current destination
         Context baseCtx = parseCtx.getContext();
 
-        // Create the required temporary file in the HDFS location if the destination
-        // path of the FileSinkOperator table is a blobstore path.
-        Path tmpDir = baseCtx.getTempDirForFinalJobPath(fileSinkDesc.getDestPath());
+        // Choose location of required temporary file
+        Path tmpDir = null;
+        if (hconf.getBoolVar(ConfVars.HIVE_USE_SCRATCHDIR_FOR_STAGING)) {
+          tmpDir = baseCtx.getTempDirForInterimJobPath(fileSinkDesc.getDestPath());
+        } else {
+          tmpDir = baseCtx.getTempDirForFinalJobPath(fileSinkDesc.getDestPath());
+        }
+        DynamicPartitionCtx dpCtx = fileSinkDesc.getDynPartCtx();
+        if (dpCtx != null && dpCtx.getSPPath() != null) {
+            tmpDir = new Path(tmpDir, dpCtx.getSPPath());

Review Comment:
   nit: 2 space indentation



##########
common/src/java/org/apache/hadoop/hive/conf/HiveConf.java:
##########
@@ -5629,6 +5629,10 @@ public static enum ConfVars {
             "This is a performance optimization that forces the final FileSinkOperator to write to the blobstore.\n" +
             "See HIVE-15121 for details."),
 
+    HIVE_USE_SCRATCHDIR_FOR_STAGING("hive.use.scratchdir.for.staging", false,
+        "Use ${hive.exec.scratchdir} for query results instead of ${hive.exec.stagingdir}.\n" +
+            "This stages query results in ${hive.exec.scratchdir} before move to final destination."),

Review Comment:
   nit: move -> moving



##########
ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java:
##########
@@ -2608,8 +2608,8 @@ private Partition loadPartitionInternal(Path loadPath, Table tbl, Map<String, St
            * See: HIVE-1707 and HIVE-2117 for background
            */
           FileSystem oldPartPathFS = oldPartPath.getFileSystem(getConf());
-          FileSystem loadPathFS = loadPath.getFileSystem(getConf());
-          if (FileUtils.equalsFileSystem(oldPartPathFS,loadPathFS)) {
+          FileSystem tblPathFS = tblDataLocationPath.getFileSystem(getConf());
+          if (FileUtils.equalsFileSystem(oldPartPathFS,tblPathFS)) {

Review Comment:
   nit: space before `tblPathFS`?
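
The hunk above changes the check so that the old partition path is compared with the table location instead of the load path, presumably because the load path may sit on a different filesystem once results are staged in the scratchdir. A minimal sketch of such a same-filesystem check follows, using only `java.net.URI`. The helper `sameFileSystem` and the example URIs are hypothetical; this is not Hive's `FileUtils.equalsFileSystem`, only an illustration of comparing scheme and authority.

```java
import java.net.URI;

public class SameFileSystemSketch {

    /**
     * Hypothetical helper (not part of Hive): treats two locations as being on
     * the same filesystem when their URI scheme and authority both match.
     */
    static boolean sameFileSystem(URI a, URI b) {
        return equalsIgnoreCase(a.getScheme(), b.getScheme())
            && equalsIgnoreCase(a.getAuthority(), b.getAuthority());
    }

    // Null-safe, case-insensitive comparison; two nulls count as equal.
    private static boolean equalsIgnoreCase(String x, String y) {
        return x == null ? y == null : x.equalsIgnoreCase(y);
    }

    public static void main(String[] args) {
        // Illustrative locations only; none of these come from the PR.
        URI oldPartition = URI.create("s3a://bucket/warehouse/t/p=1");
        URI tableLocation = URI.create("s3a://bucket/warehouse/t");
        URI scratchLoadPath = URI.create("hdfs://nn1:8020/tmp/hive/job_1");

        System.out.println(sameFileSystem(oldPartition, tableLocation));
        System.out.println(sameFileSystem(oldPartition, scratchLoadPath));
    }
}
```

In this sketch the partition and table locations share a filesystem, while the scratch load path does not, which is the situation in which comparing against the load path would give a different answer than comparing against the table location.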





Issue Time Tracking
-------------------

    Worklog Id:     (was: 831223)
    Time Spent: 3.5h  (was: 3h 20m)

> Allow use scratchdir for staging final job
> ------------------------------------------
>
>                 Key: HIVE-26758
>                 URL: https://issues.apache.org/jira/browse/HIVE-26758
>             Project: Hive
>          Issue Type: New Feature
>          Components: Query Planning
>    Affects Versions: 4.0.0-alpha-2
>            Reporter: Yi Zhang
>            Assignee: Yi Zhang
>            Priority: Minor
>              Labels: pull-request-available
>          Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> Query results are staged in a stagingdir that is relative to the 
> destination path: <destination_dir>/<staging_dir>/
> During the blobstorage optimization (HIVE-17620), the final job is set to use stagingdir.
> HIVE-15215 mentioned the possibility of using scratch for staging when writing 
> to S3, but that was a long time ago and there has been no activity since.
>  
> This change allows the final job to use hive.exec.scratchdir, as the interim jobs do, 
> through the configuration 
> hive.use.scratchdir.for.staging
> This is useful across filesystems: the user can stage on the local source filesystem 
> instead of the remote filesystem.
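
The choice described above can be sketched as a small standalone function. This is a simplified illustration under stated assumptions, not the code in the PR: `chooseStagingRoot` is a hypothetical helper, paths are handled as plain strings, and the example locations are invented. Only the property names hive.exec.scratchdir, hive.exec.stagingdir and hive.use.scratchdir.for.staging come from the issue.

```java
public class StagingDirSketch {

    /**
     * Hypothetical helper (not Hive code): returns the directory in which the
     * final job would stage its output.
     *
     * @param useScratchDirForStaging value of hive.use.scratchdir.for.staging
     * @param scratchDir              value of hive.exec.scratchdir
     * @param destinationDir          final destination of the query results
     * @param stagingDirName          value of hive.exec.stagingdir (a relative name)
     */
    static String chooseStagingRoot(boolean useScratchDirForStaging,
                                    String scratchDir,
                                    String destinationDir,
                                    String stagingDirName) {
        if (useScratchDirForStaging) {
            // Stage next to the interim jobs, independent of the destination.
            return scratchDir;
        }
        // Default layout from the issue: <destination_dir>/<staging_dir>/
        return destinationDir + "/" + stagingDirName;
    }

    public static void main(String[] args) {
        // Illustrative values only.
        String scratch = "hdfs://nn1:8020/tmp/hive";
        String dest = "s3a://bucket/warehouse/t";
        String staging = ".hive-staging";

        System.out.println(chooseStagingRoot(false, scratch, dest, staging));
        System.out.println(chooseStagingRoot(true, scratch, dest, staging));
    }
}
```

With the flag off, staging stays under the destination, so the final move remains on one filesystem. With the flag on, staging happens on the scratch filesystem and the final move crosses to the destination filesystem, which is the trade-off the issue describes.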



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
