[jira] [Work logged] (HIVE-26758) Allow use scratchdir for staging final job

ASF GitHub Bot (Jira) Mon, 05 Dec 2022 10:38:04 -0800


     [ 
https://issues.apache.org/jira/browse/HIVE-26758?focusedWorklogId=831150&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-831150
 ]


ASF GitHub Bot logged work on HIVE-26758:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 05/Dec/22 18:37
            Start Date: 05/Dec/22 18:37
    Worklog Time Spent: 10m 
      Work Description: yigress opened a new pull request, #3831:
URL: https://github.com/apache/hive/pull/3831

   ### What changes were proposed in this pull request?
   
   1. Add a Hive configuration, hive.use.scratchdir.for.staging.
   
   2. For native tables (non-MM, non-direct-insert, non-ACID), change the 
dynamic partition staging directory layout from
   <dest_path>/<static_partition>/<staging_dir>/<dynamic_partition>
   to 
   <dest_path>/<staging_dir>/<static_partition>/<dynamic_partition>
   
   3. When hive.use.scratchdir.for.staging=true, FileSinkOperator's dirName and 
DynamicContext's sourcePath change from
   <dest_path>/{hive.exec.stagingdir}
   to
   <hive.exec.scratchdir>
   
   
   For example, for the query
   insert into/overwrite table partition(year=2001, season) select...
   
   before the change, the FileSinkOperator conf has
   <table_path>/year=2001/.staging_dir/season=xxx
   
   After the change, it has
   <table_path>/.staging_dir/year=2001/season=xxx
   
   This change allows <table_path> to be swapped for another path such as 
<hive.exec.scratchdir>; the MoveTask then moves the results into <table_path>.
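
The layout change described above can be illustrated with a small path-building sketch. This is not Hive code: the function names, the example paths (`/warehouse/t`, `/tmp/hive-scratch`) and the `season=spring` value are hypothetical, and the sketch only mirrors the directory patterns listed in this description.

```python
import posixpath


def old_staging_path(dest_path, staging_dir, static_part, dynamic_part):
    # Old layout: <dest_path>/<static_partition>/<staging_dir>/<dynamic_partition>
    return posixpath.join(dest_path, static_part, staging_dir, dynamic_part)


def staging_root(dest_path, staging_dir, scratch_dir=None):
    # Assumption: scratch_dir is passed only when
    # hive.use.scratchdir.for.staging=true; otherwise the root stays
    # under the destination as <dest_path>/<staging_dir>.
    if scratch_dir is not None:
        return scratch_dir
    return posixpath.join(dest_path, staging_dir)


def new_staging_path(root, static_part, dynamic_part):
    # New layout: <root>/<static_partition>/<dynamic_partition>
    return posixpath.join(root, static_part, dynamic_part)


def final_path(dest_path, static_part, dynamic_part):
    # Where the move step is expected to place the results.
    return posixpath.join(dest_path, static_part, dynamic_part)


if __name__ == "__main__":
    dest = "/warehouse/t"
    print(old_staging_path(dest, ".staging_dir", "year=2001", "season=spring"))
    print(new_staging_path(staging_root(dest, ".staging_dir"),
                           "year=2001", "season=spring"))
    print(new_staging_path(staging_root(dest, ".staging_dir", "/tmp/hive-scratch"),
                           "year=2001", "season=spring"))
    print(final_path(dest, "year=2001", "season=spring"))
```

Because the staging directory now comes before the static partition, the part of the path below the staging root matches the part below the destination root, so swapping the root does not change the partition suffix.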
   
   ### Why are the changes needed?
   
   As part of the S3 blob storage optimization, HIVE-15121 / HIVE-17620 changed 
interim jobs to use hive.exec.scratchdir and the final job to use 
hive.exec.stagingdir. https://issues.apache.org/jira/browse/HIVE-15215 is still 
open on the question of whether to use the scratch dir for staging on S3. 
   
   However, for blob storage where 'rename' is slow and encryption is not in 
use, it can help performance to stage query results in the scratchdir and let 
the MoveTask copy them to blob storage. This is especially true when there is a 
FileMerge task.
   This may also help in cross-filesystem setups, where a user wants to stage 
query results on the local cluster filesystem and then move them to the 
destination filesystem.
   
   
   ### Does this PR introduce _any_ user-facing change?
   This adds a new hive configuration.
   
   
   ### How was this patch tested?
   




Issue Time Tracking
-------------------

    Worklog Id:     (was: 831150)
    Time Spent: 3h 10m  (was: 3h)

> Allow use scratchdir for staging final job
> ------------------------------------------
>
>                 Key: HIVE-26758
>                 URL: https://issues.apache.org/jira/browse/HIVE-26758
>             Project: Hive
>          Issue Type: New Feature
>          Components: Query Planning
>    Affects Versions: 4.0.0-alpha-2
>            Reporter: Yi Zhang
>            Assignee: Yi Zhang
>            Priority: Minor
>              Labels: pull-request-available
>          Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> Query results are staged in a stagingdir that is relative to the 
> destination path: <destination_dir>/<staging_dir>/
> During the blob storage optimization (HIVE-17620), the final job was set to 
> use stagingdir.
> HIVE-15215 mentioned the possibility of using scratch for staging when 
> writing to S3, but that was a long time ago and there has been no activity 
> since.
>  
> This change allows the final job to use hive.exec.scratchdir, as the interim 
> jobs do, controlled by the configuration 
> hive.use.scratchdir.for.staging.
> This is useful across filesystems: the user can stage results on the local 
> source filesystem instead of the remote filesystem.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
