[ 
https://issues.apache.org/jira/browse/HIVE-29348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mohamed Ali updated HIVE-29348:
-------------------------------
    Issue Type: Improvement  (was: Bug)

> MoveTask fails during ACID insert with dynamic partition when partition value 
> is NULL
> -------------------------------------------------------------------------------------
>
>                 Key: HIVE-29348
>                 URL: https://issues.apache.org/jira/browse/HIVE-29348
>             Project: Hive
>          Issue Type: Improvement
>          Components: Hive, Tez
>    Affects Versions: 3.1.3
>            Reporter: Mohamed Ali
>            Priority: Major
>
> *Description:*
> We encountered a failure while running an {{INSERT INTO … PARTITION}} query 
> in Hive (running on Tez).
> The query completes most stages successfully, but fails near the end during a 
> {{MoveTask}} with the following error:
>  
>  
> {{FAILED: Execution Error, return code 40000 from 
> org.apache.hadoop.hive.ql.exec.MoveTask.
> java.io.FileNotFoundException: 
> File hdfs://<cluster>/warehouse/.../<table>/_tmp.delta_0064171_0064171_0001 
> does not exist.
> (state=08S01, code=40000)}}
> Despite the failure, Hive prints:
>  
>  
> {{INFO: OK}}
> which makes it unclear whether the query succeeded or failed.
> The final result is that *no data is written to the target table.*
>  
> *Query:*
> FROM (
>   SELECT *, SUBSTRING(end_time_str,1,8) AS observation_date
>   FROM source_table
>   WHERE LENGTH(SUBSTRING(end_time_str,1,8)) = 8
> ) base
> INSERT INTO stats_table PARTITION (year='YYYY', month='MM', stream='STREAM')
> SELECT job_exec_time, observation_date, COUNT(*)
> GROUP BY observation_date
> INSERT INTO target_table PARTITION (observation_date)
> SELECT col1, col2, col3, observation_date
> WHERE some_condition;
> As soon as the second INSERT executes, Hive produces a MoveTask failure.
>
> *Observed Behavior:*
> * Earlier stages (DEPENDENCY_COLLECTION, MOVE, etc.) succeed
> * Hive loads the first target table successfully
> * The second insert’s MoveTask attempts to read from a temporary delta directory (example: {{_tmp.delta_0064171_0064171_0001}})
> * That temporary directory does not exist
> * MoveTask throws {{FileNotFoundException}}
> * Hive prints {{INFO: OK}}, which is misleading
> * No rows are written to the final table
>
> *Expected Behavior:*
> * Hive should create the required temporary directories before MoveTask, or fail earlier with a clear explanation
> * Logs should not print {{INFO: OK}} if the query fails
>
> *Request:*
> We request investigation of:
> * Why the temporary delta folder {{/_tmp.delta_*}} is missing during MoveTask
> * Why Hive reports {{INFO: OK}} although the statement fails
> * Whether this is a bug in MoveTask handling of partitioned inserts under Tez
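
Because the report shows a FAILED: line and an INFO: OK line for the same statement, a client-side check that treats any FAILED: line as authoritative may help avoid misreading the result. The following is a minimal Python sketch under stated assumptions: the function name `classify_output` and both regular expressions are hypothetical and are not part of Hive or Beeline. The marker strings are taken from the output quoted above, and the exact log prefixes in a real session may differ.

```python
import re

# Hypothetical helper, not part of Hive or Beeline. It scans captured
# client output for the two markers quoted in this report.
FAILED_RE = re.compile(r"^\s*(?:ERROR\s*:\s*)?FAILED:", re.MULTILINE)
OK_RE = re.compile(r"^\s*INFO\s*:\s*OK\s*$", re.MULTILINE)


def classify_output(text: str) -> str:
    """Return 'failed', 'ok', or 'unknown' for captured client output.

    A FAILED: line always wins, even if an 'INFO: OK' line is also present.
    """
    if FAILED_RE.search(text):
        return "failed"
    if OK_RE.search(text):
        return "ok"
    return "unknown"


# Illustrative sample assembled from the strings quoted in the report.
sample = (
    "INFO: OK\n"
    "FAILED: Execution Error, return code 40000 from "
    "org.apache.hadoop.hive.ql.exec.MoveTask.\n"
)
print(classify_output(sample))
```

This only classifies text that the caller has already captured. It does not explain why the temporary delta directory is missing, and it should not replace checking the client's exit status.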



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
