[ https://issues.apache.org/jira/browse/HIVE-29348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Mohamed Ali updated HIVE-29348:
-------------------------------
Issue Type: Improvement (was: Bug)
> MoveTask fails during ACID insert with dynamic partition when partition value
> is NULL
> -------------------------------------------------------------------------------------
>
> Key: HIVE-29348
> URL: https://issues.apache.org/jira/browse/HIVE-29348
> Project: Hive
> Issue Type: Improvement
> Components: Hive, Tez
> Affects Versions: 3.1.3
> Reporter: Mohamed Ali
> Priority: Major
>
> *Description:*
> We encountered a failure while running an {{INSERT INTO … PARTITION}} query
> in Hive (running on Tez).
> The query completes most stages successfully, but fails near the end during a
> {{MoveTask}} with the following error:
>
>
> {{FAILED: Execution Error, return code 40000 from
> org.apache.hadoop.hive.ql.exec.MoveTask.
> java.io.FileNotFoundException:
> File hdfs://<cluster>/warehouse/.../<table>/_tmp.delta_0064171_0064171_0001 does
> not exist.
> (state=08S01, code=40000)}}
> Despite the failure, Hive prints:
>
>
> {{INFO: OK}}
> which makes it unclear whether the query succeeded or failed.
> The final result is that *no data is written to the target table.*
>
> FROM (
> SELECT *, SUBSTRING(end_time_str,1,8) AS observation_date
> FROM source_table
> WHERE LENGTH(SUBSTRING(end_time_str,1,8)) = 8
> ) base
> INSERT INTO stats_table PARTITION (year='YYYY', month='MM', stream='STREAM')
> SELECT job_exec_time, observation_date, COUNT(*)
> GROUP BY observation_date
> INSERT INTO target_table PARTITION (observation_date)
> SELECT col1, col2, col3, observation_date
> WHERE some_condition;
> As soon as the second INSERT executes, Hive produces a MoveTask failure.
>
> *Observed Behavior:*
> * Earlier stages (DEPENDENCY_COLLECTION, MOVE, etc.) succeed.
> * Hive loads the first target table successfully.
> * The second insert’s MoveTask attempts to read from a temporary delta directory
>   (example: {{_tmp.delta_0064171_0064171_0001}}).
> * That temporary directory does not exist.
> * MoveTask throws {{FileNotFoundException}}.
> * Hive prints {{INFO: OK}}, which is misleading.
> * No rows are written to the final table.
>
> *Expected Behavior:*
> * Hive should create the required temporary directories before MoveTask runs,
>   OR Hive should fail earlier with a clear explanation.
> * Logs should not print {{INFO: OK}} if the query fails.
>
> *Request:*
> We request investigation of:
> * Why the temporary delta folder {{/_tmp.delta_*}} is missing during MoveTask
> * Why Hive reports {{INFO: OK}} although the statement fails
> * Whether this is a bug in MoveTask handling of partitioned inserts under Tez
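
For triage, the directory name from the error message can be decoded into its components. The following is a minimal Python sketch, assuming the usual Hive ACID naming layout delta_<minWriteId>_<maxWriteId>_<statementId> plus the {{_tmp.}} prefix seen in the error above. The names {{parse_delta_dir}} and {{DeltaDir}} are hypothetical helpers for illustration and are not part of Hive.

```python
import re
from typing import NamedTuple, Optional


class DeltaDir(NamedTuple):
    """Decoded parts of an ACID delta directory name (hypothetical helper type)."""
    is_temporary: bool           # True when the name carries the "_tmp." prefix
    min_write_id: int
    max_write_id: int
    statement_id: Optional[int]  # None when the name has no statement suffix


# Assumed layout: [_tmp.]delta_<minWriteId>_<maxWriteId>[_<statementId>]
_DELTA_RE = re.compile(r"^(_tmp\.)?delta_(\d+)_(\d+)(?:_(\d+))?$")


def parse_delta_dir(path: str) -> Optional[DeltaDir]:
    """Decode the last path component; return None if it is not a delta directory."""
    name = path.rstrip("/").rsplit("/", 1)[-1]
    match = _DELTA_RE.match(name)
    if match is None:
        return None
    tmp, lo, hi, stmt = match.groups()
    return DeltaDir(
        is_temporary=tmp is not None,
        min_write_id=int(lo),
        max_write_id=int(hi),
        statement_id=int(stmt) if stmt is not None else None,
    )


if __name__ == "__main__":
    # Directory name taken from the error message in this report.
    print(parse_delta_dir("_tmp.delta_0064171_0064171_0001"))
```

Applied to the name in the error, this gives write ID 64171 for both bounds and statement ID 1, which may help match the missing directory against the metastore's write ID allocation and the task logs for the second INSERT.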
--
This message was sent by Atlassian Jira
(v8.20.10#820010)