[ 
https://issues.apache.org/jira/browse/HIVE-22941?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor updated HIVE-22941:
--------------------------------
    Fix Version/s: 4.0.0

> Empty files are inserted into external tables after HIVE-21714
> --------------------------------------------------------------
>
>                 Key: HIVE-22941
>                 URL: https://issues.apache.org/jira/browse/HIVE-22941
>             Project: Hive
>          Issue Type: Bug
>            Reporter: László Bodor
>            Assignee: László Bodor
>            Priority: Major
>             Fix For: 4.0.0
>
>         Attachments: HIVE-22941.01.patch
>
>
> Multiple patches targeted an issue where INSERT OVERWRITE was ineffective 
> when the input is empty:
> HIVE-18702: INSERT OVERWRITE TABLE doesn't clean the table directory before 
> overwriting
> HIVE-21714: Insert overwrite on an acid/mm table is ineffective if the input 
> is empty
> HIVE-21784: Insert overwrite on an acid (not mm) table is ineffective if the 
> input is empty
> Of these patches, HIVE-21714 seems to have an adverse effect on external 
> tables because of this part:
> https://github.com/apache/hive/commit/9a10bc28bee5250c0f667c94a295706a44ed4d7e#diff-9bea2581a1fba611f2c10904857b8823R1268
> The original issue before HIVE-21714 was that the existing files in the table 
> survived an INSERT OVERWRITE, so a SELECT COUNT(*) still returned more than 
> 0 afterwards. HIVE-21714 seems to enable writing empty files regardless of 
> execution engine / table type, which is not the right approach: the proper 
> solution would be to avoid writing empty files for Tez entirely (this is 
> what HIVE-14014 was about). I found that changing the condition to...
> {code}
> if (!isTez && (isStreaming || this.isInsertOverwrite)) 
> {code}
> (which could be an easy solution for external tables) breaks some test cases 
> (both full ACID and MM) in insert_overwrite.q, which could mean they somehow 
> rely on the generated empty file. We need to find a proper solution that 
> applies to all table types without polluting external tables.
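
To make the effect of the proposed condition easier to see, here is a minimal stand-alone sketch that evaluates it for every flag combination. The class and method names (EmptyFileGuard, shouldWriteEmptyFile) are hypothetical and exist only for illustration; only the boolean expression comes from the description above, and this is not the actual Hive code.

```java
// Hypothetical stand-alone model of the proposed guard; NOT the Hive source.
final class EmptyFileGuard {
    private EmptyFileGuard() {}

    /** Mirrors the proposed condition: !isTez && (isStreaming || isInsertOverwrite). */
    static boolean shouldWriteEmptyFile(boolean isTez,
                                        boolean isStreaming,
                                        boolean isInsertOverwrite) {
        return !isTez && (isStreaming || isInsertOverwrite);
    }

    public static void main(String[] args) {
        boolean[] flags = {false, true};
        // Print the full truth table of the guard.
        for (boolean isTez : flags) {
            for (boolean isStreaming : flags) {
                for (boolean isInsertOverwrite : flags) {
                    System.out.printf(
                        "isTez=%-5b isStreaming=%-5b isInsertOverwrite=%-5b -> writeEmptyFile=%b%n",
                        isTez, isStreaming, isInsertOverwrite,
                        shouldWriteEmptyFile(isTez, isStreaming, isInsertOverwrite));
                }
            }
        }
    }
}
```

With this guard no empty file is ever written when isTez is true, including for an INSERT OVERWRITE with empty input. That would be consistent with the observation that the ACID and MM cases in insert_overwrite.q may depend on the generated empty file.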



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
