[ 
https://issues.apache.org/jira/browse/HIVE-2296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Franklin Hu updated HIVE-2296:
------------------------------

    Description: 
When INSERT INTO is run on a table that has compressed output enabled 
(hive.exec.compress.output=true) and already contains files, the new files 
may be copied into the table under invalid file names:

Before INSERT INTO:
000000_0.gz

After INSERT INTO:
000000_0.gz
000000_0.gz_copy_1

This causes corrupted output when doing a SELECT * on the table, because the 
copied file no longer ends in .gz and so is not recognized as compressed.
The correct behavior is to pick a valid file name such as:
000000_0_copy_1.gz

  was:
When INSERT INTO is run on a table with compressed output 
(hive.exec.compress.output=true) and existing files in the table, it may copy 
the new files in bad file names:

Before INSERT INTO:
000000_0.gz

After INSERT INTO:
000000_0.gz
000000_0.gz_copy_1

Correct behavior should be to pick a valid filename


> bad compressed file names from insert into
> ------------------------------------------
>
>                 Key: HIVE-2296
>                 URL: https://issues.apache.org/jira/browse/HIVE-2296
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 0.8.0
>            Reporter: Franklin Hu
>            Assignee: Franklin Hu
>         Attachments: hive-2296.1.patch
>
>
> When INSERT INTO is run on a table that has compressed output enabled 
> (hive.exec.compress.output=true) and already contains files, the new files 
> may be copied into the table under invalid file names:
> Before INSERT INTO:
> 000000_0.gz
> After INSERT INTO:
> 000000_0.gz
> 000000_0.gz_copy_1
> This causes corrupted output when doing a SELECT * on the table, because the 
> copied file no longer ends in .gz and so is not recognized as compressed.
> The correct behavior is to pick a valid file name such as:
> 000000_0_copy_1.gz
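
The naming rule described above can be illustrated with a minimal sketch. This is not the code in hive-2296.1.patch; the class and method names (CopyNameSketch, withCopySuffix) are hypothetical, and it assumes the extension is everything from the first '.' in the file name onward. It only shows the idea of placing the _copy_N marker before the extension instead of after it.

```java
// Hypothetical helper, not taken from the Hive source or the attached patch.
class CopyNameSketch {

    // Inserts "_copy_<n>" before the file extension so that the name still
    // ends in the compression suffix (for example ".gz").
    // Assumption: the extension starts at the first '.' in the name.
    static String withCopySuffix(String fileName, int copyIndex) {
        String suffix = "_copy_" + copyIndex;
        int dot = fileName.indexOf('.');
        if (dot <= 0) {
            // No extension (or a leading dot only): append the suffix at the end.
            return fileName + suffix;
        }
        return fileName.substring(0, dot) + suffix + fileName.substring(dot);
    }

    public static void main(String[] args) {
        // prints "000000_0_copy_1.gz"
        System.out.println(withCopySuffix("000000_0.gz", 1));
    }
}
```

With this rule, an uncompressed file such as 000000_0 still becomes 000000_0_copy_1, so the behavior for tables without compressed output is unchanged.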

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
