[
https://issues.apache.org/jira/browse/HIVE-15880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Vihang Karajgaonkar updated HIVE-15880:
---------------------------------------
Description:
It seems inconsistent that auto.purge property is not considered when we do a
INSERT OVERWRITE while it is when we do a DROP TABLE
Drop table doesn't move table data to Trash when auto.purge is set to true
{noformat}
> create table temp(col1 string, col2 string);
No rows affected (0.064 seconds)
> alter table temp set tblproperties('auto.purge'='true');
No rows affected (0.083 seconds)
> insert into temp values ('test', 'test'), ('test2', 'test2');
No rows affected (25.473 seconds)
# hdfs dfs -ls /user/hive/warehouse/temp
Found 1 items
-rwxrwxrwt 3 hive hive 22 2017-02-09 13:03
/user/hive/warehouse/temp/000000_0
#
> drop table temp;
No rows affected (0.242 seconds)
# hdfs dfs -ls /user/hive/warehouse/temp
ls: `/user/hive/warehouse/temp': No such file or directory
#
# sudo -u hive hdfs dfs -ls /user/hive/.Trash/Current/user/hive/warehouse
#
{noformat}
INSERT OVERWRITE query moves the table data to Trash even when auto.purge is
set to true
{noformat}
> create table temp(col1 string, col2 string);
> alter table temp set tblproperties('auto.purge'='true');
> insert into temp values ('test', 'test'), ('test2', 'test2');
# hdfs dfs -ls /user/hive/warehouse/temp
Found 1 items
-rwxrwxrwt 3 hive hive 22 2017-02-09 13:07
/user/hive/warehouse/temp/000000_0
#
> insert overwrite table temp select * from dummy;
# hdfs dfs -ls /user/hive/warehouse/temp
Found 1 items
-rwxrwxrwt 3 hive hive 26 2017-02-09 13:08
/user/hive/warehouse/temp/000000_0
# sudo -u hive hdfs dfs -ls /user/hive/.Trash/Current/user/hive/warehouse
Found 1 items
drwx------ - hive hive 0 2017-02-09 13:08
/user/hive/.Trash/Current/user/hive/warehouse/temp
#
{noformat}
While move operations are not very costly on HDFS it could be significant
overhead on slow FileSystems like S3. This could improve the performance of
{{INSERT OVERWRITE TABLE}} queries especially when there are large number of
partitions on tables located on S3 should the user wish to set auto.purge
property to true
Similarly {{TRUNCATE TABLE }} query on a table with {{auto.purge}} property set
true should not move the data to Trash
was:
It seems inconsistent that auto.purge property is not considered when we do a
INSERT OVERWRITE while it is when we do a DROP TABLE
Drop table doesn't move table data to Trash when auto.purge is set to true
{noformat}
> create table temp(col1 string, col2 string);
No rows affected (0.064 seconds)
> alter table temp set tblproperties('auto.purge'='true');
No rows affected (0.083 seconds)
> insert into temp values ('test', 'test'), ('test2', 'test2');
No rows affected (25.473 seconds)
# hdfs dfs -ls /user/hive/warehouse/temp
Found 1 items
-rwxrwxrwt 3 hive hive 22 2017-02-09 13:03
/user/hive/warehouse/temp/000000_0
#
> drop table temp;
No rows affected (0.242 seconds)
# hdfs dfs -ls /user/hive/warehouse/temp
ls: `/user/hive/warehouse/temp': No such file or directory
#
# sudo -u hive hdfs dfs -ls /user/hive/.Trash/Current/user/hive/warehouse
#
{noformat}
INSERT OVERWRITE query moves the table data to Trash even when auto.purge is
set to true
{noformat}
> create table temp(col1 string, col2 string);
> alter table temp set tblproperties('auto.purge'='true');
> insert into temp values ('test', 'test'), ('test2', 'test2');
# hdfs dfs -ls /user/hive/warehouse/temp
Found 1 items
-rwxrwxrwt 3 hive hive 22 2017-02-09 13:07
/user/hive/warehouse/temp/000000_0
#
> insert overwrite table temp select * from dummy;
# hdfs dfs -ls /user/hive/warehouse/temp
Found 1 items
-rwxrwxrwt 3 hive hive 26 2017-02-09 13:08
/user/hive/warehouse/temp/000000_0
# sudo -u hive hdfs dfs -ls /user/hive/.Trash/Current/user/hive/warehouse
Found 1 items
drwx------ - hive hive 0 2017-02-09 13:08
/user/hive/.Trash/Current/user/hive/warehouse/temp
#
{noformat}
While move operations are not very costly on HDFS it could be significant
overhead on slow FileSystems like S3. This could improve the performance of
{{INSERT OVERWRITE TABLE}} queries especially when there are large number of
partitions on tables located on S3 should the user wish to set auto.purge
property to true
> Allow insert overwrite and truncate table query to use auto.purge table
> property
> --------------------------------------------------------------------------------
>
> Key: HIVE-15880
> URL: https://issues.apache.org/jira/browse/HIVE-15880
> Project: Hive
> Issue Type: Improvement
> Reporter: Vihang Karajgaonkar
> Assignee: Vihang Karajgaonkar
> Attachments: HIVE-15880.01.patch, HIVE-15880.02.patch,
> HIVE-15880.03.patch, HIVE-15880.04.patch
>
>
> It seems inconsistent that auto.purge property is not considered when we do a
> INSERT OVERWRITE while it is when we do a DROP TABLE
> Drop table doesn't move table data to Trash when auto.purge is set to true
> {noformat}
> > create table temp(col1 string, col2 string);
> No rows affected (0.064 seconds)
> > alter table temp set tblproperties('auto.purge'='true');
> No rows affected (0.083 seconds)
> > insert into temp values ('test', 'test'), ('test2', 'test2');
> No rows affected (25.473 seconds)
> # hdfs dfs -ls /user/hive/warehouse/temp
> Found 1 items
> -rwxrwxrwt 3 hive hive 22 2017-02-09 13:03
> /user/hive/warehouse/temp/000000_0
> #
> > drop table temp;
> No rows affected (0.242 seconds)
> # hdfs dfs -ls /user/hive/warehouse/temp
> ls: `/user/hive/warehouse/temp': No such file or directory
> #
> # sudo -u hive hdfs dfs -ls /user/hive/.Trash/Current/user/hive/warehouse
> #
> {noformat}
> INSERT OVERWRITE query moves the table data to Trash even when auto.purge is
> set to true
> {noformat}
> > create table temp(col1 string, col2 string);
> > alter table temp set tblproperties('auto.purge'='true');
> > insert into temp values ('test', 'test'), ('test2', 'test2');
> # hdfs dfs -ls /user/hive/warehouse/temp
> Found 1 items
> -rwxrwxrwt 3 hive hive 22 2017-02-09 13:07
> /user/hive/warehouse/temp/000000_0
> #
> > insert overwrite table temp select * from dummy;
> # hdfs dfs -ls /user/hive/warehouse/temp
> Found 1 items
> -rwxrwxrwt 3 hive hive 26 2017-02-09 13:08
> /user/hive/warehouse/temp/000000_0
> # sudo -u hive hdfs dfs -ls /user/hive/.Trash/Current/user/hive/warehouse
> Found 1 items
> drwx------ - hive hive 0 2017-02-09 13:08
> /user/hive/.Trash/Current/user/hive/warehouse/temp
> #
> {noformat}
> While move operations are not very costly on HDFS it could be significant
> overhead on slow FileSystems like S3. This could improve the performance of
> {{INSERT OVERWRITE TABLE}} queries especially when there are large number of
> partitions on tables located on S3 should the user wish to set auto.purge
> property to true
> Similarly {{TRUNCATE TABLE }} query on a table with {{auto.purge}} property
> set true should not move the data to Trash
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)