Jihoon Lee created HIVE-25669:
---------------------------------

             Summary: After Insert overwrite (managed table), the previous data of the table is not deleted
                 Key: HIVE-25669
                 URL: https://issues.apache.org/jira/browse/HIVE-25669
             Project: Hive
          Issue Type: Bug
          Components: Hive
    Affects Versions: 3.1.0
         Environment: 1. hadoop eco versions
 - hive : 3.1.0
 - Tez : 0.9.1
 - hdfs : 3.1.1

2. Table info
 - table name : test_t1 (*sample name)
 - table : Managed table
 - partitioning : none (non-partitioned)

3. Table properties
 - transactional = true
 - transactional_properties = insert_only
 - bucketing_version = 2
 - auto.purge = true / false (*both values were tried)
            Reporter: Jihoon Lee


When running INSERT OVERWRITE on a table, 'auto.purge' does not seem to take effect.

h2. 1. Create table

create table test_t1 (
  col1 string,
  col2 string,
  col3 string,
  col4 string
)
ROW FORMAT SERDE
  'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
STORED AS INPUTFORMAT
  'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
  'hdfs://nameservice1/user/hive/warehouse/st.db/test_ljh5'
TBLPROPERTIES (
  'auto.purge'='false',
  'bucketing_version'='2',
  'transactional'='true',
  'transactional_properties'='insert_only')

h2. 2. Insert overwrite

2-1) insert overwrite table test_t1 select * from origin_t1 limit 10000;

2-2) insert overwrite table test_t1 select * from origin_t1 limit 20000;

2-3) insert overwrite table test_t1 select * from origin_t1 limit 30000;

h2. 3. Check HDFS files

 - Hue file browser

!https://mail.google.com/mail/u/0?ui=2&ik=10577dc09a&attid=0.1&permmsgid=msg-f:1715412595915827826&th=17ce5eaad6cb2a72&view=fimg&fur=ip&sz=s0-l75-ft&attbid=ANGjdJ9ygBFCoYIqI3etBmYvvRfg1l7ea2lSBC5QLxHMFhuWOh8f5u_JbzO2d65-t5I6v4Xxn9zF-ZKVya4uwIL_nDsELRTYiZ321XsPwqXzHZmG_HYA0wL3tAGLAN8&disp=emb!

Why aren't my old folders (base_0000009, base_00000010) deleted?

The result is the same whether the table property is set to '*auto.purge=true*' or to '*auto.purge=false*'.

I referred to the documentation here:

[https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML]
 * INSERT OVERWRITE will overwrite any existing data in the table or partition
 ** unless {{IF NOT EXISTS}} is provided for a partition (as of Hive 0.9.0).
 ** As of Hive 2.3.0 (HIVE-15880), if the table has [TBLPROPERTIES|https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-listTableProperties] ("auto.purge"="true") the previous data of the table is not moved to Trash when INSERT OVERWRITE query is run against the table. This functionality is applicable only for managed tables (see [managed tables|https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-ManagedandExternalTables]) and is turned off when "auto.purge" property is unset or set to false.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)