Jihoon Lee created HIVE-25669:
---------------------------------

             Summary: After Insert overwrite (managed table), the previous data 
of the table is not deleted
                 Key: HIVE-25669
                 URL: https://issues.apache.org/jira/browse/HIVE-25669
             Project: Hive
          Issue Type: Bug
          Components: Hive
    Affects Versions: 3.1.0
         Environment: 1. Hadoop ecosystem versions

  - hive : 3.1.0
  - Tez : 0.9.1
  - hdfs : 3.1.1

2. Table info

  - table name : test_t1    (*sample name)

  - table : Managed table

  - partitioning : none (non-partitioned table)

3. Table properties

  - transactional = true

  - transactional_properties = insert_only

  - bucketing_version = 2

  - auto.purge = true / false  (*tested with both values)

 
            Reporter: Jihoon Lee


When running INSERT OVERWRITE on the table, 'auto.purge' does not seem to work as expected.
h2. 1. Create table

create table test_t1 (
  col1 string,
  col2 string,
  col3 string,
  col4 string
)
ROW FORMAT SERDE
  'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
STORED AS INPUTFORMAT
  'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
  'hdfs://nameservice1/user/hive/warehouse/st.db/test_ljh5'
TBLPROPERTIES (
  'auto.purge'='false',
  'bucketing_version'='2',
  'transactional'='true',
  'transactional_properties'='insert_only');

 
h2. 2. Insert overwrite

 2-1)
insert overwrite table test_t1
select * from origin_t1 limit 10000;

 2-2)
insert overwrite table test_t1
select * from origin_t1 limit 20000;

 2-3)
insert overwrite table test_t1
select * from origin_t1 limit 30000;

 
h2. 3. Check HDFS files

 - Hue file browser 

  
(Screenshot from the Hue file browser; the embedded image is not available in this message.)
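
The same listing can be reproduced without Hue. A minimal sketch, assuming the table location from the CREATE TABLE statement in step 1 and that the {{dfs}} command is permitted in the Hive CLI or Beeline session:

{code:sql}
-- `dfs` passes its arguments to the HDFS shell from inside the Hive session.
-- The path is the LOCATION given in the CREATE TABLE statement above.
dfs -ls hdfs://nameservice1/user/hive/warehouse/st.db/test_ljh5;
{code}

Running it after each overwrite in step 2 shows which base_* directories are present at that point.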

 Why aren't the old folders (base_0000009, base_00000010) deleted?

The result is the same whether the table property is set to 
'*auto.purge=true*' or '*auto.purge=false*'.
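
A minimal sketch of how the property can be switched and checked between runs, assuming the sample table test_t1 from step 1. These are standard HiveQL statements shown for illustration, not necessarily the exact commands used:

{code:sql}
-- Switch the property, then confirm the value stored for the table.
ALTER TABLE test_t1 SET TBLPROPERTIES ('auto.purge'='true');
SHOW TBLPROPERTIES test_t1('auto.purge');

-- Re-run one of the overwrites from step 2 before checking HDFS again.
insert overwrite table test_t1
select * from origin_t1 limit 10000;
{code}

Repeating the sequence with 'auto.purge'='false' covers the second case.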

 

I referred to the following documentation:

[https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML]
 * INSERT OVERWRITE will overwrite any existing data in the table or partition
 ** unless {{IF NOT EXISTS}} is provided for a partition (as of Hive 0.9.0).
 ** As of Hive 2.3.0 (HIVE-15880), if the table has 
[TBLPROPERTIES|https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-listTableProperties]
 ("auto.purge"="true") the previous data of the table is not moved to Trash 
when INSERT OVERWRITE query is run against the table. This functionality is 
applicable only for managed tables (see [managed 
tables|https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-ManagedandExternalTables])
 and is turned off when "auto.purge" property is unset or set to false.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
