[jira] [Updated] (SPARK-20663) Data missing after insert overwrite table partition which is created on specific location

kobefeng (JIRA) Thu, 18 May 2017 12:05:27 -0700

     [ 
https://issues.apache.org/jira/browse/SPARK-20663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


kobefeng updated SPARK-20663:
-----------------------------
    Labels:   (was: easyfix)

> Data missing after insert overwrite table partition which is created on 
> specific location
> -----------------------------------------------------------------------------------------
>
>                 Key: SPARK-20663
>                 URL: https://issues.apache.org/jira/browse/SPARK-20663
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.1.0
>            Reporter: kobefeng
>
> Use spark sql to create partition table first, and alter table by adding 
> partition on specific location, then insert overwrite into this partition by 
> selection, which will cause data missing compared with HIVE.
> {code:title=partition_table_insert_overwrite.sql|borderStyle=solid}
> -- create partition table first
> $ hadoop fs -mkdir /user/kofeng/partitioned_table
> $ /apache/spark-2.1.0-bin-hadoop2.7/bin/spark-sql
> spark-sql> create table kofeng.partitioned_table(
>          >     id bigint,
>          >     name string,
>          >     dt string
>          > ) using parquet options ('compression'='snappy', 
> 'path'='/user/kofeng/partitioned_table')
>          > partitioned by (dt);
> -- add partition with specific location
> spark-sql> alter table kofeng.partitioned_table add if not exists 
> partition(dt='20170507') location '/user/kofeng/partitioned_table/20170507';
> $ hadoop fs -ls /user/kofeng/partitioned_table
> drwxr-xr-x   - kofeng kofeng          0 2017-05-08 17:00 
> /user/kofeng/partitioned_table/20170507
> -- insert overwrite this partition, and the specific location folder gone, 
> data is missing, job is success by attaching _SUCCESS
> spark-sql> insert overwrite table kofeng.partitioned_table 
> partition(dt='20170507') select 123 as id, "kofeng" as name;
> $ hadoop fs -ls /user/kofeng/partitioned_table
> -rw-r--r--   3 kofeng kofeng         0 2017-05-08 17:06 
> /user/kofeng/partitioned_table/_SUCCESS
> ----
> ----
> -- Then drop this partition and use hive to add partition and insert 
> overwrite this partition data, then verify:
> spark-sql> alter table kofeng.partitioned_table drop if exists 
> partition(dt='20170507');
> hive> alter table kofeng.partitioned_table add if not exists 
> partition(dt='20170507') location '/user/kofeng/partitioned_table/20170507';
> OK
> -- could see hive insert overwrite that specific location successfully.
> hive> insert overwrite table kofeng.partitioned_table 
> partition(dt='20170507') select 123 as id, "kofeng" as name from kofeng.test;
> hive> select * from kofeng.partitioned_table;
> OK
> 123   kofeng  20170507
> $ hadoop fs -ls /user/kofeng/partitioned_table/20170507
> -rwxr-xr-x   3 kofeng kofeng       338 2017-05-08 17:26 
> /user/kofeng/partitioned_table/20170507/000000_0
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[jira] [Updated] (SPARK-20663) Data missing after insert overwrite table partition which is created on specific location

Reply via email to