[
https://issues.apache.org/jira/browse/CARBONDATA-1387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Prabhat Kashyap reassigned CARBONDATA-1387:
-------------------------------------------
Assignee: Prabhat Kashyap
> Incorrect partition creation while inserting data from another table
> --------------------------------------------------------------------
>
> Key: CARBONDATA-1387
> URL: https://issues.apache.org/jira/browse/CARBONDATA-1387
> Project: CarbonData
> Issue Type: Bug
> Components: data-query
> Affects Versions: 1.2.0
> Environment: spark 2.1
> Reporter: Vandana Yadav
> Assignee: Prabhat Kashyap
>
> Incorrect partition creation while inserting data from another table.
> Description: After inserting data from another table, the number of rows in
> each partition stays the same even though the total number of rows in the
> partitioned table increases. (The number of rows in each partition should
> also increase, since new data is being inserted into the partitioned table.)
> Steps to reproduce:
> 1) Create Partitioned table:
> CREATE TABLE uniqdata_part (CUST_ID int,CUST_NAME String,ACTIVE_EMUI_VERSION
> string,DOJ timestamp, BIGINT_COLUMN1 bigint,BIGINT_COLUMN2
> bigint,DECIMAL_COLUMN1 decimal(30,10), DECIMAL_COLUMN2
> decimal(36,10),Double_COLUMN1 double, Double_COLUMN2 double,INTEGER_COLUMN1
> int) PARTITIONED BY (DOB Timestamp) STORED BY 'org.apache.carbondata.format'
> TBLPROPERTIES ('PARTITION_TYPE'='RANGE','RANGE_INFO'='1971-01-01 01:00:03,
> 1972-01-01 01:00:03, 1974-01-01 01:00:03',"TABLE_BLOCKSIZE"= "256 MB")
> 2) Load data into the partitioned table:
> LOAD DATA INPATH 'hdfs://localhost:54310/uniqdata/2000_UniqData.csv' into
> table uniqdata_part OPTIONS('DELIMITER'=',' ,
> 'QUOTECHAR'='"','BAD_RECORDS_ACTION'='FORCE','FILEHEADER'='CUST_ID,CUST_NAME,ACTIVE_EMUI_VERSION,DOB,DOJ,BIGINT_COLUMN1,BIGINT_COLUMN2,DECIMAL_COLUMN1,DECIMAL_COLUMN2,Double_COLUMN1,Double_COLUMN2,INTEGER_COLUMN1')
> 3) Create another table:
> CREATE TABLE uniqdata_1 (CUST_ID int,CUST_NAME String,ACTIVE_EMUI_VERSION
> string, DOB timestamp, DOJ timestamp, BIGINT_COLUMN1 bigint,BIGINT_COLUMN2
> bigint,DECIMAL_COLUMN1 decimal(30,10), DECIMAL_COLUMN2
> decimal(36,10),Double_COLUMN1 double, Double_COLUMN2 double,INTEGER_COLUMN1
> int) STORED BY 'org.apache.carbondata.format' TBLPROPERTIES
> ("TABLE_BLOCKSIZE"= "256 MB")
> 4) Load data into this table:
> LOAD DATA INPATH 'hdfs://localhost:54310/uniqdata/2000_UniqData.csv' into
> table uniqdata_1 OPTIONS('DELIMITER'=',' ,
> 'QUOTECHAR'='"','BAD_RECORDS_ACTION'='FORCE','FILEHEADER'='CUST_ID,CUST_NAME,ACTIVE_EMUI_VERSION,DOB,DOJ,BIGINT_COLUMN1,BIGINT_COLUMN2,DECIMAL_COLUMN1,DECIMAL_COLUMN2,Double_COLUMN1,Double_COLUMN2,INTEGER_COLUMN1')
> 5) Execute queries:
> a) show partitions query:
> show partitions uniqdata_part
> Output:
> 0, dob = DEFAULT
> 1, dob < 1971-01-01 01:00:03
> 2, 1971-01-01 01:00:03 <= dob < 1972-01-01 01:00:03
> 3, 1972-01-01 01:00:03 <= dob < 1974-01-01 01:00:03
> b) Query for row count in the partitioned table:
> select count(*) from uniqdata_part
> Output:
> +-----------+--+
> | count(1)  |
> +-----------+--+
> | 2013      |
> +-----------+--+
> c) Query for row count in partition 0:
> select count(*) from uniqdata_part where dob >= '1974-01-01 01:00:03'
> Output:
> +-----------+--+
> | count(1)  |
> +-----------+--+
> | 539       |
> +-----------+--+
> d) Query for row count in partition 1:
> select count(*) from uniqdata_part where dob < '1971-01-01 01:00:03'
> Output:
> +-----------+--+
> | count(1)  |
> +-----------+--+
> | 366       |
> +-----------+--+
> e) Query for row count in partition 2:
> select count(*) from uniqdata_part where dob >= '1971-01-01 01:00:03' and dob
> < '1972-01-01 01:00:03'
> Output:
> +-----------+--+
> | count(1)  |
> +-----------+--+
> | 365       |
> +-----------+--+
> f) Query for row count in partition 3:
> select count(*) from uniqdata_part where dob >= '1972-01-01 01:00:03' and dob
> < '1974-01-01 01:00:03'
> Output:
> +-----------+--+
> | count(1)  |
> +-----------+--+
> | 731       |
> +-----------+--+
> g) Insert data into the partitioned table from the non-partitioned table:
> insert into uniqdata_part select * from uniqdata_1;
> h) Query for row count in the partitioned table after insertion operation:
> select count(*) from uniqdata_part
> Output:
> +-----------+--+
> | count(1)  |
> +-----------+--+
> | 4026      |
> +-----------+--+
> i) Repeat queries (c) to (f) to get the row count in each partition.
> 6) Actual Result: Each partition shows the same row count as before the
> insert, although the partitioned table now contains more rows.
> 7) Expected Result: The number of rows in each partition should increase as
> the number of rows in the partitioned table increases.
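
A possible way to run the per-partition checks in (c) to (f) as one query instead of four: the sketch below buckets every row of uniqdata_part by the RANGE_INFO boundaries from step 1 and counts each bucket, so it can be run once before and once after the insert in (g) and the two outputs compared. The bucket labels, the partition_bucket and row_count aliases, and the subquery alias t are illustrative names, not part of the report. It assumes Spark SQL compares the dob timestamp with the string literals the same way the queries in (c) to (f) do. Note that the four counts reported in (c) to (f) add up to 2001 while (b) reports 2013; the report does not say where the remaining 12 rows fall. In this sketch any row that matches none of the three ranges, including a row with a NULL dob, lands in the ELSE bucket.

```sql
-- Sketch only: count rows per range bucket in a single pass.
-- Boundaries are copied from RANGE_INFO in step 1; labels are illustrative.
SELECT partition_bucket, count(*) AS row_count
FROM (
  SELECT
    CASE
      -- WHEN branches are evaluated in order, so each one only sees rows
      -- that did not match the previous boundary.
      WHEN dob < '1971-01-01 01:00:03' THEN '1: dob < 1971-01-01 01:00:03'
      WHEN dob < '1972-01-01 01:00:03' THEN '2: 1971-01-01 01:00:03 <= dob < 1972-01-01 01:00:03'
      WHEN dob < '1974-01-01 01:00:03' THEN '3: 1972-01-01 01:00:03 <= dob < 1974-01-01 01:00:03'
      -- Everything else: dob >= 1974-01-01 01:00:03, or dob IS NULL.
      ELSE '0: default'
    END AS partition_bucket
  FROM uniqdata_part
) t
GROUP BY partition_bucket
ORDER BY partition_bucket;
```

Since uniqdata_1 was loaded from the same CSV file as uniqdata_part, a correct insert would be expected to double each row_count, matching the total going from 2013 to 4026.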
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)