[jira] [Assigned] (CARBONDATA-1387) Incorrect partition creation while inserting data from another table

Prabhat Kashyap (JIRA) Mon, 21 Aug 2017 05:08:52 -0700

     [ 
https://issues.apache.org/jira/browse/CARBONDATA-1387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Prabhat Kashyap reassigned CARBONDATA-1387:
-------------------------------------------

    Assignee: Prabhat Kashyap

> Incorrect partition creation while inserting data from another table
> --------------------------------------------------------------------
>
>                 Key: CARBONDATA-1387
>                 URL: https://issues.apache.org/jira/browse/CARBONDATA-1387
>             Project: CarbonData
>          Issue Type: Bug
>          Components: data-query
>    Affects Versions: 1.2.0
>         Environment: spark 2.1
>            Reporter: Vandana Yadav
>            Assignee: Prabhat Kashyap
>
> Incorrect partition creation while inserting data from another table.
> Description: While inserting data from another table no of rows in each 
> partition remain same although no of rows in partitioned table get 
> increased.(no of rows in each partition should also increase as we are 
> inserting new data into the partitioned table.
> Steps to reproduce:
> 1) Create Partitioned table:
> CREATE TABLE uniqdata_part (CUST_ID int,CUST_NAME String,ACTIVE_EMUI_VERSION 
> string,DOJ timestamp, BIGINT_COLUMN1 bigint,BIGINT_COLUMN2 
> bigint,DECIMAL_COLUMN1 decimal(30,10), DECIMAL_COLUMN2 
> decimal(36,10),Double_COLUMN1 double, Double_COLUMN2 double,INTEGER_COLUMN1 
> int) PARTITIONED BY (DOB Timestamp) STORED BY 'org.apache.carbondata.format' 
> TBLPROPERTIES ('PARTITION_TYPE'='RANGE','RANGE_INFO'='1971-01-01 01:00:03, 
> 1972-01-01 01:00:03, 1974-01-01 01:00:03',"TABLE_BLOCKSIZE"= "256 MB")
> 2) Load data into the partitioned table:
> LOAD DATA INPATH 'hdfs://localhost:54310/uniqdata/2000_UniqData.csv' into 
> table uniqdata_part OPTIONS('DELIMITER'=',' , 
> 'QUOTECHAR'='"','BAD_RECORDS_ACTION'='FORCE','FILEHEADER'='CUST_ID,CUST_NAME,ACTIVE_EMUI_VERSION,DOB,DOJ,BIGINT_COLUMN1,BIGINT_COLUMN2,DECIMAL_COLUMN1,DECIMAL_COLUMN2,Double_COLUMN1,Double_COLUMN2,INTEGER_COLUMN1')
> 3) Create another table:
> CREATE TABLE uniqdata_1 (CUST_ID int,CUST_NAME String,ACTIVE_EMUI_VERSION 
> string, DOB timestamp, DOJ timestamp, BIGINT_COLUMN1 bigint,BIGINT_COLUMN2 
> bigint,DECIMAL_COLUMN1 decimal(30,10), DECIMAL_COLUMN2 
> decimal(36,10),Double_COLUMN1 double, Double_COLUMN2 double,INTEGER_COLUMN1 
> int) STORED BY 'org.apache.carbondata.format' TBLPROPERTIES 
> ("TABLE_BLOCKSIZE"= "256 MB")
> 4) Load data into this table:
> LOAD DATA INPATH 'hdfs://localhost:54310/uniqdata/2000_UniqData.csv' into 
> table uniqdata_1 OPTIONS('DELIMITER'=',' , 
> 'QUOTECHAR'='"','BAD_RECORDS_ACTION'='FORCE','FILEHEADER'='CUST_ID,CUST_NAME,ACTIVE_EMUI_VERSION,DOB,DOJ,BIGINT_COLUMN1,BIGINT_COLUMN2,DECIMAL_COLUMN1,DECIMAL_COLUMN2,Double_COLUMN1,Double_COLUMN2,INTEGER_COLUMN1')
> 5)Execute Queries:
> a) show partitions query:
> show partitions uniqdata_part
> Output:
> 0, dob = DEFAULT                                     
>  1, dob < 1971-01-01 01:00:03                         
>  2, 1971-01-01 01:00:03 <= dob < 1972-01-01 01:00:03  
>  3, 1972-01-01 01:00:03 <= dob < 1974-01-01 01:00:03
> b) Query for row count in the partitioned table:
> select count(*) from uniqdata_part
> Output:
>  count(1)  |
> +-----------+--+
> | 2013 
> c)query for row count in partition 0: 
> select count(*) from uniqdata_part where dob >= '1974-01-01 01:00:03'
> Output:
>  count(1)  |
> +-----------+--+
> | 539 
> d) query for row count in partition 1 :
> select count(*) from uniqdata_part where dob < '1971-01-01 01:00:03'
> Output:
>  count(1)  |
> +-----------+--+
> | 366   
> e) query for row count in partition 3:
> select count(*) from uniqdata_part where dob >= '1971-01-01 01:00:03' and dob 
> < '1972-01-01 01:00:03'
> Output:
>  count(1)  |
> +-----------+--+
> | 365 
> f) query for row count in partition 4:
> select count(*) from uniqdata_part where dob >= '1972-01-01 01:00:03' and dob 
> < '1974-01-01 01:00:03'
> Output:
> count(1)  |
> +-----------+--+
> | 731
> g) Insert data in partitioned table through the normal table:
> insert into uniqdata_part select * from uniqdata_1;
> h) Query for row count in the partitioned table after insertion operation:
> select count(*) from uniqdata_part
> Output:
>  count(1)  |
> +-----------+--+
> | 4026  
> i) Repeat queries from (c) to (f) for row count in the each partition.
> 6) Actual Result: it shows the same row count for each partition but the 
> partitioned table has more rows in it.
> 7)Expected Result: No of rows in each partition should increase as no of rows 
> increases in partitioned table



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

[jira] [Assigned] (CARBONDATA-1387) Incorrect partition creation while inserting data from another table

Reply via email to