[
https://issues.apache.org/jira/browse/CARBONDATA-658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15834391#comment-15834391
]
sandeep purohit commented on CARBONDATA-658:
--------------------------------------------
The compression of data depends on the difference between the min and max
values of the column. In both of the CSVs above that difference is 99999, so
both select the DATA_INT datatype for compression. If you instead try
sample1.csv (SmallBigInt), it will select DATA_SHORT as the datatype for
compression. [~ravi.pesala] [~manishgupta88] Can you please verify this?
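To illustrate the point above, here is a minimal sketch of delta-based
adaptive datatype selection: the stored type is chosen from the range
(max - min) of the column, not from the declared SQL type. The function name
and the exact thresholds are illustrative assumptions, not CarbonData's
actual implementation.

```python
def select_compression_type(min_value: int, max_value: int) -> str:
    """Pick the smallest signed integer storage type that can hold
    the delta between max and min column values (illustrative sketch
    of adaptive delta compression; names/thresholds are assumptions)."""
    delta = max_value - min_value
    if delta <= 127:             # fits in a signed byte
        return "DATA_BYTE"
    elif delta <= 32767:         # fits in a signed short
        return "DATA_SHORT"
    elif delta <= 2147483647:    # fits in a signed int
        return "DATA_INT"
    else:
        return "DATA_LONG"

# Both attached CSVs span a value range of 99999, so both columns are
# stored as DATA_INT regardless of being declared Int or BigInt:
print(select_compression_type(0, 99999))   # DATA_INT
```

This explains why the large and small value files produce nearly identical
carbondata file sizes: both end up with the same storage datatype.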
> Compression is not working for BigInt and Int datatype
> ------------------------------------------------------
>
> Key: CARBONDATA-658
> URL: https://issues.apache.org/jira/browse/CARBONDATA-658
> Project: CarbonData
> Issue Type: Bug
> Components: data-load
> Affects Versions: 1.0.0-incubating
> Environment: spark 1.6, 2.0
> Reporter: Geetika Gupta
> Attachments: 100000_LargeBigInt.csv, 100000_LargeInt.csv,
> 100000_SmallBigInt.csv, 100000_SmallInt.csv, sample1.csv
>
>
> I tried to load data into a table having a BigInt column. First I loaded
> small bigint values into the table and noted the carbondata file size; then
> I loaded max bigint values into the table and noted the file size again.
> For large bigint values the carbondata file size was 684.25 KB, and for
> small bigint values it was 684.26 KB, so I could not figure out whether
> compression was performed or not.
> I tried the same scenario with the int datatype as well. For large int
> values the carbondata file size was 684.24 KB, and for small int values it
> was 684.26 KB.
> Below are the queries:
> For BigInt table:
> Create table test(a BigInt, b String) stored by 'carbondata';
> LOAD DATA INPATH 'hdfs://localhost:54311/testFiles/100000_LargeBigInt.csv'
> into table test OPTIONS('DELIMITER'=',', 'QUOTECHAR'='"', 'FILEHEADER'='b,a');
> LOAD DATA INPATH 'hdfs://localhost:54311/testFiles/100000_SmallBigInt.csv'
> into table test OPTIONS('DELIMITER'=',', 'QUOTECHAR'='"', 'FILEHEADER'='b,a');
> For Int table:
> Create table test(a Int, b String) stored by 'carbondata';
> LOAD DATA INPATH 'hdfs://localhost:54311/testFiles/100000_LargeInt.csv' into
> table test OPTIONS('DELIMITER'=',', 'QUOTECHAR'='"', 'FILEHEADER'='b,a');
> LOAD DATA INPATH 'hdfs://localhost:54311/testFiles/100000_SmallInt.csv' into
> table test OPTIONS('DELIMITER'=',', 'QUOTECHAR'='"', 'FILEHEADER'='b,a');
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)