[ 
https://issues.apache.org/jira/browse/CARBONDATA-3351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

xubo245 updated CARBONDATA-3351:
--------------------------------
    Description: 
1. Supporting write of the binary data type by the Carbon Java SDK:
1.1 The Java SDK needs to support writing data with specific data types, such as int, double, and byte[], with no need to convert every data type to a string array. The user reads a binary file as byte[], then the SDK writes the byte[] into a binary column.

1.2 CarbonData currently compresses the binary column because the compressor is table level.
=> TODO: support a configuration option for compression, with no compression as the default, because binary data is usually already compressed (like JPG-format images), so there is no need to compress and later decompress the binary column. Version 1.5.4 will support column-level compression; after that, we can implement no compression for binary. We can discuss this with the community.
1.3 CarbonData stores binary as a dimension.
1.4 Support configuring the page size for the binary data type, because binary values are usually large, such as 200 KB; otherwise one blocklet (32,000 rows) would become very big. => PR2814
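The write path in 1.1 can be sketched with plain Java I/O: the application reads a binary file fully into a byte[] and hands that array to the writer as one row field. The `writer.write(...)` call in the comment is a hypothetical SDK invocation for illustration, not a confirmed API.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class BinaryReadDemo {
    // Read an arbitrary binary file (e.g. a JPG image) fully into memory as byte[].
    static byte[] readBinary(Path file) throws IOException {
        return Files.readAllBytes(file);
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for a real image: the four JPEG/JFIF magic bytes.
        Path tmp = Files.createTempFile("image", ".jpg");
        Files.write(tmp, new byte[] {(byte) 0xFF, (byte) 0xD8, (byte) 0xFF, (byte) 0xE0});

        byte[] data = readBinary(tmp);
        // The byte[] would then be passed to the SDK as one field of a row,
        // e.g. writer.write(new Object[] {"id-1", data}); (hypothetical call)
        System.out.println(data.length); // 4
        Files.delete(tmp);
    }
}
```

The point of 1.1 is exactly this: the byte[] goes into the binary column as-is, with no string conversion in between.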

2. Supporting read and management of the binary data type by the Spark Carbon file format (Carbon DataSource) and CarbonSession.
2.1 Support reading the binary data type from a non-transactional table: read the binary column and return it as byte[].
2.2 Support creating a table with a binary column; the table properties sort_columns, dictionary, and RANGE_COLUMN are not supported for a binary column.
=> Evaluate COLUMN_META_CACHE for binary.
=> The Carbon DataSource doesn't support DICTIONARY_INCLUDE columns.
=> carbon.column.compressor applies to all columns.
2.3 Support CTAS for binary => transactional/non-transactional.
2.4 Support external tables for binary.
2.5 Support projection for the binary column.
2.6 Support DESC.
=> The Carbon DataSource doesn't support ALTER TABLE ADD COLUMN via SQL.
2.7 Don't support PARTITION, filter, or BUCKETCOLUMNS for binary.
2.8 Support S3.
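As a toy illustration of the restrictions in 2.2 and 2.7, the check below rejects a table property when it targets a binary column. The property names mirror the ones listed above; the helper itself is hypothetical and not part of CarbonData.

```java
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class BinaryTablePropertyCheck {
    // Table properties that, per 2.2 and 2.7, must not reference a binary column.
    static final Set<String> UNSUPPORTED_FOR_BINARY = Set.of(
            "sort_columns", "dictionary_include", "range_column", "bucketcolumns");

    // Hypothetical validation helper: is this property allowed on this column?
    static boolean allowed(String property, String column, List<String> binaryColumns) {
        return !(UNSUPPORTED_FOR_BINARY.contains(property.toLowerCase(Locale.ROOT))
                && binaryColumns.contains(column));
    }

    public static void main(String[] args) {
        List<String> binaryCols = List.of("image");
        System.out.println(allowed("SORT_COLUMNS", "image", binaryCols)); // false
        System.out.println(allowed("SORT_COLUMNS", "id", binaryCols));    // true
    }
}
```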

3. Supporting read of the binary data type by the Carbon SDK.
3.1 Support reading the binary data type from a non-transactional table: read the binary column and return it as byte[].
3.2 Support projection for the binary column.
3.3 Support S3.
3.4 No need to support filters.

4. Supporting write of binary by Spark (Carbon file format / CarbonSession, POC?).
4.1 Convert binary to a string and store it in CSV.
4.2 Spark loads the CSV, converts the string back to byte[], and stores it in CarbonData; reading the binary column returns byte[].
4.3 Support INSERT INTO (string => binary). TODO: UPDATE and DELETE for binary.
4.4 Don't support streaming tables.
=> Refer to Hive and the Spark 2.4 image DataSource.
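Steps 4.1 and 4.2 describe staging binary data through CSV as a string and decoding it back to byte[] at load time. A minimal round-trip sketch, assuming Base64 as the string encoding (the actual encoding the loader would use is an open implementation choice):

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class CsvBinaryStaging {
    // 4.1: encode binary to a CSV-safe string field.
    static String toCsvField(byte[] binary) {
        return Base64.getEncoder().encodeToString(binary);
    }

    // 4.2: decode the CSV string field back to byte[] before writing the binary column.
    static byte[] fromCsvField(String field) {
        return Base64.getDecoder().decode(field);
    }

    public static void main(String[] args) {
        byte[] original = "raw image bytes".getBytes(StandardCharsets.UTF_8);
        String staged = toCsvField(original);   // safe to embed in a CSV cell
        byte[] restored = fromCsvField(staged); // byte-identical to the input
        System.out.println(new String(restored, StandardCharsets.UTF_8)); // raw image bytes
    }
}
```

Base64 avoids CSV delimiter and line-break collisions at the cost of roughly a 33% size increase, which matters for large values like images.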




> Support Binary Data Type
> ------------------------
>
>                 Key: CARBONDATA-3351
>                 URL: https://issues.apache.org/jira/browse/CARBONDATA-3351
>             Project: CarbonData
>          Issue Type: Sub-task
>            Reporter: xubo245
>            Assignee: xubo245
>            Priority: Major
>          Time Spent: 16h 50m
>  Remaining Estimate: 0h



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
