[ 
https://issues.apache.org/jira/browse/CARBONDATA-3351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

xubo245 updated CARBONDATA-3351:
--------------------------------
    Description: 

        1. Support writing the binary data type through the Carbon Java SDK [Formal]:
            1.1 The Java SDK needs to support writing data with specific data 
types, such as int, double, and byte[], instead of converting every value to a 
string array. The user reads a binary file as byte[], and the SDK writes the 
byte[] into the binary column.
            1.2 CarbonData currently compresses the binary column because the 
compressor is configured at table level.
                => TODO: support a compression configuration with no compression 
as the default, because binary data is usually already compressed (for 
example, JPEG images), so there is no need to compress the binary column 
again. Version 1.5.4 will support column-level compression; after that we can 
implement no-compression for binary. This can be discussed with the community.
            1.3 CarbonData stores binary as a dimension.
            1.4 Support configuring the page size for the binary data type, 
because binary values are usually large (for example, 200 KB each); otherwise 
a single blocklet (32000 rows) would become very large.
            TODO: 1.5 Avro and JSON conversion need to be considered.
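The byte[] write path in 1.1 and the blocklet-size concern in 1.4 can be sketched as below. This is a minimal illustration only: the input file name is hypothetical, and the SDK write call is indicated in a comment rather than code because the exact builder API varies by SDK version.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class BinaryColumnSketch {
    public static void main(String[] args) throws IOException {
        // 1.1: read the binary file as-is into byte[]; no string conversion,
        // which would corrupt arbitrary bytes under any charset decoding.
        Path image = Paths.get("sample.jpg");  // hypothetical input file
        byte[] value = Files.exists(image) ? Files.readAllBytes(image)
                                           : new byte[0];

        // The SDK would then write the byte[] directly into the binary column,
        // e.g. (API shape is an assumption; it depends on the SDK version):
        //   CarbonWriter writer = CarbonWriter.builder()...build();
        //   writer.write(new Object[]{ id, value });

        // 1.4: why the page size must be configurable -- with 32000 rows per
        // blocklet and ~200 KB per binary value, one blocklet would hold
        // several gigabytes of binary data.
        long rowsPerBlocklet = 32_000L;
        long bytesPerValue = 200L * 1024;  // ~200 KB
        long blockletBytes = rowsPerBlocklet * bytesPerValue;
        System.out.println("approx. blocklet size: "
                + blockletBytes / (1024 * 1024) + " MB");
    }
}
```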
        

        2. Support reading and managing the binary data type through the Spark 
Carbon file format (Carbon DataSource) and CarbonSession. [Formal]
            2.1 Support reading the binary data type from a non-transactional 
table: read the binary column and return it as byte[].
            2.2 Support creating a table with a binary column; the table 
properties sort_columns, dictionary, COLUMN_META_CACHE, and RANGE_COLUMN are 
not supported for the binary column.
                => Evaluate COLUMN_META_CACHE for binary.
                => carbon.column.compressor applies to all columns.
            2.3 Support CTAS for binary => transactional/non-transactional.
            2.4 Support external tables for binary.
            2.5 Support projection of the binary column.
            2.6 Support SHOW TABLES, DESC, and ALTER TABLE for the binary data type.
            2.7 Do not support PARTITION, filter, or BUCKETCOLUMNS for binary.
            2.8 Support compaction for binary.
            2.9 DataMaps: do not support the bloomfilter, lucene, or timeseries 
datamaps; a min/max datamap is not needed for binary; support mv and 
pre-aggregate in the future.
            2.10 The CSDK / Python SDK will support binary in the future.
            2.11 Support S3.
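As one concrete reading of 2.2 and 2.7, the DDL could look roughly like the sketch below. The table and column names are hypothetical, and the exact keyword and property spellings are assumptions until the feature lands; only carbon.column.compressor is taken from the notes above.

```sql
-- Hypothetical DDL once the binary type is supported (names are examples).
-- Per 2.2, sort_columns / dictionary / COLUMN_META_CACHE / RANGE_COLUMN
-- must not reference the binary column; per 2.7, it can be neither a
-- partition nor a bucket column, and filtering on it is unsupported.
CREATE TABLE binary_demo (
  id INT,
  image BINARY
)
STORED AS carbondata
TBLPROPERTIES ('carbon.column.compressor'='snappy');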

> Support Binary Data Type
> ------------------------
>
>                 Key: CARBONDATA-3351
>                 URL: https://issues.apache.org/jira/browse/CARBONDATA-3351
>             Project: CarbonData
>          Issue Type: Sub-task
>            Reporter: xubo245
>            Assignee: xubo245
>            Priority: Major
>



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
