[ https://issues.apache.org/jira/browse/CARBONDATA-2951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

xubo245 updated CARBONDATA-2951:
--------------------------------
    Description: 
Users whose projects are written in C++ currently cannot call the CarbonData 
interface or integrate CarbonData into those projects. We therefore plan to 
provide a C++ interface that lets C++ users integrate CarbonData, including 
reading and writing CarbonData. This will be more convenient for them.

We plan to design and develop it as follows:

1. Provide CarbonReader for the SDK so that carbon data can be read from C++
        ##features/interfaces
        1.1.    create CarbonReader
        1.2.    hasNext()
        1.3.    readNextRow()
        1.4.    close()
        1.5.    support OBS (AK/SK/Endpoint)
        1.6.    support batch read (withBatch, readNextBatchRow)
        1.7.    support vector read (default) and carbonrecordreader (withRowRecordReader)
        1.8.    projection
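
The call sequence implied by 1.1 to 1.6 can be sketched as below. This is only an illustration: MockCarbonReader is a hypothetical in-memory stand-in written for this note, not the CSDK class, and its constructor arguments (a row vector and a batch size) are assumptions. Only the method names hasNext, readNextRow, readNextBatchRow and close come from the list above.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical row type: every column is carried as a string here.
using Row = std::vector<std::string>;

// Hypothetical in-memory stand-in for the planned reader; NOT the real CSDK class.
class MockCarbonReader {
public:
    MockCarbonReader(std::vector<Row> rows, std::size_t batchSize)
        : rows_(std::move(rows)), batchSize_(batchSize == 0 ? 1 : batchSize) {}

    // True while the reader is open and unread rows remain.
    bool hasNext() const { return open_ && pos_ < rows_.size(); }

    // Returns the next row; callers are expected to check hasNext() first.
    Row readNextRow() { return rows_.at(pos_++); }

    // Returns up to batchSize rows in one call (fewer at the end of the data).
    std::vector<Row> readNextBatchRow() {
        std::vector<Row> batch;
        while (hasNext() && batch.size() < batchSize_) {
            batch.push_back(readNextRow());
        }
        return batch;
    }

    // After close(), hasNext() always reports false.
    void close() { open_ = false; }

private:
    std::vector<Row> rows_;
    std::size_t batchSize_;
    std::size_t pos_ = 0;
    bool open_ = true;
};

// Drains the reader row by row, closes it, and returns the number of rows seen.
std::size_t countRows(MockCarbonReader& reader) {
    std::size_t n = 0;
    while (reader.hasNext()) {
        reader.readNextRow();
        ++n;
    }
    reader.close();
    return n;
}
```

The point of the sketch is the loop shape: check hasNext(), read a row or a batch, and always close() at the end.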
        
        ##supported data types:
         String, Long, Varchar(string), Short, Int, Date(int), timestamp(long), boolean, Decimal(string), Float
         Array<String>: supported in carbonrecordreader, not supported in vectorreader
         byte: supported in Java RowUtil, not in the C++ carbon reader
         
        ## Schema and data
         Create table tbl_email_form_to_for_XX(
                Event_Time Timestamp,
                Ingestion_Time Timestamp,
                From_Email String,
                To_Email String,
                From_To_type String,
                Event_ID String
                ) using carbon options(path 'obs://X/tbl_email_form_to_for_XX')
                ETL 6 columns from an 18-column table

                example data (one row):
                from_email_36550_phillip.al...@enron.com        to_email_36550_stagecoachm...@hotmail.com       from_to <29528303.1075855666657.JavaMail.evans@thyme>   1538015497000000        9755149200000
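
As an illustration of how one such row could be held on the C++ side, here is a minimal sketch. The struct name EmailRecord, the helper names, the tab separator, and the field order (taken from the example row, which differs from the column order in the Create table statement) are all assumptions made for this note. The two timestamp columns are kept as 64-bit integers, following the timestamp(long) mapping listed above.

```cpp
#include <cstdint>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical C++ view of one row of the example table.
struct EmailRecord {
    std::string fromEmail;
    std::string toEmail;
    std::string fromToType;
    std::string eventId;
    std::int64_t eventTime;      // timestamp surfaced as a long
    std::int64_t ingestionTime;  // timestamp surfaced as a long
};

// Splits a line on tab characters (assumed separator).
std::vector<std::string> splitTabs(const std::string& line) {
    std::vector<std::string> fields;
    std::stringstream ss(line);
    std::string field;
    while (std::getline(ss, field, '\t')) {
        fields.push_back(field);
    }
    return fields;
}

// Parses one line in the assumed order of the example row:
// from_email, to_email, from_to_type, event_id, event_time, ingestion_time.
EmailRecord parseEmailRecord(const std::string& line) {
    const std::vector<std::string> f = splitTabs(line);
    if (f.size() != 6) {
        throw std::invalid_argument("expected 6 tab-separated fields");
    }
    return EmailRecord{f[0], f[1], f[2], f[3],
                       static_cast<std::int64_t>(std::stoll(f[4])),
                       static_cast<std::int64_t>(std::stoll(f[5]))};
}
```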

2. Performance should reach X million records/s/node
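
One way the records/s figure could be measured on a single node is sketched below. The function names are hypothetical and the target value X is left open, as in the item above. The callable passed in is assumed to run the read loop and return the number of records it processed.

```cpp
#include <chrono>
#include <cstddef>

// Records per second for a given record count and elapsed wall time.
// Returns 0 when no time has elapsed, to avoid dividing by zero.
double recordsPerSecond(std::size_t records, std::chrono::duration<double> elapsed) {
    if (elapsed.count() <= 0.0) {
        return 0.0;
    }
    return static_cast<double>(records) / elapsed.count();
}

// Times any callable that returns the number of records it processed
// and reports the resulting throughput in records per second.
template <typename ReadAll>
double measureThroughput(ReadAll&& readAll) {
    const auto start = std::chrono::steady_clock::now();
    const std::size_t records = readAll();
    const auto end = std::chrono::steady_clock::now();
    return recordsPerSecond(records, end - start);
}
```

steady_clock is used rather than system_clock so that wall-clock adjustments during the run do not distort the result.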

3. Provide CarbonWriter for the SDK so that carbon data can be written from C++
        ##features/interfaces
        3.1.    create CarbonWriter, including creating the schema (withCsvInput), setting outputPath, and build
        3.2.    write()
        3.3.    close()
        3.4.    support OBS (AK/SK/Endpoint) (withHadoopConf)
        3.5.    writtenBy
        3.6.    support withTableProperty, withLoadOption, taskNo, uniqueIdentifier, withThreadSafe, withBlockSize, withBlockletSize, localDictionaryThreshold, enableLocalDictionary in the C++ SDK (PR2899, to be reviewed)
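
The builder-style flow in 3.1 to 3.5 can be sketched as below. MockCarbonWriterBuilder and MockCarbonWriter are hypothetical in-memory stand-ins written for this note, not the CSDK classes. The argument types (column names as strings, rows as string vectors) and the validation rules are assumptions. Only the method names outputPath, withCsvInput, writtenBy, build, write and close come from the list above.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical writer that keeps rows in memory instead of writing carbon files.
class MockCarbonWriter {
public:
    MockCarbonWriter(std::string path, std::string appName, std::size_t columns)
        : path_(std::move(path)), appName_(std::move(appName)), columns_(columns) {}

    // Accepts one row; rejects rows after close() or with the wrong column count.
    void write(const std::vector<std::string>& row) {
        if (closed_) throw std::logic_error("writer is closed");
        if (row.size() != columns_) throw std::invalid_argument("column count mismatch");
        rows_.push_back(row);
    }

    void close() { closed_ = true; }

    std::size_t rowCount() const { return rows_.size(); }
    const std::string& path() const { return path_; }
    const std::string& writtenBy() const { return appName_; }

private:
    std::string path_;
    std::string appName_;
    std::size_t columns_;
    std::vector<std::vector<std::string>> rows_;
    bool closed_ = false;
};

// Hypothetical builder: collects the settings, then validates them in build().
class MockCarbonWriterBuilder {
public:
    MockCarbonWriterBuilder& outputPath(std::string path) {
        path_ = std::move(path);
        return *this;
    }
    MockCarbonWriterBuilder& withCsvInput(std::vector<std::string> columnNames) {
        columns_ = std::move(columnNames);
        return *this;
    }
    MockCarbonWriterBuilder& writtenBy(std::string appName) {
        appName_ = std::move(appName);
        return *this;
    }
    MockCarbonWriter build() const {
        if (path_.empty()) throw std::invalid_argument("outputPath is required");
        if (columns_.empty()) throw std::invalid_argument("schema is required");
        return MockCarbonWriter(path_, appName_, columns_.size());
    }

private:
    std::string path_;
    std::string appName_;
    std::vector<std::string> columns_;
};
```

Returning the builder by reference from each setter allows the chained style, for example `MockCarbonWriterBuilder().outputPath(p).withCsvInput(cols).writtenBy(app).build()`.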
        
        ##Data types:
           Carbon needs to support the base data types: string, float, double, int, long, date, timestamp, bool, array<String>.
           Other types can be converted:
              char array => carbon string
              enum => carbon string
              set and list => carbon array<String>
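
The three conversions listed above can be sketched on the C++ side as follows. The helper names and the Direction enum are hypothetical and exist only for this illustration. A carbon string is represented as std::string and a carbon array<String> as std::vector<std::string>.

```cpp
#include <list>
#include <set>
#include <string>
#include <vector>

// Hypothetical enum used only to illustrate "enum => carbon string".
enum class Direction { From, To };

// char array => carbon string (a null pointer becomes an empty string).
std::string toCarbonString(const char* chars) {
    return chars ? std::string(chars) : std::string();
}

// enum => carbon string, using the enumerator name as the value.
std::string toCarbonString(Direction d) {
    switch (d) {
        case Direction::From: return "From";
        case Direction::To:   return "To";
    }
    return "Unknown";
}

// set or list => carbon array<String>; works for any container of strings.
// A std::set yields its elements in sorted order, a std::list in insertion order.
template <typename Container>
std::vector<std::string> toCarbonArray(const Container& values) {
    return std::vector<std::string>(values.begin(), values.end());
}
```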

        ##performance
        Write performance is not a requirement for now
        
4. Read schema functions
        4.1 readSchema
        4.2 getVersionDetails  => TODO

5. support carbonproperties
        5.1 addProperty
        5.2 getProperty
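
The addProperty/getProperty pair can be sketched as a simple key-value store. MockCarbonProperties is a hypothetical stand-in written for this note, and the fallback parameter on getProperty is an assumption. Only the two method names come from the list above.

```cpp
#include <map>
#include <string>

// Hypothetical in-memory property store; NOT the real carbonproperties class.
class MockCarbonProperties {
public:
    // Adds a property, overwriting any existing value for the same key.
    void addProperty(const std::string& key, const std::string& value) {
        props_[key] = value;
    }

    // Returns the stored value, or the fallback when the key is absent.
    std::string getProperty(const std::string& key,
                            const std::string& fallback = "") const {
        const auto it = props_.find(key);
        return it == props_.end() ? fallback : it->second;
    }

private:
    std::map<std::string, std::string> props_;
};
```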
        
6. TODO:
        6.1. getVersionDetails
        6.2. update the SDK/CSDK reader doc
        6.3. support byte (write and read)
        6.4. support long string columns
        6.5. support sortBy
        6.6. support withCsvInput(Schema schema); create schema (Java)
        6.7. optimize the write doc:
                        /**
                         * Create a {@link CarbonWriterBuilder} to build a {@link CarbonWriter}
                         */
                        public static CarbonWriterBuilder builder() {
                                return new CarbonWriterBuilder();
                        }


> CSDK: Provide C++ interface for SDK
> -----------------------------------
>
>                 Key: CARBONDATA-2951
>                 URL: https://issues.apache.org/jira/browse/CARBONDATA-2951
>             Project: CarbonData
>          Issue Type: Task
>          Components: other
>    Affects Versions: 1.5.0
>            Reporter: xubo245
>            Assignee: xubo245
>            Priority: Critical
>             Fix For: NONE
>



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
