[
https://issues.apache.org/jira/browse/CARBONDATA-2951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16694104#comment-16694104
]
xubo245 commented on CARBONDATA-2951:
-------------------------------------
https://r2---sn-npoeenek.googlevideo.com/videoplayback?lmt=1521057331061550&dur=1812.178&source=youtube&pl=25&requiressl=yes&ip=142.93.137.161&ei=VLv0W5_UNcqagAeN3q7oBA&signature=254556679094EFD17DAC3DAD278E66478407BC49.4E38F21849411EDBDCDBF5990ED176A375B2F06A&key=cms1&id=o-AC5nDRva0FaTCRRdBU5bhUeOEws4bx8zmbynLQo0P895&itag=22&mime=video%2Fmp4&expire=1542786997&sparams=dur,ei,expire,id,ip,ipbits,itag,lmt,mime,mip,mm,mn,ms,mv,pl,ratebypass,requiressl,source&fvip=2&ratebypass=yes&c=WEB&ipbits=0&video_id=lhsAg2H_GXc&title=Apache+Carbondata-+An+Indexed+Columnar+File+Format+for+Interactive+Query+by+Jacky+Li-Jihong+Ma&redirect_counter=1&cm2rm=sn-5hnel77l&fexp=23763603&req_id=7ca3e5fd8923a3ee&cms_redirect=yes&mip=116.66.184.191&mm=34&mn=sn-npoeenek&ms=ltu&mt=1542765298&mv=m
> CSDK: Provide C++ interface for SDK
> -----------------------------------
>
> Key: CARBONDATA-2951
> URL: https://issues.apache.org/jira/browse/CARBONDATA-2951
> Project: CarbonData
> Issue Type: Task
> Components: other
> Affects Versions: 1.5.0
> Reporter: xubo245
> Assignee: xubo245
> Priority: Critical
> Fix For: NONE
>
>
> Users who write C++ code in their projects cannot currently call the
> CarbonData interface or integrate CarbonData into their C++ projects. We
> therefore plan to provide a C++ interface that lets C++ users integrate
> CarbonData, including reading and writing CarbonData. This will be more
> convenient for them.
> We plan to design and develop it as follows:
> 1. Provide CarbonReader for the SDK, so that carbon data can be read from C++
> ##features/interfaces
> 1.1. create CarbonReader
> 1.2. hasNext()
> 1.3. readNextRow()
> 1.4. close()
> 1.5. support OBS (AK/SK/Endpoint)
> 1.6. support batch read (withBatch, readNextBatchRow)
> 1.7. support vector read (default) and carbonrecordreader
> (withRowRecordReader)
> 1.8. projection
>
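The reader calls listed above (hasNext, readNextRow, readNextBatchRow, close) suggest a simple pull-style loop. The following is only a minimal sketch of that loop, assuming a hypothetical in-memory class named SketchReader and a helper countRows; neither is part of the CSDK, and the real reader works against carbon files rather than a vector of rows.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical in-memory stand-in; NOT the real CSDK CarbonReader.
using Row = std::vector<std::string>;

class SketchReader {
public:
    explicit SketchReader(std::vector<Row> rows) : rows_(std::move(rows)) {}

    // True while the reader is open and unread rows remain.
    bool hasNext() const { return open_ && pos_ < rows_.size(); }

    // Returns the next row; the caller is expected to check hasNext() first.
    const Row& readNextRow() {
        assert(hasNext());
        return rows_[pos_++];
    }

    // Returns up to batchSize rows, mirroring the batch-read idea in 1.6.
    std::vector<Row> readNextBatchRow(std::size_t batchSize) {
        std::vector<Row> batch;
        while (batch.size() < batchSize && hasNext()) {
            batch.push_back(readNextRow());
        }
        return batch;
    }

    // After close(), hasNext() always reports false.
    void close() { open_ = false; }

private:
    std::vector<Row> rows_;
    std::size_t pos_ = 0;
    bool open_ = true;
};

// Drains a reader with the hasNext()/readNextRow()/close() loop
// and returns how many rows were read.
std::size_t countRows(SketchReader& reader) {
    std::size_t n = 0;
    while (reader.hasNext()) {
        reader.readNextRow();
        ++n;
    }
    reader.close();
    return n;
}
```

Row-by-row and batch reads share the same cursor in this sketch, so a caller can mix the two styles without re-reading rows.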
> ##support data types:
> String, Long, Varchar(string), Short, Int, Date(int), timestamp(long),
> boolean, Decimal(string), Float
> Array<String>: supported in carbonrecordreader, not supported in vectorreader
> byte: supported in Java RowUtil, not in the C++ carbon reader
>
> ## Schema and data
> Create table tbl_email_form_to_for_XX(
> Event_Time Timestamp,
> Ingestion_Time Timestamp,
> From_Email String,
> To_Email String,
> From_To_type String,
> Event_ID String
> ) using carbon options(path 'obs://X/tbl_email_form_to_for_XX')
> ETL 6 columns from an 18-column table
>
> example data:
> [email protected]
> [email protected] from_to
> <29528303.1075855666657.JavaMail.evans@thyme> 1538015497000000
> 9755149200000
> 2. The performance should reach X million records/s/node
> 3. Provide CarbonWriter for the SDK, so that carbon data can be written from C++
> ##features/interfaces
> 3.1. create CarbonWriter, including creating the schema (withCsvInput),
> setting outputPath, and build
> 3.2. write()
> 3.3. close()
> 3.4. support OBS (AK/SK/Endpoint) (withHadoopConf)
> 3.5. writtenBy
> 3.6. support withTableProperty, withLoadOption, taskNo,
> uniqueIdentifier, withThreadSafe, withBlockSize, withBlockletSize,
> localDictionaryThreshold, enableLocalDictionary in C++ SDK (PR2899, to be
> reviewed)
>
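The writer steps above (set outputPath, define the schema with withCsvInput, set writtenBy, build, then write and close) follow a builder pattern. The following is only a rough sketch of that flow, assuming hypothetical in-memory classes SketchWriterBuilder and SketchWriter; they are not the CSDK classes, the schema is reduced to a list of column names, and the validation rules are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical in-memory stand-ins; NOT the real CSDK CarbonWriter API.
class SketchWriter {
public:
    SketchWriter(std::string path, std::vector<std::string> columns,
                 std::string appName)
        : path_(std::move(path)),
          columns_(std::move(columns)),
          appName_(std::move(appName)) {}

    // Buffers one row; rejects writes after close() and rows whose
    // width differs from the schema (both rules are assumptions).
    void write(const std::vector<std::string>& row) {
        if (closed_) throw std::logic_error("writer is closed");
        if (row.size() != columns_.size()) {
            throw std::invalid_argument("row width != schema width");
        }
        rows_.push_back(row);
    }

    void close() { closed_ = true; }
    std::size_t rowCount() const { return rows_.size(); }
    const std::string& path() const { return path_; }

private:
    std::string path_;
    std::vector<std::string> columns_;
    std::string appName_;
    std::vector<std::vector<std::string>> rows_;
    bool closed_ = false;
};

class SketchWriterBuilder {
public:
    SketchWriterBuilder& outputPath(std::string p) {
        path_ = std::move(p);
        return *this;
    }
    // The schema is simplified here to a list of column names.
    SketchWriterBuilder& withCsvInput(std::vector<std::string> columns) {
        columns_ = std::move(columns);
        return *this;
    }
    SketchWriterBuilder& writtenBy(std::string appName) {
        appName_ = std::move(appName);
        return *this;
    }
    // Refuses to build without an output path and a schema (assumed rule).
    SketchWriter build() const {
        if (path_.empty() || columns_.empty()) {
            throw std::invalid_argument("outputPath and schema are required");
        }
        return SketchWriter(path_, columns_, appName_);
    }

private:
    std::string path_;
    std::vector<std::string> columns_;
    std::string appName_;
};
```

Returning the builder by reference from each setter is what allows the calls to be chained in one expression before build().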
> ##Data types:
> Carbon needs to support the basic data types, including string, float,
> double, int, long, date, timestamp, bool, array<String>.
> For other types, we can convert:
> char array => carbon string
> Enum => Carbon string
> set and list => carbon array<String>
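The three conversions above can be done on the C++ side with the standard library before the values are handed to a writer. The following is a small sketch under that assumption; the helper names fromCharArray, fromEnum and toStringArray and the FromToType enum are hypothetical and chosen only for illustration.

```cpp
#include <cassert>
#include <list>
#include <set>
#include <string>
#include <vector>

// Hypothetical helpers; not part of the CarbonData C++ SDK.
enum class FromToType { From, To };

// char array => string value (copies up to the first NUL byte).
std::string fromCharArray(const char* chars) { return std::string(chars); }

// Enum => string value, using an explicit name per enumerator.
std::string fromEnum(FromToType t) {
    switch (t) {
        case FromToType::From: return "from";
        case FromToType::To:   return "to";
    }
    return "unknown";
}

// set or list => array of strings, preserving the container's iteration order.
template <typename Container>
std::vector<std::string> toStringArray(const Container& values) {
    return std::vector<std::string>(values.begin(), values.end());
}
```

Note that a std::set iterates in sorted order, so the resulting array is sorted, while a std::list keeps insertion order.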
> ##performance
> Write performance is not a requirement for now
>
> 4. read schema function
> readSchema
> getVersionDetails =>TODO
> 5. support carbonproperties
> 5.1 addProperty
> 5.2 getProperty
>
> 6. TODO:
> 6.1. getVersionDetails => to be reviewed
> 6.2. updated SDK/CSDK reader doc => to be reviewed
> 6.3. support byte (write and read)
> 6.4. support long string columns
> 6.5. support sortBy => to be reviewed
> 6.6. support withCsvInput(Schema schema); create schema (Java)
> 6.7. optimize the write doc => to be reviewed
> /**
>  * Create a {@link CarbonWriterBuilder} to build a {@link CarbonWriter}
>  */
> public static CarbonWriterBuilder builder() {
>   return new CarbonWriterBuilder();
> }
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)