Hi Vincent,

Are you referring to writing Spark streaming data out to Carbon files? We don't 
support that yet, but it is in our near-term plan to add the integration. We will 
start the discussion on the dev list soon and would love to hear your input. We 
will take into account the old DStream interface as well as Spark 2.0 
structured streaming; we would like to ensure exactly-once semantics and design 
Carbon as an idempotent sink.
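To illustrate what we mean by an idempotent sink, here is a minimal sketch in plain Python (not actual Carbon or Spark code; the class and names are hypothetical): each micro-batch carries a batch id, and replaying an already-committed batch id is a no-op, so a retry after a worker crash cannot write duplicate rows.

```python
# Hypothetical sketch of dedup-by-batch-id, the core of an idempotent sink.
# Not CarbonData code; illustration only.

class IdempotentSink:
    def __init__(self):
        self.committed = set()   # batch ids already written
        self.rows = []           # stands in for the table

    def write(self, batch_id, batch_rows):
        if batch_id in self.committed:
            return               # replayed batch (e.g. after a crash): skip
        self.rows.extend(batch_rows)
        self.committed.add(batch_id)

sink = IdempotentSink()
sink.write(0, ["a", "b"])
sink.write(0, ["a", "b"])        # retried batch is written only once
sink.write(1, ["c"])
print(sink.rows)                 # -> ['a', 'b', 'c']
```

In a real integration the committed-batch bookkeeping would of course have to be persisted atomically with the data, not held in memory.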

At the moment, we are fully integrated with Spark SQL through both SQL and API 
interfaces. With the help of multi-level indexes, we have seen a dramatic 
performance boost compared to other columnar file formats in the Hadoop 
ecosystem. You are welcome to try it out for your batch-processing workloads; 
the streaming ingest will come out a little later.

Regards,

Jenny   

-----Original Message-----
From: vincent gromakowski [mailto:[email protected]] 
Sent: Friday, September 23, 2016 7:33 AM
To: [email protected]
Subject: carbondata and idempotence

Hi Carbondata community,
I am evaluating various file formats right now and found Carbondata to be
interesting, especially with the multiple indexes used to avoid full scans, but
I am wondering if there is any way to achieve idempotence when writing to
Carbondata from Spark (or an alternative)?
A strong requirement is that a Spark worker crash must not result in duplicate
entries being written to Carbon...
Tx

Vincent
