Hi, I second Jenny here. It's not yet supported, but it is definitely a good feature.
Regards
JB

On Sep 23, 2016, at 14:03, Jihong Ma <[email protected]> wrote:
>Hi Vincent,
>
>Are you referring to writing out Spark streaming data to a Carbon file?
>We don't support it yet, but it is in our near-term plan to add the
>integration. We will start the discussion on the dev list soon and
>would love to hear your input. We will take into account the old
>DStream interface as well as Spark 2.0 Structured Streaming; we would
>like to ensure exactly-once semantics and design Carbon as an
>idempotent sink.
>
>At the moment, we are fully integrated with Spark SQL through both the
>SQL and API interfaces. With the help of multi-level indexes, we have
>seen a dramatic performance boost compared to other columnar file
>formats in the Hadoop ecosystem. You are welcome to try it out for your
>batch processing workloads; the streaming ingest will come a little
>later.
>
>Regards,
>
>Jenny
>
>-----Original Message-----
>From: vincent gromakowski [mailto:[email protected]]
>Sent: Friday, September 23, 2016 7:33 AM
>To: [email protected]
>Subject: carbondata and idempotence
>
>Hi CarbonData community,
>I am evaluating various file formats right now and found CarbonData
>interesting, especially the multiple indexes used to avoid full scans,
>but I am asking whether there is any way to achieve idempotence when
>writing to CarbonData from Spark (or an alternative)?
>A strong requirement is that a Spark worker crash must not write
>duplicated entries to Carbon...
>Tx
>
>Vincent
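For what it's worth, the idempotent-sink idea Jenny mentions can be sketched in a few lines: the sink records which batch ids it has already committed, so a retried write of the same batch (e.g. after a worker crash mid-job) has no effect. This is only an illustrative Python sketch of the technique; the class and method names are made up and are not CarbonData's or Spark's actual API.

```python
# Illustrative sketch only -- not CarbonData's real API.
# An idempotent sink deduplicates writes by batch id, so a retried
# batch (after a worker crash) does not produce duplicate entries.

class IdempotentSink:
    """Stores rows keyed by batch id; re-writing a committed batch is a no-op."""

    def __init__(self):
        self._committed = {}  # batch_id -> list of rows

    def write_batch(self, batch_id, rows):
        # Exactly-once effect: if this batch id was already committed
        # (e.g. the first attempt succeeded but the ack was lost and
        # the driver retried), skip the write entirely.
        if batch_id in self._committed:
            return False  # duplicate attempt, ignored
        self._committed[batch_id] = list(rows)
        return True

    def all_rows(self):
        return [r for rows in self._committed.values() for r in rows]


sink = IdempotentSink()
sink.write_batch(1, ["a", "b"])
sink.write_batch(1, ["a", "b"])  # retry of batch 1 after a crash: no-op
sink.write_batch(2, ["c"])
print(sink.all_rows())  # -> ['a', 'b', 'c']
```

With a sink like this, at-least-once delivery from the streaming engine (retries on failure) combines into exactly-once results, because duplicate batch attempts are filtered at the sink boundary.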
