Hi, I second Jenny here. It's not yet supported, but it is definitely a good feature.
Regards
JB

On Sep 23, 2016, at 14:03, Jihong Ma <[email protected]> wrote:
>Hi Vincent,
>
>Are you referring to writing out Spark streaming data to a Carbon file?
>We don't support it yet, but it is in our near-term plan to add the
>integration. We will start the discussion on the dev list soon and
>would love to hear your input. We will take into account the old
>DStream interface as well as Spark 2.0 Structured Streaming; we would
>like to ensure exactly-once semantics and design Carbon as an
>idempotent sink.
>
>At the moment, we are fully integrated with Spark SQL through both the
>SQL and API interfaces. With the help of multi-level indexes, we have
>seen a dramatic performance boost compared to other columnar file
>formats in the Hadoop ecosystem. You are welcome to try it out for your
>batch processing workloads; the streaming ingest will come a little
>later.
>
>Regards,
>
>Jenny
>
>-----Original Message-----
>From: vincent gromakowski [mailto:[email protected]]
>Sent: Friday, September 23, 2016 7:33 AM
>To: [email protected]
>Subject: carbondata and idempotence
>
>Hi CarbonData community,
>I am evaluating various file formats right now and found CarbonData
>interesting, especially the multiple indexes used to avoid full scans,
>but I am asking whether there is any way to achieve idempotence when
>writing to CarbonData from Spark (or an alternative)?
>A strong requirement is that a Spark worker crash must not write
>duplicated entries to Carbon...
>Tx
>
>Vincent
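For what it's worth, the idempotent-sink idea Jenny mentions can be sketched in a few lines: the sink records which batch ids it has already committed, so a retried write of the same batch (e.g. after a worker crash mid-job) has no effect. This is only an illustrative Python sketch of the technique; the class and method names are made up and are not CarbonData's or Spark's actual API.

```python
# Illustrative sketch only -- not CarbonData's real API.
# An idempotent sink deduplicates writes by batch id, so a retried
# batch (after a worker crash) does not produce duplicate entries.

class IdempotentSink:
    """Stores rows keyed by batch id; re-writing a committed batch is a no-op."""

    def __init__(self):
        self._committed = {}  # batch_id -> list of rows

    def write_batch(self, batch_id, rows):
        # Exactly-once effect: if this batch id was already committed
        # (e.g. the first attempt succeeded but the ack was lost and
        # the driver retried), skip the write entirely.
        if batch_id in self._committed:
            return False  # duplicate attempt, ignored
        self._committed[batch_id] = list(rows)
        return True

    def all_rows(self):
        return [r for rows in self._committed.values() for r in rows]


sink = IdempotentSink()
sink.write_batch(1, ["a", "b"])
sink.write_batch(1, ["a", "b"])  # retry of batch 1 after a crash: no-op
sink.write_batch(2, ["c"])
print(sink.all_rows())  # -> ['a', 'b', 'c']
```

With a sink like this, at-least-once delivery from the streaming engine (retries on failure) combines into exactly-once results, because duplicate batch attempts are filtered at the sink boundary.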
