For this particular case (Kafka -> Cassandra), you need not worry about partial windows. The Cassandra output operator does batch processing, i.e. all records received in a window are written at end window. So IMO, if you set exactly-once processing on the Kafka input operator and choose the transactional Cassandra output operator, you will achieve exactly-once processing. If you have other operators in your DAG, you might want to make sure they are idempotent (please check the blog shared by Sandesh for reference).
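To illustrate why writing at end window sidesteps the partial-window problem, here is a minimal, self-contained Java sketch. It is not the Malhar Cassandra operator: `FakeStore`, `WindowedWriter`, and their methods are hypothetical names, the store is an in-memory stand-in, and the sketch assumes the input side replays the same tuples in the same windows after recovery.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: hypothetical names, not the Apex/Malhar API.
class ExactlyOnceWindowSketch {

  // In-memory stand-in for the external store. It keeps the rows and the id
  // of the last window whose batch was committed.
  static class FakeStore {
    final List<String> rows = new ArrayList<>();
    long lastCommittedWindow = -1;

    // Assumption: rows and window id are committed together, atomically.
    synchronized void commit(long windowId, List<String> batch) {
      rows.addAll(batch);
      lastCommittedWindow = windowId;
    }
  }

  // Buffers tuples during a window and writes them once, at end window.
  static class WindowedWriter {
    private final FakeStore store;
    private final List<String> buffer = new ArrayList<>();
    private long currentWindow;

    WindowedWriter(FakeStore store) {
      this.store = store;
    }

    void beginWindow(long windowId) {
      currentWindow = windowId;
      buffer.clear();
    }

    void process(String tuple) {
      buffer.add(tuple);
    }

    void endWindow() {
      // On replay after a failure, skip windows that were already committed.
      if (currentWindow > store.lastCommittedWindow) {
        store.commit(currentWindow, new ArrayList<>(buffer));
      }
      buffer.clear();
    }
  }

  public static void main(String[] args) {
    FakeStore store = new FakeStore();

    WindowedWriter writer = new WindowedWriter(store);
    writer.beginWindow(1);
    writer.process("a");
    writer.process("b");
    writer.endWindow();

    // Simulated crash in the middle of window 2: "c" is buffered, never written.
    writer.beginWindow(2);
    writer.process("c");

    // Restart with a fresh writer; windows 1 and 2 are replayed.
    WindowedWriter restarted = new WindowedWriter(store);
    restarted.beginWindow(1);
    restarted.process("a");
    restarted.process("b");
    restarted.endWindow(); // skipped, window 1 is already committed
    restarted.beginWindow(2);
    restarted.process("c");
    restarted.endWindow();

    System.out.println(store.rows); // prints [a, b, c]
  }
}
```

Because nothing reaches the store until end window, a crash mid-window leaves no partial data behind, and the stored window id lets a replayed window be recognized and skipped.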
-Priyanka

On Wed, Feb 15, 2017 at 4:06 AM, Sandesh Hegde <[email protected]> wrote:

> Settings mentioned by Sanjay will only guarantee exactly once for windows,
> but not for a partial window processed by the operator; in a way, that
> setting is a misnomer.
> To achieve exactly once, there are some preconditions that need to be met
> along with the support in the output operator. Here is a blog that gives
> the idea about exactly once:
> https://www.datatorrent.com/blog/end-to-end-exactly-once-with-apache-apex/
>
> On Tue, Feb 14, 2017 at 2:11 PM Sanjay Pujare <[email protected]> wrote:
>
> > Have you taken a look at
> > http://apex.apache.org/docs/apex/application_development/#exactly-once ?
> > i.e. setting that processing mode on all the operators in the pipeline.
> >
> > Join us at Apex Big Data World-San Jose
> > <http://www.apexbigdata.com/san-jose.html>, April 4, 2017!
> >
> > http://www.apexbigdata.com/san-jose-register.html
> >
> > On 2/14/17, 12:00 PM, "Himanshu Bari" <[email protected]> wrote:
> >
> >     How to ensure that the Kafka to Cassandra ingestion pipeline in Apex
> >     will guarantee exactly once processing semantics?
> >     E.g. a message was read from Kafka but the Apex app died before it
> >     was written successfully to Cassandra.
