Write ahead Logs and checkpoint

2015-02-23 Thread V Dineshkumar
Hi, My Spark Streaming application is pulling data from Kafka. To prevent data loss I have implemented the WAL and enabled checkpointing. On killing my application and restarting it I am able to prevent data loss now, but I am getting duplicate messages. Is it because the application got killed
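
For context, a minimal sketch of the receiver-based setup the question describes, assuming Spark 1.2/1.3-era APIs: checkpointing is enabled via ssc.checkpoint(...) and the receiver write-ahead log via the spark.streaming.receiver.writeAheadLog.enable property. The checkpoint directory, ZooKeeper quorum, consumer group, and topic name below are placeholders, not taken from the original message.

    // Sketch only: receiver-based Kafka input with checkpointing and the receiver WAL.
    import org.apache.spark.SparkConf
    import org.apache.spark.storage.StorageLevel
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka.KafkaUtils

    object WalCheckpointExample {
      val checkpointDir = "hdfs:///tmp/spark-checkpoints"   // hypothetical path

      def createContext(): StreamingContext = {
        val conf = new SparkConf()
          .setAppName("wal-checkpoint-example")
          // Persist received blocks to the write-ahead log before acknowledging them.
          .set("spark.streaming.receiver.writeAheadLog.enable", "true")

        val ssc = new StreamingContext(conf, Seconds(10))
        ssc.checkpoint(checkpointDir)   // metadata checkpoints and WAL files go here

        // Receiver-based Kafka stream; offsets are tracked in ZooKeeper by the
        // high-level consumer, which is why records can be re-delivered after a crash.
        val lines = KafkaUtils.createStream(
          ssc,
          "zk-host:2181",                    // placeholder ZooKeeper quorum
          "example-consumer-group",          // placeholder group id
          Map("example-topic" -> 1),         // placeholder topic -> receiver threads
          StorageLevel.MEMORY_AND_DISK_SER   // replication is unnecessary with the WAL on
        ).map(_._2)

        lines.count().print()
        ssc
      }

      def main(args: Array[String]): Unit = {
        // Recover from the checkpoint if one exists, otherwise build a fresh context.
        val ssc = StreamingContext.getOrCreate(checkpointDir, createContext _)
        ssc.start()
        ssc.awaitTermination()
      }
    }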

Re: Write ahead Logs and checkpoint

2015-02-23 Thread Tathagata Das
Exactly, that is the reason. To avoid that, in the to-be-released Spark 1.3 we have added a new Kafka API (called the direct stream) which does not use Zookeeper at all to keep track of progress, and maintains offsets within Spark Streaming. That can guarantee all records are received exactly once. Its
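
For reference, a minimal sketch of the direct Kafka API TD mentions (KafkaUtils.createDirectStream, introduced in Spark 1.3). The broker list, topic, and checkpoint path are placeholders. With this API there are no receivers and no WAL; the driver computes offset ranges per batch and recovers them from the checkpoint, so records are not re-received after a restart. End-to-end exactly-once output still requires idempotent or transactional writes.

    // Sketch only: direct (receiver-less) Kafka stream, Spark 1.3+.
    import kafka.serializer.StringDecoder
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka.KafkaUtils

    object DirectStreamExample {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setAppName("direct-stream-example")
        val ssc = new StreamingContext(conf, Seconds(10))
        ssc.checkpoint("hdfs:///tmp/spark-checkpoints-direct")  // hypothetical path

        val kafkaParams = Map("metadata.broker.list" -> "broker-host:9092")  // placeholder brokers
        val topics = Set("example-topic")                                    // placeholder topic

        // Each RDD partition maps 1:1 to a Kafka topic-partition and offset range,
        // tracked by Spark Streaming itself rather than by ZooKeeper.
        val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
          ssc, kafkaParams, topics)

        stream.map(_._2).count().print()

        ssc.start()
        ssc.awaitTermination()
      }
    }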

Re: Write ahead Logs and checkpoint

2015-02-23 Thread Felix C

Re: Write ahead Logs and checkpoint

2015-02-23 Thread Tathagata Das