I think it will not be affected. We ignore offsets stored anywhere outside
Spark Streaming. It was the fact that progress information was being stored
in two different places (SS and Kafka/ZK) that was causing inconsistencies
and duplicates.
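
For reference, here is a rough sketch of how the direct stream plus
checkpoint recovery fit together (the broker addresses, topic name, and
checkpoint path are placeholders, not recommended settings):

import kafka.serializer.StringDecoder
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

object DirectStreamSketch {
  // Placeholder values -- substitute your own cluster settings.
  val checkpointDir = "hdfs:///tmp/checkpoints"

  def createContext(): StreamingContext = {
    val conf = new SparkConf().setAppName("direct-stream-sketch")
    val ssc = new StreamingContext(conf, Seconds(10))
    ssc.checkpoint(checkpointDir)

    // Direct stream: offsets are tracked by Spark Streaming itself,
    // not by Zookeeper, so progress lives in exactly one place.
    val kafkaParams = Map("metadata.broker.list" -> "broker1:9092,broker2:9092")
    val stream = KafkaUtils.createDirectStream[
      String, String, StringDecoder, StringDecoder](
      ssc, kafkaParams, Set("my-topic"))

    stream.map(_._2).print()  // payload only; replace with real processing
    ssc
  }

  def main(args: Array[String]): Unit = {
    // On restart, recover the context (including the tracked offsets)
    // from the checkpoint instead of building a fresh one.
    val ssc = StreamingContext.getOrCreate(checkpointDir, createContext _)
    ssc.start()
    ssc.awaitTermination()
  }
}

(This assumes the spark-streaming-kafka artifact is on the classpath.)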

TD

On Mon, Feb 23, 2015 at 11:27 PM, Felix C <felixcheun...@hotmail.com> wrote:

>  Kafka 0.8.2 has built-in offset management. How would that affect the
> direct stream in Spark?
> Please see KAFKA-1012.
>
> --- Original Message ---
>
> From: "Tathagata Das" <t...@databricks.com>
> Sent: February 23, 2015 9:53 PM
> To: "V Dineshkumar" <developer.dines...@gmail.com>
> Cc: "user" <user@spark.apache.org>
> Subject: Re: Write ahead Logs and checkpoint
>
>  Exactly, that is the reason.
>
>  To avoid that, in Spark 1.3 (to be released), we have added a new Kafka
> API (called the direct stream) which does not use Zookeeper at all to keep
> track of progress, and maintains offsets within Spark Streaming. That can
> guarantee all records are received exactly once. It's experimental for
> now, but we will make it stable. Please try it out.
>
>  TD
>
> On Mon, Feb 23, 2015 at 9:41 PM, V Dineshkumar <
> developer.dines...@gmail.com> wrote:
>
> Hi,
>
>  My Spark Streaming application is pulling data from Kafka. To prevent
> data loss I have implemented the WAL and enabled checkpointing. On killing
> my application and restarting it, I am able to prevent data loss, but I am
> now getting duplicate messages.
>
>  Is it because the application got killed before it was able to checkpoint
> the current processing state?
> If yes, how do I tackle the duplicate messages?
>
>  Thanks,
> Dinesh
>
>
>
