In general, see the material linked from
https://github.com/koeninger/kafka-exactly-once  if you want a better
understanding of the direct stream.

For spark-streaming-kafka-0-8, the direct stream doesn't really care
about the consumer group, since it uses the simple consumer.  The 0.10
version uses the new Kafka consumer, so the consumer group does
matter.  In either case, splitting events across old and new versions
of the job is not what I would want.
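
To make the group.id part concrete, here's a minimal sketch of a 0.10
direct stream with an explicit group (the broker address, topic, and
group names are made up for the example):

  import org.apache.kafka.common.serialization.StringDeserializer
  import org.apache.spark.streaming.kafka010.KafkaUtils
  import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
  import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

  val kafkaParams = Map[String, Object](
    "bootstrap.servers" -> "broker1:9092",  // made-up broker address
    "key.deserializer" -> classOf[StringDeserializer],
    "value.deserializer" -> classOf[StringDeserializer],
    "group.id" -> "myapp-v2",               // distinct group for the new deployment
    "auto.offset.reset" -> "latest",
    "enable.auto.commit" -> (false: java.lang.Boolean)
  )

  // ssc is your StreamingContext
  val stream = KafkaUtils.createDirectStream[String, String](
    ssc, PreferConsistent, Subscribe[String, String](Seq("mytopic"), kafkaParams)
  )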

I'd suggest making sure that your outputs are idempotent or
transactional, and that the new app has a different consumer group
(for the versions where it matters). Start up the new app, make sure
it is running (even if it errors out due to transactional safeguards),
then shut down the old app.
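
As a rough sketch of the output side, continuing from the stream
above: saveWithUpsert is just a placeholder for whatever upsert/merge
operation your store supports, not a real API.

  import org.apache.spark.streaming.kafka010.HasOffsetRanges

  // Placeholder: an idempotent write keyed so that replays overwrite
  // rather than duplicate, e.g. INSERT ... ON CONFLICT DO UPDATE.
  def saveWithUpsert(key: String, value: String): Unit = ???

  stream.foreachRDD { rdd =>
    // For transactional outputs, grab the offset ranges on the driver
    // and commit them atomically together with the results.
    val offsetRanges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges

    rdd.foreachPartition { records =>
      records.foreach(r => saveWithUpsert(r.key, r.value))
    }
  }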


On Tue, Sep 6, 2016 at 3:51 PM, Mariano Semelman
<mariano.semel...@despegar.com> wrote:
> Hello everybody,
>
> I am trying to understand how Kafka Direct Stream works. I'm interested in
> having a production-ready Spark Streaming application that consumes a Kafka
> topic, but I need to guarantee there's (almost) no downtime, especially
> during deploys (and submits) of new versions. What seems to be the best
> solution is to deploy and submit the new version without shutting down the
> previous one, wait for the new application to start consuming events, and
> then shut down the previous one.
>
> What I would expect is that the events get distributed between the two
> applications in a balanced fashion via the consumer group id,
> split by the partition key that I've previously set on my Kafka producer.
> However, I don't see that the Kafka direct stream supports this functionality.
>
> I've achieved this with the receiver-based approach (by the way, I've used
> "kafka" for the "offsets.storage" Kafka property [2]). However, this approach
> comes with the technical difficulties described in the documentation [1]
> (i.e., around exactly-once semantics).
>
> Anyway, not even this approach seems very failsafe. Does anyone know a way
> to safely deploy new versions of a streaming application of this kind
> without downtime?
>
> Thanks in advance
>
> Mariano
>
>
>
> [1] http://spark.apache.org/docs/latest/streaming-kafka-integration.html
> [2] http://kafka.apache.org/documentation.html#oldconsumerconfigs
>
