You could use the same pattern in your flatMap function. If you want Spark
to keep retrying, though, you don't need any special logic; that is what it
does already. You could also increase the number of task retries; see the
spark.excludeOnFailure.task.* configurations.
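
For illustration, a rough Scala sketch of retrying the call inside your
flatMap lambda might look like the following; callMicroservice, the attempt
count, and the backoff are placeholders rather than anything from your job:

    import scala.util.{Try, Success, Failure}

    // Retry a by-name call up to maxAttempts times, with a simple linear
    // backoff between attempts. Returns None only if the call itself
    // legitimately produced no value; a final failure is rethrown so that
    // Spark's own task-retry machinery can take over.
    def withRetries[T](maxAttempts: Int)(call: => T): Option[T] = {
      var attempt = 0
      var result: Option[T] = None
      while (result.isEmpty && attempt < maxAttempts) {
        attempt += 1
        Try(call) match {
          case Success(v) => result = Some(v)
          case Failure(_) if attempt < maxAttempts =>
            Thread.sleep(1000L * attempt) // back off before the next attempt
          case Failure(e) => throw e      // give up; let the task fail/retry
        }
      }
      result
    }

    // Usage inside the existing transformation (callMicroservice is
    // hypothetical):
    // rdd.flatMap(element => withRetries(3)(callMicroservice(element)))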

You can also implement the circuit breaker pattern directly; there's nothing
special there, though I don't think that's what you want: you actually want
to retry the failed attempts, not just avoid calling the microservice.
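
If you did want to go that route, a minimal per-executor breaker could be as
simple as the sketch below; the object name, threshold, and counter are
illustrative only, and note that once open it merely skips the call rather
than retrying it:

    import java.util.concurrent.atomic.AtomicInteger
    import scala.util.{Try, Success, Failure}

    // One instance per executor JVM (Scala objects are per-JVM singletons).
    object SimpleBreaker {
      private val failures = new AtomicInteger(0)
      private val threshold = 5 // consecutive failures before opening

      def call[T](f: => T): Option[T] = {
        if (failures.get() >= threshold) {
          None // breaker open: skip the microservice entirely
        } else {
          Try(f) match {
            case Success(v) => failures.set(0); Some(v)          // reset on success
            case Failure(_) => failures.incrementAndGet(); None  // count the failure
          }
        }
      }
    }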

On Wed, Feb 16, 2022 at 3:18 AM S <sheelst...@gmail.com> wrote:

> Hi,
>
> We have a Spark job that calls a microservice in the lambda function of
> the flatMap transformation: it passes the inbound element to this
> microservice and returns either the transformed value or "None" from the
> microservice as the output of the flatMap. The lambda also handles
> exceptions from the microservice. The question is: there are times when
> the microservice may be down, and there is no point recording an exception
> and putting the message in the DLQ for every element in our streaming
> pipeline for as long as the microservice stays down. Instead, we want to
> retry the microservice call for a given event a predefined number of
> times and, if the service is still found to be down, terminate the Spark
> job so that the current microbatch is terminated, there is no next
> microbatch, and the remaining messages stay in the source Kafka topics,
> unpolled and unprocessed, until the microservice is back up and the Spark
> job is redeployed. In regular microservices, we can implement this using
> the circuit breaker pattern. In Spark jobs, however, this would mean being
> able to somehow send a signal from an executor JVM to the driver JVM to
> terminate the Spark job. Is there a way to do that in Spark?
>
> P.S.:
> - Having the circuit breaker functionality helps narrow the purpose of the
> DLQ to data or schema issues only, instead of infra/network-related
> issues.
> - As far as the need for the Spark job to use microservices is concerned,
> think of it as complex logic maintained in a microservice that does not
> warrant duplication.
> - Checkpointing is handled manually rather than through Spark's default
> checkpointing mechanism.
>
> Regards,
> Sheel
>
