Please post a minimal, complete code example that reproduces what you are describing.

On Thu, Dec 15, 2016 at 6:00 PM, Michael Nguyen wrote:
> I have the following sequence of Spark Java API calls (Spark 2.0.2):
> 1. A Kafka stream is processed via a map function, which returns the string
>    value from tuple2._2() for the JavaDStream, as in
>      return tuple2._2();
> 2. The returned JavaDStream is then processed by foreachPartition, which is
>    wrapped by foreachRDD.
> 3. foreachPartition's call function uses an Iterator over the RDD, as in
>      ();
> When data is received, step 1 is executed, which is correct. However,
> () in step 3 makes a duplicate call to the map function in
> step 1, so that map function is called twice for every message:
> - the first time when the message is received from the Kafka stream, and
> - the second time when Iterator () is invoked from
>   foreachPartition's call function.
> I also tried transforming the data in the map function, as in
>   public TestTransformedClass call(Tuple2<String, String> tuple2) for step 1
>   public void call(Iterator<TestTransformedClass> inputParams) for step 3
> and the same issue occurs. So the issue occurs whether or not this
> sequence of Spark API calls involves data transformation.
> Questions:
> Since the message was already processed in step 1, why does ()
> in step 3 make a duplicate call to the map function in step 1?
> How do I fix it to avoid the duplicate invocation for every message?
> Thanks.
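
Without the actual code it is hard to say what triggers the second call. One possible explanation is that Spark transformations such as map are lazy: the map function runs when an action pulls data through the iterator, and it runs again for every further action on an uncached stream. The sketch below is a plain-JDK analogy of that behavior, not Spark code. The class name LazyMapDemo and all of its methods are hypothetical, and whether this is what happens in the quoted pipeline is an assumption.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;

// Plain-JDK analogy, NOT Spark code. All names here are hypothetical.
class LazyMapDemo {

    // Counts how many times the body of the "map function" actually runs.
    static final AtomicInteger mapCalls = new AtomicInteger();

    // Stand-in for the step 1 map function: returns the value of a key/value pair.
    static String extractValue(String[] keyValue) {
        mapCalls.incrementAndGet();
        return keyValue[1];
    }

    // Stand-in for step 3: consumes an iterator element by element.
    static int drain(Iterator<String> it) {
        int consumed = 0;
        while (it.hasNext()) {
            it.next();
            consumed++;
        }
        return consumed;
    }

    // Two traversals of a lazily mapped view: the map function runs on each one.
    static int callsWithoutCaching(List<String[]> messages) {
        mapCalls.set(0);
        drain(messages.stream().map(LazyMapDemo::extractValue).iterator());
        drain(messages.stream().map(LazyMapDemo::extractValue).iterator());
        return mapCalls.get();
    }

    // Materialize the mapped values once, then traverse the stored result twice.
    static int callsWithCaching(List<String[]> messages) {
        mapCalls.set(0);
        List<String> cached = messages.stream()
                .map(LazyMapDemo::extractValue)
                .collect(Collectors.toList());
        drain(cached.iterator());
        drain(cached.iterator());
        return mapCalls.get();
    }

    public static void main(String[] args) {
        List<String[]> messages = Arrays.asList(
                new String[] {"k1", "v1"},
                new String[] {"k2", "v2"},
                new String[] {"k3", "v3"});
        System.out.println("without caching: " + callsWithoutCaching(messages)); // 6
        System.out.println("with caching: " + callsWithCaching(messages));       // 3
    }
}
```

If the real job does run more than one output operation on the mapped stream, calling cache() or persist() on the JavaDStream before those operations is the usual way to avoid recomputing the map. A complete example would show whether that applies here.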
