[I] Incorrect printout of execution plan in explain [incubator-wayang]

via GitHub Fri, 24 Oct 2025 11:51:16 -0700


zkaoudi opened a new issue, #627:
URL: https://github.com/apache/incubator-wayang/issues/627


   I have the following wordcount version:
   
           /* Start building the Wayang Plan */
           planBuilder
                   /* Read the text file */
                   .readTextFile("file:" + path.toUri().getPath()).withName("Load file")
   
                   /* Split each line by non-word characters */
                   .flatMap(line -> Arrays.asList(line.split("\\W+")))
                   .withName("Split words").withTargetPlatform(Java.platform())
   
                   /* Filter empty tokens */
                   .filter(token -> !token.isEmpty())
                   .withName("Filter empty words")
   
                   /* Attach counter to each word */
                   .map(word -> new Tuple2<>(word.toLowerCase(), 1)).withName("To lower case, add counter")
   
                   /* Sum up counters for every word */
                   .reduceByKey(
                           Tuple2::getField0,
                           (t1, t2) -> new Tuple2<>(t1.getField0(), t1.getField1() + t2.getField1())
                   )
                   .withName("Add counters").withTargetPlatform(Spark.platform())
   
                   /* Use the following if you just want to see the execution plan */
                   .build().explain(false);
   
   Here I force the flatMap to run on Java and the reduceByKey to run on 
Spark. However, explain() prints the following execution plan, in which every 
operator, including the reduceByKey, is shown as a Java operator:
   
   == Execution Plan ==
   -+ JavaTextFileSource[Load file]
     -+ JavaFlatMap[Split words]
       -+ JavaFilter[Filter empty words]
         -+ JavaMap[To lower case, add counter]
           -+ JavaReduceBy[1->1, id=3d6300e8]
             -+ JavaLocalCallbackSink[explain()]
   
   When I use collect() instead of explain() to execute the plan, I can see 
that execution is correct and the reduceByKey runs on Spark.
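
   To make the mismatch easy to check, here is a minimal sketch in plain Java. It is not part of Wayang: the class name PlanPlatformCheck and its methods are hypothetical, and it assumes that each printed operator line has the form "-+ <Platform><Operator>[...]" with the platform appearing as a leading "Java" or "Spark" in the operator name, as in the printout above.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PlanPlatformCheck {

    // Assumption: operator lines look like "-+ JavaFlatMap[Split words]",
    // i.e. the platform is the "Java" or "Spark" prefix of the operator name.
    private static final Pattern OPERATOR =
            Pattern.compile("-\\+\\s+(Java|Spark)(\\w+)\\[");

    /** Returns "Java" or "Spark" for an operator line, or null for any other line. */
    static String platformOf(String line) {
        Matcher m = OPERATOR.matcher(line);
        return m.find() ? m.group(1) : null;
    }

    /** Maps each operator name (e.g. "ReduceBy") to the platform it is printed with. */
    static Map<String, String> platforms(List<String> lines) {
        Map<String, String> result = new LinkedHashMap<>();
        for (String line : lines) {
            Matcher m = OPERATOR.matcher(line);
            if (m.find()) {
                result.put(m.group(2), m.group(1));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // The printout from this issue, copied verbatim.
        List<String> printed = List.of(
                "== Execution Plan ==",
                "-+ JavaTextFileSource[Load file]",
                "  -+ JavaFlatMap[Split words]",
                "    -+ JavaFilter[Filter empty words]",
                "      -+ JavaMap[To lower case, add counter]",
                "        -+ JavaReduceBy[1->1, id=3d6300e8]",
                "          -+ JavaLocalCallbackSink[explain()]");

        platforms(printed).forEach(
                (operator, platform) -> System.out.println(operator + " -> " + platform));
    }
}
```

   Run against the printout above, the sketch reports every operator, including ReduceBy, as Java, although the reduceByKey was pinned to Spark.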


