Hi Do, the easiest way is to avoid using methods which trigger an eager execution (collect, count, print) but to define sinks instead. Alternatively, you can persist intermediate results by writing them to disk and continue processing from there. That way, you won't re-calculate all parts of your dataflow.
Cheers, Till On Thu, Mar 31, 2016 at 3:09 PM, Le Quoc Do <lequo...@gmail.com> wrote: > Hi all, > > Right now, in Flink, if I call to 2 action operators (print, count, > collect, ) consecutively, Flink will create 2 independent execution plans. > A simple example: > > DataSet<String> text = env.fromElements( > > "Some text ….", > > ); > > DataSet<Tuple2<String, Integer>> result = > > text.flatMap(new Tokenizer()) > > .groupBy(0) > > .sum(1); > > result.count(); > > result.print(); > > Could you please show me the way to force Flink create only one execution > plan, so that the later operator just reuse the result of previous one, and > doesn’t need to re-compute again? > > Thanks, > Do >