On Feb 16, 2018 at 19:28, "Kenneth Knowles" <k...@google.com> wrote:
> On Fri, Feb 16, 2018 at 9:39 AM, Romain Manni-Bucau <rmannibu...@gmail.com> wrote:
>
>> 2018-02-16 18:18 GMT+01:00 Kenneth Knowles <k...@google.com>:
>>
>>> Which runner's bundling are you concerned with? It sounds like the Flink runner?
>>
>> Flink, Spark, DirectRunner, DataFlow at least (others would be good but are out of scope)
>
> AFAIK bundling logic/perf is satisfactory on Dataflow, DirectRunner (for testing, so generates medium-sized local bundles) and SparkRunner (one bundle per microbatch when streaming). So what issue did you notice there?

There is no place to clear the execution cache and free pipeline-specific data and resources. This can't be done per bundle, because it can hurt performance or, more insidiously, run into the backend's connection-frequency limits. Beam can't help here and should embrace these user constraints IMHO: it MUST (uppercase, as in specs) call teardown once per execution.

Since the fn is serialized once per bundle, the perf impact is only huge if there is a bug somewhere else; even Java serialization of a big config is negligible compared to any small pipeline (seconds vs. minutes).

So there is no real perf issue (happy to check a real case if you can share one), while a severe security issue and a user issue both call for a fix, which should be in 2.4, no?

> IIRC at some point the FlinkRunner had 1 element bundles in streaming. Obviously if that is still the case it has to be fixed.
>
> Kenn
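
To make the per-bundle vs. per-execution argument concrete, here is a toy sketch in Python. It does not use Beam at all: `CountingBackend`, `PerBundleFn`, `PerExecutionFn`, `run` and `one_element_bundles` are hypothetical names, and the runner is only an assumption about the lifecycle (setup once, one start/finish pair per bundle, teardown once). The method names are merely borrowed from the DoFn lifecycle.

```python
class CountingBackend:
    """Hypothetical backend that only counts connections opened and closed."""

    def __init__(self):
        self.opened = 0
        self.closed = 0

    def connect(self):
        self.opened += 1

    def disconnect(self):
        self.closed += 1


class PerBundleFn:
    """Acquires and releases the connection around every bundle."""

    def __init__(self, backend):
        self.backend = backend

    def setup(self):
        pass

    def start_bundle(self):
        self.backend.connect()

    def process(self, element):
        return element

    def finish_bundle(self):
        self.backend.disconnect()

    def teardown(self):
        pass


class PerExecutionFn(PerBundleFn):
    """Acquires the connection in setup and releases it in teardown."""

    def setup(self):
        self.backend.connect()

    def start_bundle(self):
        pass

    def finish_bundle(self):
        pass

    def teardown(self):
        self.backend.disconnect()


def run(fn, bundles):
    """Toy runner (an assumption, not Beam code): drives the lifecycle calls."""
    out = []
    fn.setup()
    try:
        for bundle in bundles:
            fn.start_bundle()
            out.extend(fn.process(e) for e in bundle)
            fn.finish_bundle()
    finally:
        # The guarantee asked for above: teardown runs once per execution.
        fn.teardown()
    return out


def one_element_bundles(elements):
    """Worst case mentioned in the thread: every element is its own bundle."""
    return [[e] for e in elements]
```

With 1000 single-element bundles, the per-bundle variant opens and closes 1000 connections, while the per-execution variant opens and closes exactly one, which is why cleanup in the bundle hooks is not a substitute for a guaranteed teardown.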