On Feb 16, 2018 at 19:28, "Kenneth Knowles" <k...@google.com> wrote:

On Fri, Feb 16, 2018 at 9:39 AM, Romain Manni-Bucau <rmannibu...@gmail.com>
wrote:
>
> 2018-02-16 18:18 GMT+01:00 Kenneth Knowles <k...@google.com>:
>
>> Which runner's bundling are you concerned with? It sounds like the Flink
>> runner?
>>
>
> Flink, Spark, DirectRunner, Dataflow at least (others would be good but
> are out of scope)
>

AFAIK bundling logic/perf is satisfactory on Dataflow, DirectRunner (for
testing, so generates medium-sized local bundles) and SparkRunner (one
bundle per microbatch when streaming). So what issue did you notice there?


There is no place to clear the execution cache and free pipeline-specific
data and resources.

This can't be done per bundle because it can hurt performance or, more
insidiously, hit some kind of connection-frequency limit on the backend.

Beam can't help here and should embrace these user constraints IMHO, and
therefore MUST - uppercase as in specs - call teardown per execution.

Since the fn is serialized once per bundle, the perf impact is only huge
if there is a bug somewhere else; even Java serialization of a big config is
negligible compared to any small pipeline (seconds vs minutes).

So there is no real perf issue - happy to check a real case if you can share
one - while a severe security issue and a user issue both call for a fix,
which should be in 2.4, no?


IIRC at some point the FlinkRunner had 1 element bundles in streaming.
Obviously if that is still the case it has to be fixed.

Kenn
