Sunil Pedapudi created BEAM-6819:
------------------------------------
Summary: Remote sources provide insufficient metadata about
relative sizes of splits
Key: BEAM-6819
URL: https://issues.apache.org/jira/browse/BEAM-6819
Project: Beam
Issue Type: Improvement
Components: runner-dataflow, sdk-java-core
Reporter: Sunil Pedapudi
In the current split protocol, SourceMetadata is reported only for the initial
parent source; subsequent splits drop it. Without this information, downstream
systems fall back on simplifying assumptions about split sizes, so the input
fraction attributed to a task no longer correlates with the fraction of input
the task actually represents.
This decorrelation has cascading negative effects for any system that relies
on trends in input fraction (e.g., Cloud Dataflow's autotuning).
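
As a rough illustration of the problem, the sketch below compares the fraction
a consumer might attribute to each split when no per-split size is available
with the fraction implied by per-split size estimates. Everything in it is an
assumption made for illustration: the split sizes, the function names, and the
choice of a uniform fallback are hypothetical and are not taken from the Beam
or Dataflow code.

```python
from fractions import Fraction


def assumed_fractions(num_splits):
    """Fractions under a hypothetical uniform fallback: with no per-split
    metadata, every split is treated as an equal share of the parent."""
    return [Fraction(1, num_splits)] * num_splits


def actual_fractions(split_sizes):
    """Fractions when each split reports its own estimated size."""
    total = sum(split_sizes)
    return [Fraction(size, total) for size in split_sizes]


# Hypothetical estimated sizes (in bytes) of three splits of one parent source.
split_sizes = [700, 200, 100]

assumed = assumed_fractions(len(split_sizes))
actual = actual_fractions(split_sizes)

for size, a, b in zip(split_sizes, assumed, actual):
    # A large gap between the two columns is the decorrelation described above.
    print(f"size={size} assumed={float(a):.3f} actual={float(b):.3f}")
```

If each split carried its own size estimate, a consumer could compute the
second column directly instead of falling back on an assumption like the first.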