[ https://issues.apache.org/jira/browse/BEAM-6819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Kenneth Knowles updated BEAM-6819:
----------------------------------
This Jira ticket has a pull request attached to it, but is still open. Did the
pull request resolve the issue? If so, could you please mark it resolved? This
will help the project have a clear view of its open issues.
> Remote sources provide insufficient metadata about relative sizes of splits
> ---------------------------------------------------------------------------
>
> Key: BEAM-6819
> URL: https://issues.apache.org/jira/browse/BEAM-6819
> Project: Beam
> Issue Type: Improvement
> Components: sdk-java-core
> Reporter: Sunil Pedapudi
> Priority: P3
> Labels: Clarified
> Time Spent: 10m
> Remaining Estimate: 0h
>
> In the current split protocol, SourceMetadata is reported for the initial
> parent source. Subsequent splits drop the SourceMetadata. Without this
> additional information, downstream systems make simplifying assumptions that
> result in decorrelation between input fraction and the actual fraction of
> input represented by a task.
> This decorrelation of input fraction has cascading negative effects for any
> system relying on trends in input fraction (e.g., Cloud Dataflow's autotuning).
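
To make the decorrelation described above concrete, here is a minimal sketch. It assumes the downstream "simplifying assumption" is that every split of a parent source is the same size; the issue does not say which assumption is actually made. The function names and byte counts are hypothetical and are not part of the Beam SDK or of Cloud Dataflow.

```python
# Hypothetical illustration only: none of these names are Beam or Dataflow APIs.

def assumed_fractions(num_splits):
    """Input fractions under the assumption that all splits are equally sized,
    which is one guess a consumer could make when splits carry no size metadata."""
    return [1.0 / num_splits] * num_splits


def actual_fractions(split_sizes_bytes):
    """Input fractions computed from per-split size estimates, if they were reported."""
    total = sum(split_sizes_bytes)
    return [size / total for size in split_sizes_bytes]


# Made-up size estimates for three splits of a single parent source.
split_sizes_bytes = [800, 150, 50]

assumed = assumed_fractions(len(split_sizes_bytes))
actual = actual_fractions(split_sizes_bytes)

for index, (guess, real) in enumerate(zip(assumed, actual)):
    print(f"split {index}: assumed {guess:.2f}, actual {real:.2f}")
```

With skewed splits like these, the assumed fraction for the largest split is far below its real share of the input, so any trend computed from the assumed values would not track the real progress through the input.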