[
https://issues.apache.org/jira/browse/BEAM-6819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17547822#comment-17547822
]
Kenneth Knowles commented on BEAM-6819:
---------------------------------------
This issue has been migrated to https://github.com/apache/beam/issues/19533
> Remote sources provide insufficient metadata about relative sizes of splits
> ---------------------------------------------------------------------------
>
> Key: BEAM-6819
> URL: https://issues.apache.org/jira/browse/BEAM-6819
> Project: Beam
> Issue Type: Improvement
> Components: sdk-java-core
> Reporter: Sunil Pedapudi
> Priority: P3
> Labels: Clarified
> Time Spent: 10m
> Remaining Estimate: 0h
>
> In the current split protocol, SourceMetadata is reported only for the
> initial parent source; subsequent splits drop it. Without this information,
> downstream systems make simplifying assumptions, and the input fraction they
> compute becomes decorrelated from the actual fraction of input that a task
> represents.
> This decorrelation has cascading negative effects for any system that relies
> on trends in input fraction (e.g., Cloud Dataflow's autotuning).
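
To make the decorrelation described above concrete, here is a minimal sketch. It is not Beam or Dataflow code: the function names, the split sizes, and the "every split is the same size" rule are all assumptions chosen for illustration, since the issue does not say which simplifying assumption downstream systems actually make.

```python
# Hypothetical illustration only: nothing here is a Beam or Dataflow API.

def assumed_fractions(num_splits):
    """Input fractions a downstream system might assume when splits
    carry no size metadata: every split gets an equal share."""
    return [1.0 / num_splits] * num_splits


def actual_fractions(split_sizes):
    """Input fractions each split really represents, computed from
    per-split sizes (the information that is dropped after splitting)."""
    total = sum(split_sizes)
    return [size / total for size in split_sizes]


# Assumed example: a parent source that splits into three uneven children.
split_sizes = [700, 200, 100]

assumed = assumed_fractions(len(split_sizes))
actual = actual_fractions(split_sizes)

# Per-split gap between the real share of input and the assumed share.
errors = [real - guess for real, guess in zip(actual, assumed)]

for size, guess, real, err in zip(split_sizes, assumed, actual, errors):
    print(f"size={size:4d} assumed={guess:.3f} actual={real:.3f} error={err:+.3f}")
```

In this made-up case the first split holds 70% of the input but would be treated as roughly a third of it, so any trend computed from the assumed fractions drifts away from the real progress through the input.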