[
https://issues.apache.org/jira/browse/BEAM-7443?focusedWorklogId=252055&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-252055
]
ASF GitHub Bot logged work on BEAM-7443:
----------------------------------------
Author: ASF GitHub Bot
Created on: 31/May/19 21:04
Start Date: 31/May/19 21:04
Worklog Time Spent: 10m
Work Description: lukecwik commented on pull request #8641: [BEAM-7443]
Create a BoundedSource -> SDF wrapper in Python SDK
URL: https://github.com/apache/beam/pull/8641#discussion_r289549079
##########
File path: sdks/python/apache_beam/io/iobase.py
##########
@@ -1180,15 +1180,47 @@ def check_done(self):
raise NotImplementedError
def try_split(self, fraction_of_remainder):
+ """Splits current restriction based on fraction_of_remainder.
+
+ Invoked when SDK receiving ProcessBundleSplitRequest during processing
Review comment:
```
If splitting the current restriction is possible, the current restriction is
split into a primary and residual restriction pair. This invocation updates the
``current_restriction()`` to be the primary restriction effectively having the
current ``DoFn.process()`` execution responsible for performing the work that
the primary restriction represents. The residual restriction will be executed
in a separate ``DoFn.process()`` invocation (likely in a different process).
The work performed by executing the primary and residual restrictions as
separate ``DoFn.process()`` invocations MUST be equivalent to the work
performed as if this split never occurred.
The ``fraction_of_remainder`` should be used in a best effort manner to
choose a primary and residual restriction based upon the fraction of the
remaining work that the current ``DoFn.process()`` invocation is responsible
for. For example, if a ``DoFn.process()`` was reading a file with a restriction
representing the offset range [100, 200) and has processed up to offset 130
with a fraction_of_remainder of 0.7, the primary and residual restrictions
returned would be [100, 179), [179, 200) (note: current_offset +
fraction_of_remainder * remaining_work = 130 + 0.7 * 70 = 179).
It is very important for pipeline scaling and end to end pipeline execution
that try_split is implemented well.
```
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
Issue Time Tracking
-------------------
Worklog Id: (was: 252055)
Time Spent: 3h 20m (was: 3h 10m)
> BoundedSource->SDF needs a wrapper in Python SDK
> -------------------------------------------------
>
> Key: BEAM-7443
> URL: https://issues.apache.org/jira/browse/BEAM-7443
> Project: Beam
> Issue Type: New Feature
> Components: sdk-py-core
> Reporter: Boyuan Zhang
> Assignee: Boyuan Zhang
> Priority: Major
> Time Spent: 3h 20m
> Remaining Estimate: 0h
>
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)