[ 
https://issues.apache.org/jira/browse/BEAM-7443?focusedWorklogId=252055&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-252055
 ]

ASF GitHub Bot logged work on BEAM-7443:
----------------------------------------

                Author: ASF GitHub Bot
            Created on: 31/May/19 21:04
            Start Date: 31/May/19 21:04
    Worklog Time Spent: 10m 
      Work Description: lukecwik commented on pull request #8641: [BEAM-7443] 
Create a BoundedSource -> SDF wrapper in Python SDK
URL: https://github.com/apache/beam/pull/8641#discussion_r289549079
 
 

 ##########
 File path: sdks/python/apache_beam/io/iobase.py
 ##########
 @@ -1180,15 +1180,47 @@ def check_done(self):
     raise NotImplementedError
 
   def try_split(self, fraction_of_remainder):
+    """Splits current restriction based on fraction_of_remainder.
+
+    Invoked when SDK receiving ProcessBundleSplitRequest during processing
 
 Review comment:
   ```
   If splitting the current restriction is possible, the current restriction is 
split into a primary and residual restriction pair. This invocation updates the 
``current_restriction()`` to be the primary restriction effectively having the 
current ``DoFn.process()`` execution responsible for performing the work that 
the primary restriction represents. The residual restriction will be executed 
in a separate ``DoFn.process()`` invocation (likely in a different process). 
The work performed by executing the primary and residual restrictions as 
separate ``DoFn.process()`` invocations MUST be equivalent to the work 
performed as if this split never occurred.
   
   The ``fraction_of_remainder`` should be used in a best effort manner to 
choose a primary and residual restriction based upon the fraction of the 
remaining work that the current ``DoFn.process()`` invocation is responsible 
for. For example, if a ``DoFn.process()`` was reading a file with a restriction 
representing the offset range [100, 200) and has processed up to offset 130 
with a fraction_of_remainder of 0.7, the primary and residual restrictions 
returned would be [100, 179), [179, 200) (note: current_offset + 
fraction_of_remainder * remaining_work = 130 + 0.7 * 70 = 179).
   
   It is very important for pipeline scaling and end to end pipeline execution 
that try_split is implemented well.
   ```
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


Issue Time Tracking
-------------------

    Worklog Id:     (was: 252055)
    Time Spent: 3h 20m  (was: 3h 10m)

>  BoundedSource->SDF needs a wrapper in Python SDK
> -------------------------------------------------
>
>                 Key: BEAM-7443
>                 URL: https://issues.apache.org/jira/browse/BEAM-7443
>             Project: Beam
>          Issue Type: New Feature
>          Components: sdk-py-core
>            Reporter: Boyuan Zhang
>            Assignee: Boyuan Zhang
>            Priority: Major
>          Time Spent: 3h 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to