damccorm commented on code in PR #23985:
URL: https://github.com/apache/beam/pull/23985#discussion_r1015929569


##########
sdks/python/apache_beam/runners/portability/fn_api_runner/fn_runner.py:
##########
@@ -1150,7 +1181,8 @@ class BundleManager(object):
   def __init__(self,
                bundle_context_manager,  # type: execution.BundleContextManager
                progress_frequency=None,  # type: Optional[float]
-               cache_token_generator=FnApiRunner.get_cache_token_generator()
+               cache_token_generator=FnApiRunner.get_cache_token_generator(),
+               split_managers=(),

Review Comment:
   ```suggestion
                  split_managers=()
   ```
   
   nit



##########
sdks/python/apache_beam/runners/portability/fn_api_runner/fn_runner_test.py:
##########
@@ -2110,6 +2111,30 @@ def process(
         if expected_groups:
           assert_that(grouped, equal_to(expected_groups), label='CheckGrouped')
 
+  def test_time_based_split_manager(self):
+
+    elements = list(range(100))
+
+    class BundleCountingDoFn(beam.DoFn):
+      def process(self, element) -> Iterator[object]:
+        time.sleep(0.001)
+        yield element
+
+      def finish_bundle(self):
+        yield window.GlobalWindows.windowed_value('endOfBundle')
+
+    with self.create_pipeline() as p:
+      p._options.view_as(DirectOptions).test_splits = {
+          'SplitMarker': {
+              'timings': [0, .05], 'fractions': [0.5, 0.5]
+          }
+      }
+      assert_that(
+          p
+          | beam.Create(elements)
+          | 'SplitMarker' >> beam.ParDo(BundleCountingDoFn()),
+          equal_to(elements + ['endOfBundle'] * 2))

Review Comment:
   Since we have 2 splits, shouldn't we be anticipating 3 endOfBundle calls?



##########
sdks/python/apache_beam/runners/portability/fn_api_runner/fn_runner_test.py:
##########
@@ -2110,6 +2111,30 @@ def process(
         if expected_groups:
           assert_that(grouped, equal_to(expected_groups), label='CheckGrouped')
 
+  def test_time_based_split_manager(self):
+
+    elements = list(range(100))
+
+    class BundleCountingDoFn(beam.DoFn):
+      def process(self, element) -> Iterator[object]:
+        time.sleep(0.001)
+        yield element
+
+      def finish_bundle(self):
+        yield window.GlobalWindows.windowed_value('endOfBundle')
+
+    with self.create_pipeline() as p:
+      p._options.view_as(DirectOptions).test_splits = {
+          'SplitMarker': {
+              'timings': [0, .05], 'fractions': [0.5, 0.5]
+          }
+      }
+      assert_that(
+          p
+          | beam.Create(elements)
+          | 'SplitMarker' >> beam.ParDo(BundleCountingDoFn()),
+          equal_to(elements + ['endOfBundle'] * 2))

Review Comment:
   I think you answer this below:
   ```
               // More than one "endOfBundle" output because we split.
               // (We in fact should have split multiple times, but all the
               // remainders get concatenated into the second bundle.
               // Importantly, however, we verify that regardless of where the
               // splits occurred (if you print them, they're in an interesting
               // order), all elements are accounted for.)
   ```
   
   Though I'm not sure why they get concatenated. Regardless, dropping a 
similar comment here would be helpful



##########
sdks/typescript/src/apache_beam/worker/operators.ts:
##########
@@ -234,6 +257,46 @@ class DataSourceOperator implements IOperator {
     throw Error("Data should not come in via process.");
   }
 
+  split(
+    desiredSplit: fnApi.ProcessBundleSplitRequest_DesiredSplit
+  ): fnApi.ProcessBundleSplitResponse_ChannelSplit | undefined {
+    if (!this.started) {
+      return undefined;
+    }
+    const end =
+      this.lastToProcessElement < Infinity
+        ? this.lastToProcessElement
+        : Number(desiredSplit.estimatedInputElements) - 1;
+    console.log(this.lastToProcessElement, this.lastProcessedElement, end);
+    if (this.lastProcessedElement >= end) {
+      return undefined;
+    }
+    var targetLastToProcessElement = Math.floor(
+      this.lastProcessedElement +
+        (end - this.lastProcessedElement) * desiredSplit.fractionOfRemainder
+    );
+    if (desiredSplit.allowedSplitPoints.length) {
+      targetLastToProcessElement =
+        Math.min(
+          ...Array.from(desiredSplit.allowedSplitPoints)
+            .filter((index) => index >= targetLastToProcessElement + 1)

Review Comment:
   ```suggestion
               .filter((allowedSplitPoint) => allowedSplitPoint > 
targetLastToProcessElement)
   ```
   
   Optional: I think this can be simplified + naming threw me off here.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to