[jira] [Work logged] (BEAM-10603) Large Source Recording for Interactive Runner

ASF GitHub Bot (Jira) Wed, 02 Sep 2020 13:42:18 -0700


     [ https://issues.apache.org/jira/browse/BEAM-10603?focusedWorklogId=478118&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-478118 ]


ASF GitHub Bot logged work on BEAM-10603:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 02/Sep/20 20:41
            Start Date: 02/Sep/20 20:41
    Worklog Time Spent: 10m 
      Work Description: davidyan74 commented on a change in pull request #12703:
URL: https://github.com/apache/beam/pull/12703#discussion_r482420061



##########
File path: sdks/python/apache_beam/runners/interactive/recording_manager_test.py
##########
@@ -288,18 +347,122 @@ def test_basic_wordcount(self):
     # Create the recording objects. By calling `record` a new PipelineFragment
     # is started to compute the given PCollections and cache to disk.
     rm = RecordingManager(p)
-    recording = rm.record([elems], max_n=3, max_duration_secs=500)
-    stream = recording.stream(elems)
-    recording.wait_until_finish()
+    numbers_recording = rm.record([numbers], max_n=3, max_duration_secs=500)
+    numbers_stream = numbers_recording.stream(numbers)
+    numbers_recording.wait_until_finish()
 
     # Once the pipeline fragment completes, we can read from the stream and know
     # that all elements were written to cache.
-    elems = list(stream.read())
+    elems = list(numbers_stream.read())
     expected_elems = [
         WindowedValue(i, MIN_TIMESTAMP, [GlobalWindow()]) for i in range(3)
     ]
     self.assertListEqual(elems, expected_elems)
 
+    # Make an extra recording and test the description.
+    letters_recording = rm.record([letters], max_n=3, max_duration_secs=500)
+    letters_recording.wait_until_finish()
+
+    self.assertEqual(
+        rm.describe()['size'],
+        numbers_recording.describe()['size'] +
+        letters_recording.describe()['size'])
+
+    rm.cancel()
+
+  @unittest.skipIf(
+      sys.version_info < (3, 6, 0),
+      'This test requires at least Python 3.6 to work.')
+  def test_cancel_stops_recording(self):
+    # Add the TestStream so that it can be cached.
+    ib.options.capturable_sources.add(TestStream)
+
+    p = beam.Pipeline(
+        InteractiveRunner(), options=PipelineOptions(streaming=True))
+    elems = (
+        p
+        | TestStream().advance_watermark_to(0).advance_processing_time(
+            1).add_elements(list(range(10))).advance_processing_time(1))
+    squares = elems | beam.Map(lambda x: x**2)
+
+    # Watch the local scope for Interactive Beam so that referenced PCollections
+    # will be cached.
+    ib.watch(locals())
+
+    # This is normally done in the interactive_utils when a transform is
+    # applied but needs an IPython environment. So we manually run this here.
+    ie.current_env().track_user_pipelines()
+
+    # Get the recording then the BackgroundCachingJob.

Review comment:
       Are we still calling it BackgroundCachingJob?
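For context, the test changes in this diff check two properties of the recording API. First, `rm.describe()['size']` must equal the sum of the sizes reported by each recording. Second, `rm.cancel()` must stop recording.

The sketch below models that contract using only stdlib threads. `ToyRecording` and `ToyRecordingManager` are hypothetical stand-ins, not Beam's `RecordingManager`. Two further details are assumptions made for illustration: the element-count "size" metric, and the way `max_n` and `max_duration_secs` bound a recording.

```python
import threading
import time


class ToyRecording:
    """Hypothetical recording: caches up to max_n elements of `source` on a
    background thread, stopping early on timeout or cancel()."""

    def __init__(self, source, max_n, max_duration_secs):
        self._cache = []
        self._lock = threading.Lock()
        self._cancelled = threading.Event()
        self._thread = threading.Thread(
            target=self._run,
            args=(source, max_n, max_duration_secs),
            daemon=True)
        self._thread.start()

    def _run(self, source, max_n, max_duration_secs):
        deadline = time.monotonic() + max_duration_secs
        count = 0
        for elem in source:
            # Stop on cancel, on the element limit, or on the time limit.
            if (self._cancelled.is_set() or count >= max_n or
                    time.monotonic() >= deadline):
                break
            with self._lock:
                self._cache.append(elem)
            count += 1

    def wait_until_finish(self):
        self._thread.join()

    def cancel(self):
        # Signal the background thread, then wait for it to exit.
        self._cancelled.set()
        self._thread.join()

    def is_running(self):
        return self._thread.is_alive()

    def describe(self):
        # Assumed size metric: the number of cached elements.
        with self._lock:
            return {'size': len(self._cache)}

    def read(self):
        with self._lock:
            return list(self._cache)


class ToyRecordingManager:
    """Hypothetical manager that owns every recording it starts."""

    def __init__(self):
        self._recordings = []

    def record(self, source, max_n, max_duration_secs):
        recording = ToyRecording(source, max_n, max_duration_secs)
        self._recordings.append(recording)
        return recording

    def describe(self):
        # The total size is the sum of the per-recording sizes.
        return {'size': sum(r.describe()['size'] for r in self._recordings)}

    def cancel(self):
        for recording in self._recordings:
            recording.cancel()


if __name__ == '__main__':
    rm = ToyRecordingManager()
    numbers = rm.record(range(10), max_n=3, max_duration_secs=500)
    numbers.wait_until_finish()
    letters = rm.record('abcdef', max_n=3, max_duration_secs=500)
    letters.wait_until_finish()
    print(numbers.read(), rm.describe())  # [0, 1, 2] {'size': 6}
    rm.cancel()
```

The manager computes its total from the per-recording sizes rather than keeping a separate counter. This keeps `describe()` consistent with the individual recordings by construction, which is the equality the test asserts.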




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.



Issue Time Tracking
-------------------

    Worklog Id:     (was: 478118)
    Time Spent: 20h 40m  (was: 20.5h)

> Large Source Recording for Interactive Runner
> ----------------------------------------------
>
>                 Key: BEAM-10603
>                 URL: https://issues.apache.org/jira/browse/BEAM-10603
>             Project: Beam
>          Issue Type: Improvement
>          Components: runner-py-interactive
>            Reporter: Sam Rohde
>            Assignee: Sam Rohde
>            Priority: P1
>          Time Spent: 20h 40m
>  Remaining Estimate: 0h
>
> This changes the Interactive Runner to create a long-running background
> caching job that is decoupled from the user pipeline. When a user invokes
> collect() or show(), the runner reads from the cache to compute the
> requested PCollections. Previously, the user had to wait for the cache to be
> fully written; now the user can start experimenting immediately.
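The decoupling described in the issue can be illustrated with a small producer/consumer sketch. A background thread keeps appending to an append-only cache. Meanwhile, a reader returns the first few elements as soon as they exist, instead of waiting for the whole source to be written.

`ToyCache`, `start_background_caching`, and `collect` are hypothetical names chosen for this illustration. They are not the Interactive Runner's implementation or Interactive Beam's `collect()`/`show()` API.

```python
import threading
import time


class ToyCache:
    """Hypothetical append-only cache shared by a background writer and readers."""

    def __init__(self):
        self._items = []
        self._done = False
        self._cond = threading.Condition()

    def append(self, item):
        with self._cond:
            self._items.append(item)
            self._cond.notify_all()

    def finish(self):
        with self._cond:
            self._done = True
            self._cond.notify_all()

    def read(self):
        """Yield cached items as they arrive; stop once the writer finishes."""
        i = 0
        while True:
            with self._cond:
                # Block until a new item exists or the writer is done.
                while i >= len(self._items) and not self._done:
                    self._cond.wait()
                if i >= len(self._items):
                    return
                item = self._items[i]
            i += 1
            yield item  # Yield outside the lock so slow readers never block the writer.


def start_background_caching(source, cache):
    """Hypothetical caching job that runs independently of the caller."""
    def run():
        for elem in source:
            cache.append(elem)
        cache.finish()

    thread = threading.Thread(target=run, daemon=True)
    thread.start()
    return thread


def collect(cache, n):
    """Return the first n cached elements without waiting for the whole source."""
    if n <= 0:
        return []
    out = []
    for item in cache.read():
        out.append(item)
        if len(out) == n:
            break
    return out


if __name__ == '__main__':
    def slow_source(k):
        for i in range(k):
            time.sleep(0.01)
            yield i

    cache = ToyCache()
    writer = start_background_caching(slow_source(50), cache)
    print(collect(cache, 3))  # available long before the source is exhausted
    writer.join()
```

The reader holds the condition lock only while checking the cache, never while yielding. A slow consumer therefore cannot stall the background writer, which mirrors the goal of letting the user inspect data while caching continues.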



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
