[
https://issues.apache.org/jira/browse/BEAM-8575?focusedWorklogId=346196&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-346196
]
ASF GitHub Bot logged work on BEAM-8575:
----------------------------------------
Author: ASF GitHub Bot
Created on: 19/Nov/19 19:20
Start Date: 19/Nov/19 19:20
Worklog Time Spent: 10m
Work Description: robertwb commented on pull request #10070: [BEAM-8575]
Added a unit test for Reshuffle to test that Reshuffle pr…
URL: https://github.com/apache/beam/pull/10070#discussion_r348118958
##########
File path: sdks/python/apache_beam/transforms/util_test.py
##########
@@ -423,6 +425,70 @@ def test_reshuffle_streaming_global_window(self):
label='after reshuffle')
pipeline.run()
+ @attr('ValidatesRunner')
+ def test_reshuffle_preserves_timestamps(self):
+ pipeline = TestPipeline()
+
+ # Create a PCollection and assign each element with a different timestamp.
+ before_reshuffle = (pipeline
+ | "Four elements" >> beam.Create([
+ {'name': 'foo', 'timestamp': MIN_TIMESTAMP},
+ {'name': 'foo', 'timestamp': 0},
+ {'name': 'bar', 'timestamp': 33},
+ {'name': 'bar', 'timestamp': MAX_TIMESTAMP},
+ ])
+ | "With timestamp" >> beam.Map(
+ lambda element: beam.window.TimestampedValue(
+ element, element['timestamp'])))
+
+ # For each element in a PCollection, gets the current timestamp of the
+ # element and reassigns the timestamp to the element.
+ class AddTimestamp(beam.DoFn):
+ def process(self, element, timestamp=beam.DoFn.TimestampParam):
+ yield beam.window.TimestampedValue(element, timestamp)
+
+ # Reshuffle the PCollection above and assign the timestamp of an element to
+ # that element again.
+ after_reshuffle = (before_reshuffle
+ | "Reshuffle" >> beam.Reshuffle()
+ | "With timestamps again" >> beam.ParDo(AddTimestamp()))
+
+ # Given an element, emits a string which contains the timestamp and the
name
+ # field of the element.
+ class FormatWithTimestamp(beam.DoFn):
Review comment:
There's no need to create a DoFn, just create a function and do
beam.Map(format_with_timestamp).
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
Issue Time Tracking
-------------------
Worklog Id: (was: 346196)
Time Spent: 12h 40m (was: 12.5h)
> Add more Python validates runner tests
> --------------------------------------
>
> Key: BEAM-8575
> URL: https://issues.apache.org/jira/browse/BEAM-8575
> Project: Beam
> Issue Type: Test
> Components: sdk-py-core, testing
> Reporter: wendy liu
> Assignee: wendy liu
> Priority: Major
> Time Spent: 12h 40m
> Remaining Estimate: 0h
>
> This is the umbrella issue to track the work of adding more Python tests to
> improve test coverage.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)