[
https://issues.apache.org/jira/browse/BEAM-8645?focusedWorklogId=354769&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-354769
]
ASF GitHub Bot logged work on BEAM-8645:
----------------------------------------
Author: ASF GitHub Bot
Created on: 05/Dec/19 23:18
Start Date: 05/Dec/19 23:18
Worklog Time Spent: 10m
Work Description: robertwb commented on pull request #10143: [BEAM-8645]
To test state backed iterable coder in py sdk.
URL: https://github.com/apache/beam/pull/10143#discussion_r354596302
##########
File path: sdks/python/apache_beam/runners/portability/fn_api_runner_test.py
##########
@@ -1579,6 +1580,49 @@ def test_lull_logging(self):
'.*There has been a processing lull of over.*',
'Unable to find a lull logged for this job.')
+class StateBackedTestElementType(object):
+  element_count = 0
+
+  def __init__(self, num_elems):
+    self.num_elems = num_elems
+    self.value = ['a' for _ in range(num_elems)]
+    StateBackedTestElementType.element_count += 1
+    # Due to using state backed iterable, we expect only a few instances to be
+    # alive at any given time.
+    if StateBackedTestElementType.element_count > 5:
+      raise RuntimeError('Too many live instances.')
+
+  def __del__(self):
+    StateBackedTestElementType.element_count -= 1
+
+  def __reduce__(self):
+    return (self.__class__, (self.num_elems, ))
+
+@attr('ValidatesRunner')
+class FnApiBasedStateBackedCoderTest(unittest.TestCase):
+
+  class ElementDoFn(beam.DoFn):
+    def process(self, elements):
+      unused_key, ts = elements
+
+      yield sum([item.num_elems for item in ts])
+
+  def create_pipeline(self):
+    return beam.Pipeline(
+        runner=fn_api_runner.FnApiRunner(use_state_iterables=True))
+
+  def test_gbk_many_values(self):
+    with self.create_pipeline() as p:
+      # The number of integers could be a knob to test against
Review comment:
At least make them constants (and use their product below), or perhaps even
arguments (with defaults) to this test (which make it easy to parameterize
externally).
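The reviewer's suggestion, together with the live-instance counter in `StateBackedTestElementType`, can be illustrated without Beam. The sketch below is a hypothetical stand-in, not the PR's test: `CountedElement` mirrors the counting class, `lazy_values` imitates a state-backed iterable by producing values one at a time, and `NUM_KEYS`, `NUM_VALUES` and `ELEMS_PER_VALUE` are assumed defaults exposed as keyword arguments so the expected total is simply their product. It relies on CPython's reference counting to run `__del__` as soon as an element is dropped.

```python
# Hypothetical, Beam-free sketch; names and sizes are assumptions, not from PR #10143.
NUM_KEYS = 3          # assumed default number of keys
NUM_VALUES = 1000     # assumed default number of values per key
ELEMS_PER_VALUE = 10  # assumed size of each value


class CountedElement(object):
  """Mirrors StateBackedTestElementType: fails if too many instances are alive."""
  live = 0

  def __init__(self, num_elems):
    self.num_elems = num_elems
    CountedElement.live += 1
    if CountedElement.live > 5:
      raise RuntimeError('Too many live instances.')

  def __del__(self):
    # CPython calls this as soon as the last reference is dropped.
    CountedElement.live -= 1

  def __reduce__(self):
    # Unpickling calls CountedElement(num_elems), so decoded copies are counted too.
    return (self.__class__, (self.num_elems, ))


def lazy_values(num_values, elems_per_value):
  # Produce one value at a time, the way a state-backed iterable pages values in.
  for _ in range(num_values):
    yield CountedElement(elems_per_value)


def check(num_keys=NUM_KEYS, num_values=NUM_VALUES,
          elems_per_value=ELEMS_PER_VALUE):
  # Sum each key's values lazily; at most two elements are alive at once.
  totals = [sum(item.num_elems for item in lazy_values(num_values, elems_per_value))
            for _ in range(num_keys)]
  # The expected total per key is the product of the two size knobs.
  assert totals == [num_values * elems_per_value] * num_keys
  return totals


print(check())  # [10000, 10000, 10000]
```

Materializing the same values instead, for example `list(lazy_values(10, 1))`, raises the `RuntimeError` at the sixth live instance, which is the kind of failure the check in the PR is meant to surface when values are not streamed.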
Issue Time Tracking
-------------------
Worklog Id: (was: 354769)
Time Spent: 9h 10m (was: 9h)
> TimestampCombiner incorrect in beam python
> ------------------------------------------
>
> Key: BEAM-8645
> URL: https://issues.apache.org/jira/browse/BEAM-8645
> Project: Beam
> Issue Type: Bug
> Components: sdk-py-core
> Reporter: Ruoyun Huang
> Priority: Major
> Time Spent: 9h 10m
> Remaining Estimate: 0h
>
> When elements stamped with window.TimestampedValue are combined per key:
> {code:python}
> main_stream = (p
>                | 'main TestStream' >> TestStream()
>                .add_elements([window.TimestampedValue(('k', 100), 0)])
>                .add_elements([window.TimestampedValue(('k', 400), 9)])
>                .advance_watermark_to_infinity()
>                | 'main windowInto' >> beam.WindowInto(
>                    window.FixedWindows(10),
>                    timestamp_combiner=TimestampCombiner.OUTPUT_AT_LATEST)
>                | 'Combine' >> beam.CombinePerKey(sum))
> {code}
> The expected output timestamps for each TimestampCombiner setting are:
> LATEST: (('k', 500), Timestamp(9)),
> EARLIEST: (('k', 500), Timestamp(0)),
> END_OF_WINDOW: (('k', 500), Timestamp(10)),
> But the current Python streaming implementation gives the following results:
> LATEST: (('k', 500), Timestamp(10)),
> EARLIEST: (('k', 500), Timestamp(10)),
> END_OF_WINDOW: (('k', 500), Timestamp(9.99999999)),
> More details and discussions:
> https://lists.apache.org/thread.html/d3af1f2f84a2e59a747196039eae77812b78a991f0f293c717e5f4e1@%3Cdev.beam.apache.org%3E
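To make the expected values in the report concrete, here is a minimal, Beam-free sketch of the output timestamps the reporter expects for a per-key sum in fixed windows. `combine_per_key`, `window_start` and the string combiner names are hypothetical helpers, not Beam API; timestamps are plain integers rather than Beam `Timestamp` objects, and END_OF_WINDOW follows the issue's expected value of the window's end boundary.

```python
from collections import defaultdict

WINDOW_SIZE = 10  # matches window.FixedWindows(10) in the report


def window_start(ts, size=WINDOW_SIZE):
  # Start of the fixed window that contains timestamp ts.
  return ts - ts % size


def combine_per_key(timestamped, combiner, size=WINDOW_SIZE):
  """timestamped: iterable of ((key, value), timestamp) pairs."""
  groups = defaultdict(list)
  for (key, value), ts in timestamped:
    groups[(key, window_start(ts, size))].append((value, ts))
  results = []
  for (key, start), items in sorted(groups.items()):
    total = sum(v for v, _ in items)
    stamps = [t for _, t in items]
    if combiner == 'EARLIEST':
      out_ts = min(stamps)      # earliest input timestamp in the window
    elif combiner == 'LATEST':
      out_ts = max(stamps)      # latest input timestamp in the window
    elif combiner == 'END_OF_WINDOW':
      out_ts = start + size     # window end, as listed in the issue
    else:
      raise ValueError('unknown combiner: %s' % combiner)
    results.append(((key, total), out_ts))
  return results


ELEMENTS = [(('k', 100), 0), (('k', 400), 9)]
for name in ('LATEST', 'EARLIEST', 'END_OF_WINDOW'):
  print(name, combine_per_key(ELEMENTS, name))
```

With the report's two elements this prints 9, 0 and 10 as the output timestamps, matching the expected column rather than the observed one.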
--
This message was sent by Atlassian Jira
(v8.3.4#803005)