[
https://issues.apache.org/jira/browse/BEAM-14294?focusedWorklogId=766170&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-766170
]
ASF GitHub Bot logged work on BEAM-14294:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 04/May/22 17:07
Start Date: 04/May/22 17:07
Worklog Time Spent: 10m
Work Description: TheNeuralBit commented on PR #17384:
URL: https://github.com/apache/beam/pull/17384#issuecomment-1117595689
It looks like it's a pre-existing issue related to passing a large input
(order 1000 elements) through test_batch_pardo in order to trigger a flush. I
can reproduce the flake in a pipeline that doesn't have a batched DoFn:
```
def test_large_input_pardo(self):
  with self.create_pipeline() as p:
    res = (
        p
        | beam.Create(np.array(range(5000),
                               dtype=np.int64)).with_output_types(np.int64)
        | beam.Map(lambda e: e * 2)
        | beam.Map(lambda e: e + 3))
    assert_that(res, equal_to([(i * 2) + 3 for i in range(5000)]))
```
It's worth noting that I haven't seen the flake in a cythonized environment. Perhaps
it's a bug in the "slow" stack, which is often only tested with trivial inputs?
I don't think I'll be able to diagnose the existing issue today. A solution
for this PR might be to skip the flush-triggering test in non-cythonized
environments.
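If we go that route, a minimal sketch of the skip guard, assuming that importing a
compiled module such as apache_beam.runners.worker.statesampler_fast is a reliable
signal that the SDK is cythonized (the test class name here is hypothetical):
```
import unittest

try:
  # Compiled module that should only be importable in a cythonized install
  # (assumption: its presence is a reasonable proxy for the "fast" stack).
  from apache_beam.runners.worker import statesampler_fast  # pylint: disable=unused-import
  IS_CYTHONIZED = True
except ImportError:
  IS_CYTHONIZED = False


class BatchDoFnTest(unittest.TestCase):  # hypothetical test class

  @unittest.skipIf(
      not IS_CYTHONIZED,
      'Large-input flush test flakes on the pure-Python ("slow") stack.')
  def test_batch_pardo(self):
    ...  # the flush-triggering, large-input version of the test
```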
Issue Time Tracking
-------------------
Worklog Id: (was: 766170)
Time Spent: 8h 10m (was: 8h)
> MVP for SDK worker changes to support process_batch
> ---------------------------------------------------
>
> Key: BEAM-14294
> URL: https://issues.apache.org/jira/browse/BEAM-14294
> Project: Beam
> Issue Type: Sub-task
> Components: sdk-py-core
> Reporter: Brian Hulette
> Assignee: Brian Hulette
> Priority: P2
> Time Spent: 8h 10m
> Remaining Estimate: 0h
>
> The initial MVP may only work in some restricted circumstances (e.g.
> @yields_element on process_batch, or batch-to-batch DoFns without a 1:1
> input:output mapping, might not be supported). These cases should fail early.
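For context, a minimal sketch of the kind of batched DoFn this targets. It uses the
decorator name as it appears in later Beam releases (beam.DoFn.yields_elements rather
than @yields_element); the exact MVP surface may differ, so treat the names here as
assumptions:
```
import apache_beam as beam
import numpy as np
from typing import Iterator


class MultiplyByTwo(beam.DoFn):
  # Batch-to-batch with a 1:1 input:output mapping -- the case the MVP
  # is expected to support.
  def process_batch(self, batch: np.ndarray) -> Iterator[np.ndarray]:
    yield batch * 2

  # Element-wise fallback so the DoFn also works on unbatched inputs.
  def process(self, element: np.int64) -> Iterator[np.int64]:
    yield element * 2


class SumBatch(beam.DoFn):
  # Batch-to-element: the decorator declares that process_batch yields
  # individual elements rather than batches (the "@yields_element" case
  # above, under its later name); the MVP may reject this early.
  @beam.DoFn.yields_elements
  def process_batch(self, batch: np.ndarray) -> Iterator[np.int64]:
    yield batch.sum()
```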
--
This message was sent by Atlassian Jira
(v8.20.7#820007)