[ 
https://issues.apache.org/jira/browse/BEAM-14294?focusedWorklogId=766170&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-766170
 ]

ASF GitHub Bot logged work on BEAM-14294:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 04/May/22 17:07
            Start Date: 04/May/22 17:07
    Worklog Time Spent: 10m 
      Work Description: TheNeuralBit commented on PR #17384:
URL: https://github.com/apache/beam/pull/17384#issuecomment-1117595689

   It looks like it's a pre-existing issue related to passing a large input 
(on the order of 1000 elements) through test_batch_pardo in order to trigger a 
flush. I can reproduce the flake in a pipeline that doesn't have a batched DoFn:
   
   ```
     def test_large_input_pardo(self):
       with self.create_pipeline() as p:
         res = (
             p
             | beam.Create(np.array(range(5000), dtype=np.int64)).with_output_types(np.int64)
             | beam.Map(lambda e: e * 2)
             | beam.Map(lambda e: e + 3))
         assert_that(res, equal_to([(i * 2) + 3 for i in range(5000)]))
   ```
   
   It's worth noting that I haven't seen the flake in a Cython environment. 
Perhaps it's a bug in the "slow" stack, which is often only tested with trivial 
inputs?
   
   I don't think I'll be able to diagnose the existing issue today. A solution 
for this PR might be to skip the flush-triggering test in non-cythonized 
environments.
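
   A minimal sketch of what such a skip could look like, assuming that "cythonized" 
can be detected by checking whether a worker module was loaded from a compiled 
extension file. The helper `is_compiled`, the constant `WORKER_MODULE`, and the 
test class name are hypothetical and chosen for illustration only; this is not 
necessarily how Beam's own tests detect Cython.

```python
import importlib
import importlib.machinery
import unittest


def is_compiled(module_name):
  """Hypothetical helper: True if the module loads from a compiled extension."""
  try:
    module = importlib.import_module(module_name)
  except ImportError:
    # Module is not installed at all, so it cannot be a compiled build.
    return False
  path = getattr(module, '__file__', None) or ''
  # EXTENSION_SUFFIXES lists the platform's extension suffixes (.so, .pyd, ...).
  return path.endswith(tuple(importlib.machinery.EXTENSION_SUFFIXES))


# Assumption: this module is taken as representative of the worker stack.
WORKER_MODULE = 'apache_beam.runners.worker.operations'


class LargeInputTest(unittest.TestCase):
  @unittest.skipIf(
      not is_compiled(WORKER_MODULE),
      'Flaky on the non-cythonized ("slow") stack')
  def test_flush_triggering_input(self):
    # The flush-triggering pipeline would go here.
    pass
```

   With this shape the test is still collected and reported as skipped in pure-Python 
environments, so the gap in coverage stays visible until the underlying flake is 
diagnosed.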




Issue Time Tracking
-------------------

    Worklog Id:     (was: 766170)
    Time Spent: 8h 10m  (was: 8h)

> MVP for SDK worker changes to support process_batch
> ---------------------------------------------------
>
>                 Key: BEAM-14294
>                 URL: https://issues.apache.org/jira/browse/BEAM-14294
>             Project: Beam
>          Issue Type: Sub-task
>          Components: sdk-py-core
>            Reporter: Brian Hulette
>            Assignee: Brian Hulette
>            Priority: P2
>          Time Spent: 8h 10m
>  Remaining Estimate: 0h
>
> The initial MVP may only work in some restricted circumstances (e.g. 
> @yields_element on process_batch, or batch-to-batch without a 1:1 
> input:output mapping might not be supported). These cases should fail early. 



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
