TheNeuralBit commented on issue #22091:
URL: https://github.com/apache/beam/issues/22091#issuecomment-1169421317

   Note the breakage is just for `beam.FlatMap` constructed with a built-in; 
`beam.Map` is unaffected. Moreover, only built-in _functions_ (e.g. `len`, 
`sum`, `iter`) trigger the problematic path, not built-in types (e.g. `list`, 
`bytes`, `set`). The former have a `__self__` attribute, which makes them look 
like methods to the logic added in 0de9821:
   ```
   In [1]: sum.__self__
   Out[1]: <module 'builtins' (built-in)>
   ```
   
   So to summarize, only `beam.FlatMap(<built-in function>)` will fail. It's 
actually kind of hard to devise a use case for this, since the built-in 
function must return an iterable. After thinking it over, the only one I came 
up with is `beam.FlatMap(sum)` over a PCollection whose elements are lists of 
numpy arrays. I looked over the list of built-in functions and couldn't find 
another way to use one in a FlatMap.
   
   Given that, I'm not sure this issue has a large enough blast radius to 
warrant a bugfix release. Note that there is a straightforward workaround for 
that use case: wrap `sum` in a helper function or lambda:
   
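   ```py
   import apache_beam as beam

   # A plain Python function (or lambda) has no `__self__`, so it is not
   # mistaken for a method the way the built-in `sum` is.
   def _sum(x):
     return sum(x)

   # `pc` is an existing PCollection whose elements are lists of numpy arrays.
   pc | beam.FlatMap(_sum)
   pc | beam.FlatMap(lambda x: sum(x))  # equivalent lambda form
   ```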