cisaacstern commented on PR #27618:
URL: https://github.com/apache/beam/pull/27618#issuecomment-1781754738
Continuing think aloud here.. my latest experiments further narrow the cause
of the serialization issue to the attempt to pickle `apply_dofn_to_bundle`
which is dynamically-defined in the body of `ParDo` in
(`runners/dask/transform_evaluator.py`):
- Current state, **raises pickling error** 🥒 ❌ :
```diff
return (
input_bag
.map(get_windowed_value, window_fn)
.map_partitions(apply_dofn_to_bundle)
)
```
- Replace both mapped fn's with identities, **no pickling error** 🥒 ✅ :
```diff
return (
input_bag
- .map(get_windowed_value, window_fn)
- .map_partitions(apply_dofn_to_bundle)
+ .map(lambda x: x)
+ .map_partitions(lambda x: x)
)
```
- Only replace `get_windowed_value` with identity, **raises pickling error**
🥒 ❌ :
```diff
return (
input_bag
- .map(get_windowed_value, window_fn)
+ .map(lambda x: x)
.map_partitions(apply_dofn_to_bundle)
)
```
- Only replace `apply_dofn_to_bundle` with identity, **no pickling error** 🥒
✅ :
```diff
return (
input_bag
.map(get_windowed_value, window_fn)
- .map_partitions(apply_dofn_to_bundle)
+ .map_partitions(lambda x: x)
)
```
Now going to try to lift that definition up to the module level (passing the
dynamic aspects as kwargs) and see if that can bring any benefit.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]