chuckwondo commented on issue #28558: URL: https://github.com/apache/beam/issues/28558#issuecomment-1735171162
> > Instead PickleCoder, which uses the pickle module, is what is being invoked to pickle the function, rather than dill.
>
> This is not the case, functions are pickled using `apache_beam.internal.pickler`. Coders are only used to encode pcollection elements. The `--pickle_library` option did not intend to influence the pickler's selection of PickleCoder - the intent of that coder was to use Python's standard pickler module.

Perhaps that should be the case, but that is not what I am experiencing, which is why I'm reporting this.

Apache Beam is very new to me, so it could very well be that I simply don't know what I'm doing and am missing something important. I'll attempt to summarize and clarify what I'm doing and what I'm encountering:

1. My code snippets from above are pulled from this issue I created in `pangeo-forge-recipes`: https://github.com/pangeo-forge/pangeo-forge-recipes/issues/616
2. Since I wrote that issue, I discovered Beam's `save_main_session` and `pickle_library` options as possibilities for addressing the pickling error I'm encountering.
3. Finding that no combination of those option settings eliminates the pickling error, I created the issue here. (Only by tweaking my locally installed `apache_beam` dependency's `PickleCoder` to use the internal `pickler` module was I able to eliminate the pickling error.)

My goal is to drop a problematic variable (`"lst_unc_sys"`) from my dataset, but using the `"preprocess"` option of the `mzz_kwargs` argument to `CombineReferences` is failing, because something in the bowels of Beam seems not to realize that it should be using `apache_beam.internal.pickler` to pickle the preprocess function I'm supplying.

Is there something I'm missing in order to make that happen?
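For readers following the thread, the distinction being discussed can be illustrated with the standard library alone. The sketch below is not taken from Beam or from `pangeo-forge-recipes`: `make_preprocess` and `can_pickle_with_stdlib` are hypothetical names, and a plain dict stands in for the dataset. It also assumes the failing preprocess function is of a kind the stdlib `pickle` module cannot serialize by reference (a closure or lambda, for example), which the thread itself does not confirm.

```python
import json
import pickle


def make_preprocess(var_name):
    """Hypothetical helper: build a function that drops `var_name` from a dict."""

    def preprocess(ds):
        # A plain dict stands in for the real dataset here (an assumption).
        return {k: v for k, v in ds.items() if k != var_name}

    return preprocess


def can_pickle_with_stdlib(obj):
    """Return True if the standard `pickle` module can serialize `obj`."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError):
        # The exception type for local functions differs across Python versions.
        return False


preprocess = make_preprocess("lst_unc_sys")

# The closure itself works fine when called directly.
print(preprocess({"lst": 1, "lst_unc_sys": 2}))

# stdlib pickle stores functions by reference (module name + qualified name),
# so an importable function round-trips but a locally defined closure does not.
print(can_pickle_with_stdlib(json.dumps))
print(can_pickle_with_stdlib(preprocess))
```

By-value picklers such as dill and cloudpickle serialize the function body itself, which is why a code path that hands the function to a pickler of that kind would not hit this limitation, while a code path that hands it to the stdlib `pickle` module would.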
