Kalpana-chavhan opened a new issue, #37209:
URL: https://github.com/apache/beam/issues/37209

   ### Description
   
   The Apache Beam Python SDK requires user-defined functions to be serializable 
for distributed execution. Currently, when users pass non-serializable lambdas 
or closures to `beam.Map` or `beam.FlatMap`, the resulting error is a low-level 
pickling exception that explains neither the cause nor the fix.
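
A minimal sketch of the kind of failure described above. It uses the standard-library `pickle` as a stand-in for Beam's pickler, and `functools.partial` to mimic a closure that captures a lock; the helper name `tag` is hypothetical, and the exact exception type and message Beam produces may differ.

```python
import functools
import pickle
import threading


def tag(lock, word):
    """Pair a word with its length while holding a lock."""
    with lock:
        return word, len(word)


# Binding the lock into the callable mimics a closure capturing it.
tag_with_lock = functools.partial(tag, threading.Lock())

try:
    # Stand-in for the serialization step a runner performs before
    # shipping the function to workers.
    pickle.dumps(tag_with_lock)
    error = None
except Exception as exc:  # the exact type depends on the serializer
    error = exc

# The raw error names the offending object but says nothing about why
# serialization was attempted or how to restructure the code.
print(type(error).__name__, error)
```

The callable works fine locally, which is why the failure tends to surprise new users: it only appears once the function has to be serialized.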
   
   This issue proposes improving the error message during serialization failure 
to:
   
   - Clearly explain why serialization is required
   - Highlight common causes (captured variables, non-serializable objects)
   - Suggest correct patterns (named functions or DoFn classes)
   - Link to official documentation
   
   ### Proposed Solution
   
   Wrap the serialization call in `apache_beam/internal/pickler.py` so that a 
failure raises a RuntimeError with a clearer message while preserving the 
original exception. Add unit tests to ensure the improved message is raised.
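
The wrapping described above might look roughly like the sketch below. It is not the draft implementation mentioned in this issue: `dumps_with_hint`, `DOCS_URL`, and the message wording are hypothetical, and the standard-library `pickle.dumps` stands in for the serializer that `pickler.py` actually calls.

```python
import pickle
import threading

# Assumed documentation target; the real link would be chosen in the PR.
DOCS_URL = "https://beam.apache.org/documentation/programming-guide/"


def dumps_with_hint(obj, serializer=pickle.dumps):
    """Serialize obj; on failure raise a RuntimeError that explains the cause."""
    try:
        return serializer(obj)
    except Exception as exc:
        message = (
            f"Could not serialize {obj!r} for distributed execution.\n"
            "Beam ships user-defined functions to workers, so everything "
            "they reference must be serializable.\n"
            "Common causes: captured variables or objects such as locks, "
            "open files, or client connections.\n"
            "Try a named module-level function, or a DoFn class that "
            "creates such resources in setup().\n"
            f"Documentation: {DOCS_URL}"
        )
        # 'from exc' keeps the original exception available as __cause__.
        raise RuntimeError(message) from exc


# Usage: a lock cannot be pickled, so the wrapper raises the clearer error.
try:
    dumps_with_hint(threading.Lock())
    caught = None
except RuntimeError as err:
    caught = err

print(caught)
print("original:", repr(caught.__cause__))
```

One design point worth checking in review: callers that currently catch the serializer's original exception type would see a RuntimeError instead, so the original is only reachable through `__cause__`.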
   
   ### Why this is valid
   This is a pure DX (Developer Experience) improvement. It does not change the 
execution logic of pipelines but significantly reduces the onboarding friction 
for new developers.
   
   ### I am willing to contribute
   I have identified the location in `pickler.py` and have a draft 
implementation ready with unit tests.
   
   ### Impact
   
   - Improves developer experience
   - Helps new Beam users debug pipelines faster
   - No behavior change or backward compatibility impact

