This is an automated email from the ASF dual-hosted git repository.

cvandermerwe pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/beam.git


The following commit(s) were added to refs/heads/master by this push:
     new 86abab2c2e4 [#37209] Enhance serialization error messages (#37298)
86abab2c2e4 is described below

commit 86abab2c2e4e1ed130330bdfec52de5c0a731991
Author: ZIHAN DAI <[email protected]>
AuthorDate: Fri Jan 23 02:59:36 2026 +1100

    [#37209] Enhance serialization error messages (#37298)
    
    * [#37209] Enhance serialization error messages for better DX
    
    Improved error messages when user code fails to serialize (pickle)
    for distributed execution. The original error was too technical and
    didn't explain the cause or suggest fixes.
    
    Changes:
    - Enhanced RuntimeError message with clear explanation of why
      serialization is required
    - Added common causes (lambdas capturing file handles, DB connections,
      thread locks)
    - Provided three concrete fixes: module-level functions, setup()
      methods, checking closure captures
    - Broadened exception catching to include TypeError and other
      pickling failures (not just RuntimeError)
    - Added exception chaining (from e) to preserve original stack trace
    - Added test case to verify the new error message content
    
    This significantly improves developer experience when debugging
    serialization issues, especially for new Apache Beam users.
    
    Fixes #37209
    
    Co-Authored-By: Claude Sonnet 4.5 <[email protected]>
    
    * Apply yapf formatting
    
    Fix Python formatter precommit check by applying yapf v0.43.0
    formatting rules to modified files.
    
    * [#37209] Make withBackOffSupplier public to enable bounded retry 
configuration
    
    Users need to configure bounded backoff to prevent infinite retry loops. 
Making withBackOffSupplier public allows users to set 
FluentBackoff.DEFAULT.withMaxRetries(n) and control retry behavior.
    
    Added integration test demonstrating bounded retry with maxRetries=3.
    
    Related to #37198, #37176
    
    Co-Authored-By: Claude Sonnet 4.5 <[email protected]>
    
    * Revert unrelated Java changes
    
    ---------
    
    Co-authored-by: Claude Sonnet 4.5 <[email protected]>
---
 sdks/python/apache_beam/transforms/ptransform.py      | 11 +++++++++--
 sdks/python/apache_beam/transforms/ptransform_test.py | 19 +++++++++++++++++++
 2 files changed, 28 insertions(+), 2 deletions(-)

diff --git a/sdks/python/apache_beam/transforms/ptransform.py 
b/sdks/python/apache_beam/transforms/ptransform.py
index 9c5306e143e..94e9a0644d0 100644
--- a/sdks/python/apache_beam/transforms/ptransform.py
+++ b/sdks/python/apache_beam/transforms/ptransform.py
@@ -883,8 +883,15 @@ class PTransformWithSideInputs(PTransform):
     # Ensure fn and side inputs are picklable for remote execution.
     try:
       self.fn = pickler.roundtrip(self.fn)
-    except RuntimeError as e:
-      raise RuntimeError('Unable to pickle fn %s: %s' % (self.fn, e))
+    except (RuntimeError, TypeError, Exception) as e:
+      raise RuntimeError(
+          'Unable to pickle fn %s: %s. '
+          'User code must be serializable (picklable) for distributed '
+          'execution. This usually happens when lambdas or closures capture '
+          'non-serializable objects like file handles, database connections, '
+          'or thread locks. Try: (1) using module-level functions instead of '
+          'lambdas, (2) initializing resources in setup() methods, '
+          '(3) checking what your closure captures.' % (self.fn, e)) from e
 
     self.args = pickler.roundtrip(self.args)
     self.kwargs = pickler.roundtrip(self.kwargs)
diff --git a/sdks/python/apache_beam/transforms/ptransform_test.py 
b/sdks/python/apache_beam/transforms/ptransform_test.py
index 9a9bf6ff0a7..8c2acefccdb 100644
--- a/sdks/python/apache_beam/transforms/ptransform_test.py
+++ b/sdks/python/apache_beam/transforms/ptransform_test.py
@@ -163,6 +163,25 @@ class PTransformTest(unittest.TestCase):
           lambda x, addon: [x + addon], addon=pvalue.AsSingleton(side))
       assert_that(result, equal_to([11, 12, 13]))
 
+  def test_callable_non_serializable_error_message(self):
+    class NonSerializable:
+      def __getstate__(self):
+        raise RuntimeError('nope')
+
+    bad = NonSerializable()
+
+    with self.assertRaises(RuntimeError) as context:
+      _ = beam.Map(lambda x: bad)
+
+    message = str(context.exception)
+    self.assertIn('Unable to pickle fn', message)
+    self.assertIn(
+        'User code must be serializable (picklable) for distributed 
execution.',
+        message)
+    self.assertIn('non-serializable objects like file handles', message)
+    self.assertIn(
+        'Try: (1) using module-level functions instead of lambdas', message)
+
   def test_do_with_do_fn_returning_string_raises_warning(self):
     ex_details = r'.*Returning a str from a ParDo or FlatMap is discouraged.'
 

Reply via email to