This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
     new 45094dc41d3b [SPARK-55044][SPARK-55088][TESTS][FOLLOW-UP] Add test for passing metadata in arrow batches from JVM to Python
45094dc41d3b is described below

commit 45094dc41d3bd433ace05df3339bec8dfa169698
Author: Ruifeng Zheng <[email protected]>
AuthorDate: Thu Jan 22 07:08:38 2026 +0900

    [SPARK-55044][SPARK-55088][TESTS][FOLLOW-UP] Add test for passing metadata in arrow batches from JVM to Python
    
    ### What changes were proposed in this pull request?
    Add test for passing metadata in arrow batches from JVM to Python
    
    ### Why are the changes needed?
    to improve test coverage
    
    ### Does this PR introduce _any_ user-facing change?
    No
    
    ### How was this patch tested?
    New UT
    
    ### Was this patch authored or co-authored using generative AI tooling?
    No
    
    Closes #53883 from zhengruifeng/test_metadata_passing.
    
    Authored-by: Ruifeng Zheng <[email protected]>
    Signed-off-by: Hyukjin Kwon <[email protected]>
---
 python/pyspark/sql/tests/arrow/test_arrow_map.py | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/python/pyspark/sql/tests/arrow/test_arrow_map.py b/python/pyspark/sql/tests/arrow/test_arrow_map.py
index f9928859b1bb..a15bdcca7362 100644
--- a/python/pyspark/sql/tests/arrow/test_arrow_map.py
+++ b/python/pyspark/sql/tests/arrow/test_arrow_map.py
@@ -140,6 +140,21 @@ class MapInArrowTestsMixin(object):
 
         self.assertEqual(self.spark.range(10).mapInArrow(empty_rows, "a double").count(), 0)
 
+    def test_passing_metadata(self):
+        def extract_metadata(iterator):
+            for batch in iterator:
+                assert isinstance(batch, pa.RecordBatch)
+                if batch.num_rows > 0:
+                    m = batch.schema.field("id").metadata[b"SPARK::metadata::json"]
+                    yield pa.RecordBatch.from_arrays(
+                        [pa.array([str(m)] * batch.num_rows)], names=["metadata"]
+                    )
+
+        df = self.spark.range(1).withMetadata("id", {"x": 1})
+
+        row = df.mapInArrow(extract_metadata, "metadata string").first()
+        self.assertEqual(row.metadata, """b'{"x":1}'""")
+
     def test_chain_map_in_arrow(self):
         def func(iterator):
             for batch in iterator:


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
