Re: [PR] GH-48455: [Python] Handle nested field names when sanitizing table at ParquetWriter (flavor='spark') [arrow]

via GitHub Wed, 17 Dec 2025 23:02:16 -0800


HyukjinKwon commented on code in PR #48456:
URL: https://github.com/apache/arrow/pull/48456#discussion_r2629829608



##########
python/pyarrow/parquet/core.py:
##########
@@ -715,32 +715,49 @@ def _sanitized_spark_field_name(name):
     return _SPARK_DISALLOWED_CHARS.sub('_', name)
 
 
-def _sanitize_schema(schema, flavor):
-    if 'spark' in flavor:
-        sanitized_fields = []
+def _sanitize_field_recursive(field):
+    """
+    Recursively sanitize field names in struct types for Spark compatibility.
 
-        schema_changed = False
+    Returns
+    -------
+    tuple
+        (sanitized_field, changed) where changed is True if any sanitization occurred
+    """
+    sanitized_name = _sanitized_spark_field_name(field.name)
+    sanitized_type = field.type
+    type_changed = False
 
-        for field in schema:
-            name = field.name
-            sanitized_name = _sanitized_spark_field_name(name)
+    if pa.types.is_struct(field.type):

Review Comment:
   Actually, I think I should also look into other nested cases, e.g., arrays of structs.
Let me mark this as a draft for now.
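
   To illustrate the arrays-of-structs case, here is a minimal, library-independent sketch of the recursion. It is not the PR's implementation: `Field`, `StructType`, and `ListType` are hypothetical stand-ins for the corresponding Arrow types, `sanitize_type` and `sanitize_field` are illustrative names, and the regex is an assumed pattern, since the diff shows only the name `_SPARK_DISALLOWED_CHARS`.

```python
import re
from dataclasses import dataclass, replace
from typing import Tuple, Union

# Assumed pattern: the diff above only shows the name, not the regex itself.
_SPARK_DISALLOWED_CHARS = re.compile(r'[ ,;{}()\n\t=]')


def _sanitized_spark_field_name(name):
    return _SPARK_DISALLOWED_CHARS.sub('_', name)


# Hypothetical stand-ins for Arrow types, used only for illustration.
@dataclass(frozen=True)
class StructType:
    fields: Tuple["Field", ...]


@dataclass(frozen=True)
class ListType:
    value_field: "Field"


@dataclass(frozen=True)
class Field:
    name: str
    type: Union[str, StructType, ListType]  # a str stands for a primitive type


def sanitize_type(typ):
    """Return (sanitized_type, changed) for a possibly nested type."""
    if isinstance(typ, StructType):
        results = [sanitize_field(f) for f in typ.fields]
        if any(changed for _, changed in results):
            return StructType(tuple(f for f, _ in results)), True
        return typ, False
    if isinstance(typ, ListType):
        # Descend into the element type only; the element field's own name
        # is left untouched in this sketch.
        value_type, changed = sanitize_type(typ.value_field.type)
        if changed:
            return ListType(replace(typ.value_field, type=value_type)), True
        return typ, False
    return typ, False  # primitive type: nothing to sanitize


def sanitize_field(field):
    """Return (sanitized_field, changed), recursing through nested types."""
    name = _sanitized_spark_field_name(field.name)
    typ, type_changed = sanitize_type(field.type)
    changed = type_changed or name != field.name
    return (Field(name, typ), True) if changed else (field, False)
```

   Splitting the recursion into a type-level and a field-level function means a list of structs (or a struct containing a list of structs) goes through the same path as a top-level struct. Whether the list element field's own name should also be sanitized is left open here.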



