EnricoMi commented on code in PR #48456:
URL: https://github.com/apache/arrow/pull/48456#discussion_r2664370654
##########
python/pyarrow/parquet/core.py:
##########
@@ -715,32 +715,77 @@ def _sanitized_spark_field_name(name):
return _SPARK_DISALLOWED_CHARS.sub('_', name)
-def _sanitize_schema(schema, flavor):
- if 'spark' in flavor:
- sanitized_fields = []
+def _sanitize_field_recursive(field):
Review Comment:
The function name should indicate that this is Spark-specific.
```suggestion
def _sanitize_spark_field_recursive(field):
```
##########
python/pyarrow/parquet/core.py:
##########
@@ -715,32 +715,77 @@ def _sanitized_spark_field_name(name):
return _SPARK_DISALLOWED_CHARS.sub('_', name)
-def _sanitize_schema(schema, flavor):
- if 'spark' in flavor:
- sanitized_fields = []
+def _sanitize_field_recursive(field):
Review Comment:
Alternatively, `_sanitized_spark_field_name` could be passed in as a parameter,
which would make this function generic (most of its lines are Spark-agnostic).
The caller would then simply call
`_sanitize_field_recursive(field, _sanitized_spark_field_name)`.
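To make the idea concrete, here is a minimal sketch of the injected variant. It is not the PR's code: `Field` is a hypothetical stand-in for a pyarrow field (a tuple of `Field`s models a struct type) so the example runs without pyarrow, and the regular expression is an assumed pattern used only for illustration.

```python
import re
from dataclasses import dataclass
from typing import Tuple, Union


# Hypothetical stand-in for a pyarrow field; a tuple of Fields models a struct.
@dataclass(frozen=True)
class Field:
    name: str
    type: Union[str, Tuple["Field", ...]]


# Assumed pattern, for illustration only.
_DISALLOWED = re.compile(r"[ ,;{}()\n\t=]")


def sanitize_spark_name(name):
    """Replace characters assumed to be disallowed with underscores."""
    return _DISALLOWED.sub("_", name)


def sanitize_field_recursive(field, sanitize_name):
    """Return (field, changed); the renaming rule is supplied by the caller."""
    new_name = sanitize_name(field.name)
    new_type = field.type
    if isinstance(field.type, tuple):  # struct: recurse into child fields
        results = [sanitize_field_recursive(f, sanitize_name)
                   for f in field.type]
        if any(changed for _, changed in results):
            new_type = tuple(f for f, _ in results)
    changed = new_name != field.name or new_type is not field.type
    # Return the original object when nothing changed.
    return (Field(new_name, new_type) if changed else field), changed


nested = Field("a b", (Field("x;y", "int64"), Field("ok", "string")))
sanitized, changed = sanitize_field_recursive(nested, sanitize_spark_name)
```

With this shape the recursive function contains no Spark-specific logic; only the injected callable does, so a different flavor could supply its own renaming rule.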
##########
python/pyarrow/parquet/core.py:
##########
@@ -715,32 +715,77 @@ def _sanitized_spark_field_name(name):
return _SPARK_DISALLOWED_CHARS.sub('_', name)
-def _sanitize_schema(schema, flavor):
- if 'spark' in flavor:
- sanitized_fields = []
+def _sanitize_field_recursive(field):
+ """
+ Recursively sanitize field names in struct types for Spark compatibility.
- schema_changed = False
+ Returns
+ -------
+ tuple
+ (sanitized_field, changed) where changed is True if any sanitization occurred
+ """
+ sanitized_name = _sanitized_spark_field_name(field.name)
+ sanitized_type = field.type
+ type_changed = False
+
+ if pa.types.is_struct(field.type):
+ sanitized_fields = [_sanitize_field_recursive(f) for f in field.type]
+ if any(changed for _, changed in sanitized_fields):
+ sanitized_type = pa.struct([f for f, _ in sanitized_fields])
+ type_changed = True
Review Comment:
This list of tuples could be unpacked via `zip(*...)`, which simplifies
access to the fields and the changed flags:
```suggestion
sanitized_fields, changed = zip(*[_sanitize_field_recursive(f) for f in field.type])
if any(changed):
    sanitized_type = pa.struct(sanitized_fields)
    type_changed = True
```
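One caveat with this pattern, shown here with plain tuples rather than pyarrow objects: `zip(*pairs)` yields tuples, and unpacking it into two names raises `ValueError` when `pairs` is empty, which could matter if a struct type has no child fields. The helper name `unzip_pairs` is hypothetical and only illustrates a guarded version.

```python
def unzip_pairs(pairs):
    """Split [(item, flag), ...] into (items, flags); safe for an empty list."""
    if not pairs:
        return (), ()
    items, flags = zip(*pairs)
    return items, flags


pairs = [("a_b", True), ("ok", False)]
items, flags = unzip_pairs(pairs)

# Unpacking a bare zip(*[]) into two names fails, because zip() yields nothing.
try:
    left, right = zip(*[])
    empty_unpack_failed = False
except ValueError:
    empty_unpack_failed = True
```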
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.