JingsongLi commented on code in PR #8162:
URL: https://github.com/apache/paimon/pull/8162#discussion_r3372266976


##########
paimon-python/pypaimon/write/table_update_by_row_id.py:
##########
@@ -317,15 +321,49 @@ def _merge_update_with_original(
                     for i in range(original_data.num_rows)
                 ]
             else:
-                # replace_with_mask fills mask=True positions with update values in order
-                merged_columns[col_name] = pc.replace_with_mask(
-                    original_col, mask, update_col.cast(original_col.type)
-                )
+                try:
+                    merged_columns[col_name] = pc.replace_with_mask(
+                        original_col, mask, update_col)
+                except pa.lib.ArrowNotImplementedError:
+                    n = original_data.num_rows
+                    combined = pa.concat_arrays(
+                        [original_col, update_col])
+                    offset = len(original_col)
+                    indices = np.arange(n, dtype=np.int64)
+                    for orig_pos, upd_idx in update_positions.items():
+                        indices[orig_pos] = offset + upd_idx
+                    merged_columns[col_name] = combined.take(
+                        pa.array(indices))
 
         merged_table = pa.table(merged_columns) if merged_columns else None
 
         return merged_table, blob_columns
 
+    @staticmethod
+    def _coerce_column(col: pa.Array, target_type: pa.DataType) -> pa.Array:
+        try:
+            return col.cast(target_type)
+        except (pa.lib.ArrowNotImplementedError,
+                pa.lib.ArrowInvalid,
+                pa.lib.ArrowTypeError):
+            pass
+        pylist = col.to_pylist()
+        if pa.types.is_map(target_type):
+            converted = []
+            for row in pylist:
+                if row is None:
+                    converted.append(None)
+                elif isinstance(row, dict):

Review Comment:
   `_coerce_column` drops every None value when it converts inferred dict
input to a map, because it filters entries with `if v is not None`. This loses
valid map entries such as `{'a': None}` when callers pass natural dict-shaped
PyArrow input without an explicit schema. I reproduced it end-to-end: updating
a `map<string,string>` column with `{'a': None}` reads back as `[]`, not
`[('a', None)]`. The added test covers null values only via an explicit
`pa.map_` list-of-pairs schema, so it misses this regression.
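
   A minimal sketch of a conversion that keeps None-valued entries, assuming
the rows come from `col.to_pylist()` and arrive either as None, as a dict, or
as a sequence of (key, value) pairs. The helper name `dict_rows_to_map_pairs`
is hypothetical and not part of the PR; it only illustrates the behavior the
comment above asks for.

```python
from typing import Any, List, Optional, Sequence, Tuple

Pair = Tuple[Any, Any]


def dict_rows_to_map_pairs(
        rows: Sequence[Any]) -> List[Optional[List[Pair]]]:
    """Turn dict-shaped rows into lists of (key, value) pairs.

    Hypothetical helper: entries whose value is None are kept, so
    {'a': None} becomes [('a', None)] instead of an empty list.
    """
    converted: List[Optional[List[Pair]]] = []
    for row in rows:
        if row is None:
            # A null row stays null.
            converted.append(None)
        elif isinstance(row, dict):
            # Keep every entry, including those with a None value.
            converted.append(list(row.items()))
        else:
            # Assumption: the row is already a sequence of (key, value) pairs.
            converted.append([(k, v) for k, v in row])
    return converted


rows = [{'a': None}, None, {'b': 'x'}, [('c', None)]]
pairs = dict_rows_to_map_pairs(rows)
```

   The resulting list of pairs could then be passed to
`pa.array(pairs, type=target_type)` to build the map column. One caveat worth
checking: when PyArrow infers a struct type from dicts with differing keys,
`to_pylist()` appears to pad the missing keys with None, so padded and
explicitly null values may not be distinguishable at this point.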


