XiaoHongbo-Hope commented on code in PR #7745:
URL: https://github.com/apache/paimon/pull/7745#discussion_r3168491328


##########
paimon-python/pypaimon/read/split_read.py:
##########
@@ -508,7 +485,33 @@ def section_reader_supplier(self, section: 
List[SortedRun]) -> RecordReader:
                 supplier = partial(self.kv_reader_supplier, file, 
self.deletion_file_readers.get(file.file_name, None))
                 data_readers.append(supplier)
             readers.append(ConcatRecordReader(data_readers))
-        return SortMergeReaderWithMinHeap(readers, self.table.table_schema)
+        merge_function = self._build_merge_function()
+        return SortMergeReaderWithMinHeap(
+            readers, self.table.table_schema, merge_function=merge_function)
+
+    def _build_merge_function(self):
+        """Pick the right MergeFunction implementation for the table's
+        ``merge-engine`` option. ``DEDUPLICATE`` is the default and the
+        only engine supported on the Python read path historically;
+        ``PARTIAL_UPDATE`` is now wired up to its dedicated
+        implementation. The remaining engines (``AGGREGATE`` /
+        ``FIRST_ROW``) used to silently degrade to dedupe — that quietly
+        produced wrong data — so we now raise an explicit
+        ``NotImplementedError`` instead, until they're ported.
+        """
+        engine = self.table.options.merge_engine()
+        if engine == MergeEngine.DEDUPLICATE:
+            return DeduplicateMergeFunction()
+        if engine == MergeEngine.PARTIAL_UPDATE:
+            return PartialUpdateMergeFunction(
+                key_arity=len(self.trimmed_primary_key),
+                value_arity=self.value_arity,
+            )
+        raise NotImplementedError(
+            "merge-engine '{}' is not implemented in pypaimon yet "
+            "(supported: deduplicate, partial-update). Use the Java "
+            "client or open an issue to track support.".format(engine.value)
+        )
 

Review Comment:
   The PR says sequence groups and per-field aggregate overrides are out of 
scope, but
     `_build_merge_function()` still enables `PartialUpdateMergeFunction` when 
those options are configured. Since the Python merge function ignores them, it 
can silently return wrong results.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to