JingsongLi commented on code in PR #8021:
URL: https://github.com/apache/paimon/pull/8021#discussion_r3348135024


##########
paimon-python/pypaimon/read/split_read.py:
##########
@@ -823,12 +830,27 @@ def create_reader(self) -> RecordReader:
         else:
             reader = merge_reader
 
-        if (not CoreOptions.blob_as_descriptor(self.table.options)
-                and CoreOptions.blob_descriptor_fields(self.table.options)):
-            reader = BlobDescriptorConvertReader(reader, self.table)
-
         return reader
 
+    def _create_prescan_reader(self, field_names):
+        """Create a prescan reader by constructing a new DataEvolutionSplitRead
+        instance that only projects the specified field names."""
+        from pypaimon.read.reader.iface.record_batch_reader import EmptyRecordBatchReader
+
+        prescan_fields = [f for f in self.read_fields if f.name in field_names]
+        if not prescan_fields:
+            return EmptyRecordBatchReader()
+
+        prescan_read = DataEvolutionSplitRead(

Review Comment:
   Thanks for checking. The missing upstream table was only a minimal way to 
make the semantic issue visible. The correctness problem is that the prescan 
observes rows that the final query would filter out or skip. A filtered-out row 
may contain a stale BlobViewStruct, a permission/network failure, or simply 
many expensive references, and that should not make `WHERE id = 1` fail or 
trigger extra upstream reads.
   
   Also, a limit-pushdown change in another PR would not address the predicate 
case here, because this prescan rebuilds the read type with only blob-view 
fields, so predicates on non-view columns are trimmed before the prescan reader 
is built. I think this PR needs to preserve the effective filter/limit in the 
prescan path, matching the Java path, and add regression tests for both cases.
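
   To make the concern concrete, here is a toy sketch of the failure mode. It 
   is not pypaimon code: `Row`, `resolve`, and `prescan_refs` are hypothetical 
   names, and the in-memory list only stands in for a split. It shows that a 
   prescan which ignores the query predicate touches a reference the final 
   query would never return, while one that carries the predicate and limit 
   does not.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional


@dataclass
class Row:
    id: int
    blob_ref: str  # stands in for a blob-view reference (hypothetical)


class UpstreamError(Exception):
    """Raised when a reference cannot be resolved (stale, no permission, ...)."""


def resolve(ref: str) -> str:
    # Hypothetical upstream lookup: a "stale" reference fails.
    if ref == "stale":
        raise UpstreamError(ref)
    return "bytes:" + ref


def prescan_refs(
    rows: Iterable[Row],
    predicate: Optional[Callable[[Row], bool]] = None,
    limit: Optional[int] = None,
) -> List[str]:
    """Collect references only from rows the final query would return."""
    refs: List[str] = []
    for row in rows:
        if limit is not None and len(refs) >= limit:
            break  # stop once the limit is satisfied
        if predicate is not None and not predicate(row):
            continue  # skip rows the query filters out
        refs.append(row.blob_ref)
    return refs


rows = [Row(1, "ok"), Row(2, "stale")]

# Prescan without the predicate: sees the stale reference of row id=2.
unfiltered = prescan_refs(rows)

# Prescan with the effective predicate (WHERE id = 1): only row id=1 is seen.
filtered = prescan_refs(rows, predicate=lambda r: r.id == 1)
resolved = [resolve(ref) for ref in filtered]
```

   Resolving `unfiltered` raises `UpstreamError` even though the query only 
   asks for `id = 1`; resolving `filtered` succeeds. The regression tests I am 
   asking for could follow the same shape, one for the predicate and one for 
   the limit.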


