Re: [PR] [core] introduce Placeholder for Blob File Format [paimon]

via GitHub Tue, 26 May 2026 01:31:59 -0700


JingsongLi commented on code in PR #7889:
URL: https://github.com/apache/paimon/pull/7889#discussion_r3302333834



##########
paimon-core/src/main/java/org/apache/paimon/operation/DataEvolutionSplitRead.java:
##########
@@ -327,6 +331,83 @@ private DataEvolutionFileReader createUnionReader(
         return new DataEvolutionFileReader(rowOffsets, fieldOffsets, fileRecordReaders);
     }
 
+    private RecordReader<InternalRow> createFieldBunchReader(
+            BinaryRow partition,
+            FieldBunch bunch,
+            DataFilePathFactory dataFilePathFactory,
+            FormatReaderMapping formatReaderMapping,
+            List<Range> rowRanges,
+            RowType readRowType)
+            throws IOException {
+        if (bunch instanceof DataBunch) {
+            // for data bunch, directly read the single file
+            return createFileReader(
+                    partition,
+                    bunch.files().get(0),
+                    dataFilePathFactory,
+                    formatReaderMapping,
+                    rowRanges,
+                    readRowType);
+        } else if (bunch instanceof VectorFileBunch) {
+            // for vector bunch, sequential read all data files and concat them
+            List<ReaderSupplier<InternalRow>> readerSuppliers = new ArrayList<>();
+            for (DataFileMeta file : bunch.files()) {
+                RoaringBitmap32 selection = file.toFileSelection(rowRanges);
+                FormatReaderContext formatReaderContext =
+                        new FormatReaderContext(
+                                fileIO,
+                                dataFilePathFactory.toPath(file),
+                                file.fileSize(),
+                                selection);
+                readerSuppliers.add(
+                        () ->
+                                new DataFileRecordReader(
+                                        readRowType,
+                                        formatReaderMapping.getReaderFactory(),
+                                        formatReaderContext,
+                                        coreOptions.scanIgnoreCorruptFile(),
+                                        coreOptions.scanIgnoreLostFile(),
+                                        formatReaderMapping.getIndexMapping(),
+                                        formatReaderMapping.getCastMapping(),
+                                        PartitionUtils.create(
+                                                formatReaderMapping.getPartitionPair(), partition),
+                                        true,
+                                        file.firstRowId(),
+                                        file.maxSequenceNumber(),
+                                        formatReaderMapping.getSystemFields()));
+            }
+            return ConcatRecordReader.create(readerSuppliers);
+        } else if (bunch instanceof BlobFileBunch) {
+            // for blob bunch, fallback on placeholders
+            int blobIndex = findBlobFieldIndex(readRowType);
+            checkArgument(blobIndex >= 0, "Blob bunch read type should contain a blob field.");
+            return new BlobFallbackRecordReader(

Review Comment:
   Is there a fast path here for `bunch.files().size() == 1`? When there is 
nothing to merge, this should take a simple code path.
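
A minimal sketch of the kind of fast path meant here, in plain Java. `Reader`, `MergingReader`, `fileReader`, and `createReader` are hypothetical stand-ins and not Paimon classes; the `BlobFallbackRecordReader` constructor is cut off in this hunk, so nothing below assumes its signature.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.function.Supplier;

public class SingleFileFastPathSketch {

    /** Hypothetical stand-in for a record reader (not Paimon's RecordReader). */
    public interface Reader {
        List<String> readAll();
    }

    /** Hypothetical stand-in for the reader that combines several files. */
    public static final class MergingReader implements Reader {
        private final List<Supplier<Reader>> suppliers;

        public MergingReader(List<Supplier<Reader>> suppliers) {
            this.suppliers = suppliers;
        }

        @Override
        public List<String> readAll() {
            // Open each file lazily and append its rows in order.
            List<String> rows = new ArrayList<>();
            for (Supplier<Reader> supplier : suppliers) {
                rows.addAll(supplier.get().readAll());
            }
            return rows;
        }
    }

    /** Hypothetical helper: a supplier for a reader over one in-memory "file". */
    public static Supplier<Reader> fileReader(String... rows) {
        List<String> copy = Arrays.asList(rows.clone());
        Reader reader = () -> copy;
        return () -> reader;
    }

    /** Picks the simple path when there is exactly one file, so no merging wrapper is built. */
    public static Reader createReader(List<Supplier<Reader>> suppliers) {
        if (suppliers.isEmpty()) {
            throw new IllegalArgumentException("A bunch should contain at least one file.");
        }
        if (suppliers.size() == 1) {
            // Fast path: nothing to merge, return the single file reader directly.
            return suppliers.get(0).get();
        }
        return new MergingReader(suppliers);
    }

    public static void main(String[] args) {
        Reader single = createReader(Collections.singletonList(fileReader("a", "b")));
        Reader merged = createReader(Arrays.asList(fileReader("a"), fileReader("b", "c")));
        System.out.println(single.readAll());
        System.out.println(merged.readAll());
    }
}
```

In the PR this would correspond to checking `bunch.files().size() == 1` before constructing the fallback reader and delegating to the existing single-file read path in that case, assuming that path is valid for blob files.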
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
