github-actions[bot] commented on code in PR #63192:
URL: https://github.com/apache/doris/pull/63192#discussion_r3254351792


##########
fe/fe-core/src/main/java/org/apache/doris/datasource/iceberg/source/IcebergScanNode.java:
##########
@@ -1058,13 +1064,66 @@ public TFileFormatType getFileFormatType() throws 
UserException {
         if (icebergFormat.equalsIgnoreCase("parquet")) {
             type = TFileFormatType.FORMAT_PARQUET;
         } else if (icebergFormat.equalsIgnoreCase("orc")) {
+            validateVariantReadSupported(icebergFormat);
             type = TFileFormatType.FORMAT_ORC;
         } else {
             throw new DdlException(String.format("Unsupported format name: %s 
for iceberg table.", icebergFormat));
         }
         return type;
     }
 
+    private void validateVariantReadSupported(String icebergFormat) throws 
DdlException {
+        String variantColumnName = findVariantReadColumnName();
+        if (variantColumnName != null) {
+            throw new DdlException("Reading Iceberg VARIANT columns is only 
supported for Parquet files, "
+                    + "but table file format is " + icebergFormat + ": " + 
variantColumnName);
+        }
+    }
+
+    @VisibleForTesting
+    void validateVariantDataFileFormat(FileFormat dataFileFormat, String path) 
{
+        if (dataFileFormat == FileFormat.PARQUET) {
+            return;
+        }
+        String variantColumnName = findVariantReadColumnName();
+        if (variantColumnName != null) {
+            throw new NotSupportedException("Reading Iceberg VARIANT columns 
is only supported for Parquet files, "
+                    + "but data file format is " + dataFileFormat.name() + ": 
" + variantColumnName
+                    + " (" + path + ")");
+        }
+    }
+
+    private String findVariantReadColumnName() {
+        for (SlotDescriptor slot : desc.getSlots()) {
+            Column column = slot.getColumn();
+            if (containsVariantType(column.getType())) {
+                return column.getName();

Review Comment:
   This validation uses `slot.getColumn().getType()`, which is the original 
Iceberg table column type, not the materialized slot type after nested-column 
pruning. That over-rejects legal non-Parquet reads when only non-VARIANT 
subfields are selected. For example, an ORC Iceberg column `s STRUCT<a:int, 
v:variant>` queried as `SELECT s.a FROM tbl` can be pruned/materialized as 
`STRUCT<a:int>` and does not read any VARIANT data, but this still sees the 
original `s.v` and throws the Parquet-only VARIANT error. The same false 
positive applies to `validateVariantDataFileFormat()` because it calls this 
helper too. Please validate the pruned/materialized slot type or selected 
access paths instead, and add coverage for reading a non-VARIANT subfield from 
a complex ORC Iceberg column that has an unselected VARIANT sibling.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to