github-actions[bot] commented on code in PR #63192:
URL: https://github.com/apache/doris/pull/63192#discussion_r3254351792
##########
fe/fe-core/src/main/java/org/apache/doris/datasource/iceberg/source/IcebergScanNode.java:
##########
@@ -1058,13 +1064,66 @@ public TFileFormatType getFileFormatType() throws
UserException {
if (icebergFormat.equalsIgnoreCase("parquet")) {
type = TFileFormatType.FORMAT_PARQUET;
} else if (icebergFormat.equalsIgnoreCase("orc")) {
+ validateVariantReadSupported(icebergFormat);
type = TFileFormatType.FORMAT_ORC;
} else {
throw new DdlException(String.format("Unsupported format name: %s
for iceberg table.", icebergFormat));
}
return type;
}
+ private void validateVariantReadSupported(String icebergFormat) throws
DdlException {
+ String variantColumnName = findVariantReadColumnName();
+ if (variantColumnName != null) {
+ throw new DdlException("Reading Iceberg VARIANT columns is only
supported for Parquet files, "
+ + "but table file format is " + icebergFormat + ": " +
variantColumnName);
+ }
+ }
+
+ @VisibleForTesting
+ void validateVariantDataFileFormat(FileFormat dataFileFormat, String path)
{
+ if (dataFileFormat == FileFormat.PARQUET) {
+ return;
+ }
+ String variantColumnName = findVariantReadColumnName();
+ if (variantColumnName != null) {
+ throw new NotSupportedException("Reading Iceberg VARIANT columns
is only supported for Parquet files, "
+ + "but data file format is " + dataFileFormat.name() + ":
" + variantColumnName
+ + " (" + path + ")");
+ }
+ }
+
+ private String findVariantReadColumnName() {
+ for (SlotDescriptor slot : desc.getSlots()) {
+ Column column = slot.getColumn();
+ if (containsVariantType(column.getType())) {
+ return column.getName();
Review Comment:
This validation uses `slot.getColumn().getType()`, which is the original
Iceberg table column type, not the materialized slot type after nested-column
pruning. That over-rejects legal non-Parquet reads when only non-VARIANT
subfields are selected. For example, an ORC Iceberg column `s STRUCT<a:int,
v:variant>` queried as `SELECT s.a FROM tbl` can be pruned/materialized as
`STRUCT<a:int>` and does not read any VARIANT data, but this still sees the
original `s.v` and throws the Parquet-only VARIANT error. The same false
positive applies to `validateVariantDataFileFormat()` because it calls this
helper too. Please validate the pruned/materialized slot type or selected
access paths instead, and add coverage for reading a non-VARIANT subfield from
a complex ORC Iceberg column that has an unselected VARIANT sibling.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]