rdblue commented on code in PR #12311: URL: https://github.com/apache/iceberg/pull/12311#discussion_r1970487043
########## api/src/main/java/org/apache/iceberg/variants/VariantUtil.java: ########## @@ -18,21 +18,112 @@ */ package org.apache.iceberg.variants; +import java.math.BigDecimal; import java.nio.ByteBuffer; import java.nio.ByteOrder; import java.nio.charset.StandardCharsets; +import java.util.Map; import java.util.function.Function; import org.apache.iceberg.relocated.com.google.common.base.Preconditions; +import org.apache.iceberg.relocated.com.google.common.collect.ImmutableMap; +import org.apache.iceberg.types.Type; +import org.apache.iceberg.types.Types; -class VariantUtil { +public class VariantUtil { private static final int BASIC_TYPE_MASK = 0b11; private static final int BASIC_TYPE_PRIMITIVE = 0; private static final int BASIC_TYPE_SHORT_STRING = 1; private static final int BASIC_TYPE_OBJECT = 2; private static final int BASIC_TYPE_ARRAY = 3; + // TODO: Implement PhysicalType.TIME + // TODO: Implement PhysicalType.TIMESTAMPNTZ_NANO and PhysicalType.TIMESTAMPTZ_NANO + // TODO: Implement PhysicalType.UUID + private static final Map<Type, PhysicalType> NO_CONVERSION_NEEDED = + ImmutableMap.<Type, PhysicalType>builder() + .put(Types.IntegerType.get(), PhysicalType.INT32) + .put(Types.LongType.get(), PhysicalType.INT64) + .put(Types.FloatType.get(), PhysicalType.FLOAT) + .put(Types.DoubleType.get(), PhysicalType.DOUBLE) + .put(Types.DateType.get(), PhysicalType.DATE) + .put(Types.TimestampType.withoutZone(), PhysicalType.TIMESTAMPNTZ) + .put(Types.TimestampType.withZone(), PhysicalType.TIMESTAMPTZ) + .put(Types.StringType.get(), PhysicalType.STRING) + .put(Types.BinaryType.get(), PhysicalType.BINARY) + .put(Types.UnknownType.get(), PhysicalType.NULL) + .build(); + private VariantUtil() {} + @SuppressWarnings("unchecked") + public static <T> T castTo(VariantValue value, Type type) { Review Comment: I don't think that we should expose this on `VariantValue` and in fact I almost moved it into `InclusiveMetricsEvaluator` because the behavior here is a bit misleading. The problem is that we don't want to make callers think that this conversion is equivalent to any SQL operation because SQL engines will have a lot more complex logic. For instance, a SQL engine might have an expression like `CAST(extract(var_col, "$.field") AS INT) < 10`. Iceberg does something similar, but the cast expression in SQL is much more general and may handle cases like variant string to int. We want to avoid a false equivalence with SQL behavior, which is up to the engine. This is why the `Extract.eval` method is not implemented. I think the right thing is actually to make this less public and move it into a util class in expressions so that those classes can use it, but we don't expose it more broadly. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org