BTW, it was pointed out to me that the unknown type is also useful for SQL
systems where somebody writes a query like:

"SELECT null as someColName"

In this case someColName would have an unknown type as well.
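
To make this concrete, here is a minimal Python sketch of how an engine might assign and later widen such a column. It is not Iceberg's API: the names UNKNOWN, infer_type, promote and on_insert are hypothetical, and the type names are just Python type names used for illustration. The on_insert helper models the two evolution policies mentioned in the quoted message below (auto-evolve on the first non-null value, or block until the user alters the column).

```python
from typing import Dict, Iterable, Optional

UNKNOWN = "unknown"  # hypothetical marker for a column whose values are all null


def infer_type(values: Iterable[Optional[object]]) -> str:
    """Return a type name for a column, or UNKNOWN if every value is None."""
    seen = {type(v).__name__ for v in values if v is not None}
    if not seen:
        return UNKNOWN  # e.g. SELECT null AS someColName
    if len(seen) == 1:
        return seen.pop()
    raise ValueError(f"mixed types: {sorted(seen)}")


def promote(current: str, target: str) -> str:
    """Widen a column type. Unknown can become any type without touching
    data, because every stored value is null."""
    if current == UNKNOWN or current == target:
        return target
    raise ValueError(f"cannot change {current} to {target} as metadata only")


def on_insert(schema: Dict[str, str], col: str, value: object,
              auto_evolve: bool) -> None:
    """Apply one of the two policies when a value arrives for a column."""
    if value is None or schema[col] != UNKNOWN:
        return
    if not auto_evolve:
        # Policy 2: block until the user explicitly alters the column.
        raise TypeError(f"column {col} is unknown; alter it to a type first")
    # Policy 1: evolve the schema from the first non-null value.
    schema[col] = promote(schema[col], type(value).__name__)


schema = {"someColName": infer_type([None])}
on_insert(schema, "someColName", 42, auto_evolve=True)
print(schema)
```

A nullable variant column would also accept the value 42 here, but narrowing it to a concrete type afterwards would not be a metadata-only change.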

On Fri, Nov 28, 2025 at 12:59 AM Joana Hrotkó <[email protected]>
wrote:

> Thanks, Micah!
>
> On Thu, Nov 27, 2025 at 4:48 AM Micah Kornfield <[email protected]>
> wrote:
>
>> Hi Joana,
>> Here are my thoughts, which are by no means the definitive answer here.
>>
>>
>>> 1. Given that variant can store any data type (both structured and
>>> primitive), I'm unclear when unknown would be preferred as similar
>>> behavior could be achieved by adding nullable variant columns? It seems
>>> like variant could handle most schema evolution scenarios. Are there
>>> specific situations where unknown is the better choice?
>>
>>
>> I think the point of the type is to not impose on a system the need to
>> use a nullable variant column if it can't infer the type. The variant
>> type has more overhead and can't easily be narrowed to other types solely
>> through a metadata operation (whereas a NullType can easily be widened to
>> any type as a metadata operation).
>>
>> The null type is generally meant for moving from schema-less systems to
>> ones with a schema, e.g. a CSV file that has an empty value for every
>> field in a particular column.  I think Parquet's description of its
>> analogous type [1] is a good illustration:
>>
>> "Sometimes when discovering the schema of existing data, values are
>> always null and the physical type can't be determined. This annotation
>> signals the case where the physical type was guessed from all null values."
>>
>> That being said, I don't think it is necessarily a bad idea if a system
>> wants to use nullable variants for this use-case.
>>
>>> 2. Also, is unknown intended for explicit use in DDL? Meaning, should
>>> users write DDL like:
>>
>>
>> In general, I don't think there is much of a use-case for allowing users
>> to set this through DDL, other than perhaps cloning it from an existing
>> table. As you pointed out, someone wishing to keep their options open is
>> likely better off using variant, or a type that can be widened later.
>>
>> There are probably multiple ways of handling evolution, but two workable
>> alternatives are (I don't think these belong in the Iceberg spec):
>> 1.  Automatically evolve the schema based on the first inserted non-null
>> value for the column.
>> 2.  Block insertions that try to insert a non-null value in the column
>> until the user explicitly alters the column to a specific type.
>>
>> Cheers,
>> Micah
>>
>> [1]
>> https://github.com/apache/parquet-format/blob/master/src/main/thrift/parquet.thrift#L330
>>
>> On Tue, Nov 18, 2025 at 4:45 AM Joana Hrotkó
>> <[email protected]> wrote:
>>
>>> Hi Iceberg Community,
>>>
>>> I'm working with Iceberg v3 and trying to understand the practical use
>>> cases for the unknown type, especially in relation to the variant type.
>>>
>>> The variant type handles both semi-structured data (JSON, nested
>>> objects/arrays) and primitive types (strings, integers, booleans, dates,
>>> timestamps, etc.) with efficient binary encoding. It supports schema
>>> evolution and provides good query performance.
>>>
>>> The unknown type is described as being for "evolving schemas without
>>> forcing immediate resolution" and must always default to null.
>>>
>>> 1. Given that variant can store any data type (both structured and
>>> primitive), I'm unclear when unknown would be preferred as similar
>>> behavior could be achieved by adding nullable variant columns? It seems
>>> like variant could handle most schema evolution scenarios. Are there
>>> specific situations where unknown is the better choice?
>>>
>>> 2. Also, is unknown intended for explicit use in DDL? Meaning, should
>>> users write DDL like:
>>>
>>> CREATE TABLE foo (col1 unknown)
>>> ALTER TABLE foo ADD COLUMN col2 unknown
>>>
>>> Or is unknown an internal type that engines use automatically during
>>> schema evolution?
>>>
>>> Cheers,
>>>
>>> Joana Hrotkó
>>>
>>
