Another scenario I would be concerned about is a table that contains only
UnknownType top-level fields. What will happen in such a scenario? Will the
Parquet format tolerate zero-column files? I think it's likely that even if
Parquet-java supports this, it might be an untested and unsupported corner
case in many of the alternative Parquet readers.
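
To make the concern concrete, here is a minimal sketch of what a writer would
be left with if unknown-typed fields are simply omitted. The classes and the
helpers `physical_columns` and `empty_groups` are hypothetical stand-ins
written for this illustration; they are not the Iceberg or Parquet API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Unknown:
    """Stand-in for UnknownType (hypothetical model, not the real API)."""


@dataclass(frozen=True)
class Primitive:
    name: str


@dataclass(frozen=True)
class Field:
    name: str
    type: object


@dataclass(frozen=True)
class Struct:
    fields: tuple


def physical_columns(struct, prefix=""):
    """Leaf column paths left to write once unknown-typed fields are omitted."""
    cols = []
    for f in struct.fields:
        path = prefix + f.name
        if isinstance(f.type, Unknown):
            continue  # not stored in the data file
        if isinstance(f.type, Struct):
            cols.extend(physical_columns(f.type, path + "."))
        else:
            cols.append(path)
    return cols


def empty_groups(struct, path=""):
    """Paths of structs that end up with no physical children at all."""
    empties = []
    if not physical_columns(struct):
        empties.append(path or "<root>")
    for f in struct.fields:
        if isinstance(f.type, Struct):
            child = f"{path}.{f.name}" if path else f.name
            empties.extend(empty_groups(f.type, child))
    return empties


# A table with only unknown-typed top-level fields: nothing is left to write.
only_unknown = Struct((Field("a", Unknown()), Field("b", Unknown())))
print(physical_columns(only_unknown))  # []
print(empty_groups(only_unknown))      # ['<root>']

# A struct column containing only unknown-typed fields: an empty group.
mixed = Struct((
    Field("id", Primitive("long")),
    Field("s", Struct((Field("x", Unknown()),))),
))
print(physical_columns(mixed))  # ['id']
print(empty_groups(mixed))      # ['s']
```

The `<root>` case is the zero-column file asked about above, and the `s` case
is the empty struct that came up earlier in the thread.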

I think it might be worth revisiting the decision not to store UnknownType
in the data files. Omitting it creates many corner cases, which in turn
impose many limitations on the acceptable schema. These limitations are
driven by the quirks of the Parquet format rather than by the definition of
UnknownType or by anything understandable to the end user.

If instead the UnknownType column were written into the Parquet file (for
example, leveraging the UNKNOWN logical type in Parquet), it could be a
first-class citizen. I think even allowing it as a map key can be fine
within some limits. The only acceptable value for such a map would be
either null (= the whole map is null) or an empty map, but the user would
be free to alter the key type to any desired type down the line.
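
As a rough sketch of the map-key rule (plain Python with hypothetical helper
names, not an actual Iceberg validation routine): a map keyed by the unknown
type can hold no entries, so every value that passes the check stays valid
after the key type is changed.

```python
def valid_for_unknown_key(value):
    """Map keys must be non-null and an unknown-typed key can only be null,
    so the map itself must be null or empty."""
    return value is None or (isinstance(value, dict) and len(value) == 0)


def valid_after_key_change(value, key_type):
    """Check a value against the map once the key has a concrete type."""
    if value is None:
        return True
    return all(isinstance(k, key_type) for k in value)


candidates = [None, {}, {"a": 1}]
accepted = [v for v in candidates if valid_for_unknown_key(v)]
print(accepted)  # [None, {}]

# Everything accepted before the change is still valid for a string key.
print(all(valid_after_key_change(v, str) for v in accepted))  # True
```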


On 2025/07/28 08:11:16 Bart Samwel wrote:
> On Sat, Jul 26, 2025 at 6:09 PM Kevin Liu <ke...@apache.org> wrote:
>
> > > My initial idea was to disallow the use of UnknownType as the element
> > > in ListType and not allow the UnknownType as either a Key or Value of a
> > > MapType. Any thoughts or concerns?
> >
> > That makes sense. I would also include `StructType` here too. `StructType`
> > is another "complex type" (extends NestedType
> > <https://github.com/apache/iceberg/blob/360f87326d4ccf67512a0240e529035801d9db2b/api/src/main/java/org/apache/iceberg/types/Types.java#L1001>)
> > just like `ListType` and `MapType`.
> > This will make `unknown` the first primitive type to not be allowed as
> > part of another complex type.
> >
>
> Do you mean to forbid `UnknownType` inside `StructType`? I'm afraid that
> would undermine the orthogonality of the system. A common use of StructType
> is to store entire rows. If StructType cannot contain elements that are
> UnknownType but top-level rows can, then you can no longer store an
> arbitrary top-level row inside a StructType.
>
> Unfortunately UnknownType in struct does have some issues. In particular,
> if it's not stored, then IIUIC you can have issues with structs containing
> only UnknownType fields -- they will look empty to Parquet, and my
> understanding is that that isn't allowed. For orthogonality it would have
> been better to actually store the unknown type, even if it's just as a
> series of "this is NULL" bits. Omitting these fields in storage seems like
> a convenient hack that leads to all sorts of surprising corner cases...
>
>
> On Sat, Jul 26, 2025 at 5:43 AM Fokko Driesprong <fo...@apache.org> wrote:
> >
> >> Hi everyone,
> >>
> >> Recently I took a stab at implementing reading UnknownType
> >> <https://github.com/apache/iceberg/pull/13445> in the Java
> >> implementation. I thought it would make sense to add this to the reference
> >> implementation first. However, I ran into a limitation with the current
> >> definition in the spec:
> >>
> >> Must be optional with null defaults; not stored in data files
> >>
> >>
> >> One obvious limitation is that it cannot be the key of a MapType, as it
> >> has to be not-null. It can't be stored either as the value of a MapType
> >> since there is no easy way to store just the key without doing awkward
> >> things, such as writing just the keys as a list.
> >>
> >> My initial idea was to disallow the use of UnknownType as the element in
> >> ListType and not allow the UnknownType as either a Key or Value of a
> >> MapType. Any thoughts or concerns?
> >>
> >> Kind regards from Belgium,
> >> Fokko
> >>
> >
>
