Another scenario I would be concerned about is a table that contains only UnknownType top-level fields. What happens then? Does the Parquet format tolerate zero-column files? Even if Parquet-java supports this, I suspect it is an untested and unsupported corner case in many of the alternative Parquet readers.
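To make the scenario concrete, here is a rough sketch in plain Python. The `Primitive` and `Struct` classes and both helper functions are made up for illustration and are not the Iceberg or Parquet API. It assumes the current rule that unknown fields are simply omitted from the data file, ignores lists and maps, and reports every struct (the root included) that would end up with no stored columns.

```python
from dataclasses import dataclass


@dataclass
class Primitive:
    name: str  # hypothetical type tag, e.g. "int", "string", "unknown"


@dataclass
class Struct:
    fields: dict  # field name -> Primitive or Struct


def is_unknown(t):
    return isinstance(t, Primitive) and t.name == "unknown"


def stored_columns(t, path=()):
    """Yield leaf column paths that would be written when unknown fields are omitted."""
    if isinstance(t, Struct):
        for name, child in t.fields.items():
            yield from stored_columns(child, path + (name,))
    elif not is_unknown(t):
        yield path


def empty_structs(t, path=()):
    """Return paths of structs (root is the empty tuple) left with no stored columns."""
    found = []
    if isinstance(t, Struct):
        if not list(stored_columns(t, path)):
            found.append(path)
        for name, child in t.fields.items():
            found.extend(empty_structs(child, path + (name,)))
    return found


# A table whose top-level fields are all unknown, directly or via a nested struct.
only_unknown = Struct({"a": Primitive("unknown"),
                       "b": Struct({"c": Primitive("unknown")})})

# A table with one real column and one struct holding only an unknown field.
mixed = Struct({"id": Primitive("int"),
                "s": Struct({"x": Primitive("unknown")})})

print(empty_structs(only_unknown))  # root and "b" both end up empty
print(empty_structs(mixed))         # only "s" ends up empty
```

In the first case the file itself would have zero columns; in the second, the file is fine but the nested struct `s` would look empty to Parquet.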
I think it might be worth revisiting the decision not to store UnknownType in the data files. Omitting it creates many corner cases, and each one adds another restriction on which schemas are acceptable. These restrictions are driven by the quirks of the Parquet format rather than by the definition of UnknownType, or by anything understandable to the end user. If the UnknownType column were instead written into the Parquet file (for example, by leveraging the UNKNOWN logical type in Parquet), it could be a first-class citizen. I think even allowing it as a map key can be fine within some limits. The only acceptable value for such a map would be either null (= the whole map is null) or an empty map, but the user would be free to alter the key type to any desired type down the line.

On 2025/07/28 08:11:16 Bart Samwel wrote:
> On Sat, Jul 26, 2025 at 6:09 PM Kevin Liu <ke...@apache.org> wrote:
>
> > > My initial idea was to disallow the use of UnknownType as the element
> > in ListType and not allow the UnknownType as either a Key or Value of a
> > MapType. Any thoughts or concerns?
> >
> > That makes sense. I would also include `StructType` here too. `StructType`
> > is another "complex type" (extends NestedType
> > < https://github.com/apache/iceberg/blob/360f87326d4ccf67512a0240e529035801d9db2b/api/src/main/java/org/apache/iceberg/types/Types.java#L1001 >)
> > just like `ListType` and `MapType`.
> > This will make `unknown` the first primitive type to not be allowed as
> > part of another complex type.
> >
>
> Do you mean to forbid `UnknownType` inside `StructType`? I'm afraid that
> would undermine the orthogonality of the system. A common use of StructType
> is to store entire rows. If StructType cannot contain elements that are
> UnknownType but top-level rows can, then you can no longer store an
> arbitrary top-level row inside a StructType.
>
> Unfortunately UnknownType in struct does have some issues. In particular,
> if it's not stored, then IIUIC you can have issues with structs containing
> only UnknownType fields -- they will look empty to Parquet, and my
> understanding is that that isn't allowed. For orthogonality it would have
> been better to actually store the unknown type, even if it's just as a
> series of "this is NULL" bits. Omitting these fields in storage seems like
> a convenient hack that leads to all sorts of surprising corner cases...
>
>
> On Sat, Jul 26, 2025 at 5:43 AM Fokko Driesprong <fo...@apache.org> wrote:
> >
> >> Hi everyone,
> >>
> >> Recently I took a stab at implementing reading UknownType
> >> <https://github.com/apache/iceberg/pull/13445> in the Java
> >> implementation. I thought it would make sense to add this to the reference
> >> implementation first. However, I ran into a limitation with the current
> >> definition in the spec:
> >>
> >> Must be optional with null defaults; not stored in data files
> >>
> >>
> >> One obvious limitation is that it cannot be the key of a MapType, as it
> >> has to be not-null. It can't be stored either as the value of a MapType
> >> since there is no easy way to store just the key without doing awkward
> >> things, such as writing just the keys as a list.
> >>
> >> My initial idea was to disallow the use of UnknownType as the element in
> >> ListType and not allow the UnknownType as either a Key or Value of a
> >> MapType. Any thoughts or concerns?
> >>
> >> Kind regards from Belgium,
> >> Fokko
> >>
> >
>