I have no problem with explicitly stating that writing identity source
columns to data files is optional. We should, of course, mandate surfacing
the column on read :)

On Thu, Jul 25, 2024 at 1:30 PM Micah Kornfield <emkornfi...@gmail.com>
wrote:

> The Table specification doesn't say whether identity partition columns
> must be written to data files. Empirically, implementations appear to
> always write the column data, at least for Parquet. For columnar formats
> this is relatively cheap, since the column holds a single value within
> each data file and is trivially RLE encodable. For Avro, though, it comes
> at a somewhat higher cost. Since the data is fully reproducible from
> Iceberg metadata, I think stating in the specification that writing it is
> optional would be useful.
>
> For reading identity partition columns from Iceberg tables, I think the
> specification needs to require that their values are read from metadata.
> This is because Iceberg supports migrating data from Hive (and other table
> formats) without rewriting it, and those formats don't typically write
> partition values directly into the data files.
>
> Thoughts?
>
> When we get consensus I'll open up a PR to clarify these points.
>
> Thanks,
> Micah
>
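A minimal sketch of the read-side rule discussed above, assuming a hypothetical, simplified `DataFile` record (not Iceberg's actual API) in which the identity partition values are keyed by source column name. Values are always taken from the partition metadata, so a migrated file that lacks the column reads the same as a file that contains it.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any


@dataclass
class DataFile:
    # Hypothetical stand-in for a manifest entry: the partition tuple lives
    # in metadata, while `rows` are the physical rows stored in the file.
    partition: dict[str, Any]  # identity source column -> partition value
    rows: list[dict[str, Any]]


def read_rows(data_file: DataFile, identity_columns: list[str]) -> list[dict[str, Any]]:
    """Surface identity partition columns on read, sourcing values from metadata.

    The file's own copy of the column (if any) is ignored, so writing the
    column into the file is optional for correctness.
    """
    result = []
    for row in data_file.rows:
        merged = dict(row)
        for col in identity_columns:
            merged[col] = data_file.partition[col]
        result.append(merged)
    return result


# A migrated Hive-style file without the column, and a file that wrote it.
migrated = DataFile(partition={"region": "us"}, rows=[{"id": 1}, {"id": 2}])
native = DataFile(partition={"region": "us"}, rows=[{"id": 3, "region": "us"}])
print(read_rows(migrated, ["region"]))
print(read_rows(native, ["region"]))
```

Preferring metadata unconditionally, rather than only when the column is missing, keeps one code path for both kinds of files.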
