Also, wanted to add another observation that is on the flip side of the
initial argument. Some data formats like Avro do not support TINYINT
either. So even if Trino uses TINYINT in the struct schema, when the table
is written back to Avro, TINYINT will not be used. This supports the side
of the argument that Trino should use INT in
https://github.com/trinodb/trino/pull/3483 to minimize incompatibility
issues. Thoughts?

On Thu, Dec 2, 2021 at 5:28 PM Walaa Eldin Moustafa <wa.moust...@gmail.com>
wrote:

> Hi all,
>
> I wanted to bring up the topic of supporting TINYINT and SMALLINT data
> types in Iceberg to the dev mailing list. We have started some discussion
> on the ASF Iceberg channel [1], and agreed to bring more visibility to the
> topic here.
>
> This came up while we were implementing the equivalent of Trino's union to
> struct conversion [2] in our internal fork of Iceberg, but we are facing
> the issue that the output struct schema contains a field that is a TINYINT.
> I am starting this thread to get people's opinion on adding TINYINT and
> SMALLINT types to Iceberg. The reasons for the change are:
>
> * TINYINT (and SMALLINT) are supported by the Hive table format, and
> consequently by engines like Trino and Spark. So adding them to Iceberg can
> help with Iceberg adoption, and makes it easy for people to convert Hive
> tables to Iceberg tables smoothly.
>
> * We have observed some regressions when trying to convert a table with
> TINYINT to INT, especially with views that are defined on the TINYINT table
> (see the Slack thread for details).
>
> * We want to converge the community to one form of transforming union
> types to structs. Currently, the Trino Hive connector uses TINYINT in the
> output schema, so we want to lock down a type there. Hopefully, that
> becomes a standard across engines, and IRs. We are doing the same
> conversion in Coral [3], but hoping to wrap our discussion here before
> proceeding.
>
> * The more systems not agreeing on this conversion (to TINYINT or INT),
> the more challenging it will be to fix this problem. Sometimes the schema
> leaks transitively across tables and it becomes more difficult to retract,
> so now is the best time to decide on this.
>
> That said, first, I am hoping to get an alignment on supporting TINYINT
> and SMALLINT in Iceberg since this is a more fundamental issue, and has its
> own justications. If we agree we can support TINYINT, the union type
> standardization discussion would be simpler since we will not expect Trino
> to change.
>
> Thanks,
> Walaa.
>
> [1] https://the-asf.slack.com/archives/CF01LKV9S/p1637642089008400
> [2] https://github.com/trinodb/trino/pull/3483
> [3] https://github.com/linkedin/coral/pull/192
>

Reply via email to