Iceberg has 2 main types of partitions: * identity, named for the identity transform, which are just like Hive partitions where data from some column is used to partition without modification (identity transform) * hidden, which derive partition values from some column but don't expose those values as a data column
An example of a "hidden" partition is a day transform, where the partition value (day) is derived from a timestamp or date value. The data contains only a timestamp column and partition values are automatically derived from those timestamps to store the data. Then filters on the timestamp are automatically converted to partition filters when planning a scan. Hidden partitions have lots of benefits: * Users aren't responsible for calculating partition values when writing, which is error prone (i.e., date in what time zone?) * Readers aren't responsible for filtering partition values in addition to data values, which is a common cause of bad queries and full table scans * Queries are written for data and can be run on any partition scheme, so partitioning can be evolved over time as data volume changes To answer your question, "will filtering on a timestamp column be converted into correct partition predicates when data is partitioned by year/month?" Yes, if the year or month partition is a hidden partition that is maintained by Iceberg because Iceberg knows the relationship between a timestamp/date and the partition. No if the year/month partition was supplied by a user as a data column because Iceberg doesn't know anything about the relationship between two columns. rb On Tue, Nov 27, 2018 at 11:54 AM Erik Wright <erik.wri...@shopify.com> wrote: > *Presto Integration* >>> >>> If I understand correctly, there is no current support for reading >>> Iceberg data directly from Presto. I imagine, however, that as long as your >>> partition specifications are restricted to the `identity` transform it >>> should be "easy" to consume an Iceberg table snapshot and register it as an >>> external hive table complete with partition metadata. Am I missing any >>> caveats there? >>> >> >> Presto support has been posted in a PR: >> https://github.com/prestodb/presto/pull/11767 >> >> Presto performance should be on par with Parquet tables in Presto because >> we use the same vectorized read code. The implementation supports reading >> from partitioned tables with either hidden or identity partitions. >> > > Can you clarify what is meant by "hidden" partitions? Will Presto > integration be able to convert filter predicates to partition predicates > for the partition transforms listed in the Iceberg table spec? For example, > will filtering on a timestamp column be converted into correct partition > predicates when data is partitioned by year/month? > -- Ryan Blue Software Engineer Netflix