Hi Rian,

This is not a valid use case per se, but I noticed it when testing different catalog implementations with Trino. Trino has hundreds of unit tests covering various non-trivial use cases. What I noticed is that some catalogs retain "void" transforms, while the REST catalog explicitly collapses them to a no-op, as explained in the previous email. Not that this is a big issue, just a fact that the behavior differs between catalogs. It does raise a question: should the REST catalog optimize user inputs, or just store them as-is, provided they are formally valid? Other catalogs do not appear to normalize them.
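To make the collapse concrete, here is a small self-contained Java sketch of the logic as I understand it. It assumes that `PartitionSpec.isPartitioned()` returns false when every partition field uses the void transform, which is why the `else` branch in `RESTSessionCatalog.createChanges` (quoted in the thread) replaces the spec with `PartitionSpec.unpartitioned()`. `VoidTransformSketch`, `Field`, `Spec` and `specSentToCatalog` are hypothetical stand-ins for illustration, not the real Iceberg classes.

```java
import java.util.List;

// Hypothetical, simplified model of an Iceberg partition spec.
// This is NOT the org.apache.iceberg.PartitionSpec API, only an illustration.
public class VoidTransformSketch {

    // A partition field: source column plus a transform name such as "void" or "day".
    record Field(String sourceColumn, String transform) {
        boolean isVoid() {
            return "void".equals(transform);
        }
    }

    record Spec(List<Field> fields) {
        static Spec unpartitioned() {
            return new Spec(List.of());
        }

        // Assumed rule: a spec counts as partitioned only if at least one
        // field has a non-void transform.
        boolean isPartitioned() {
            return fields.stream().anyMatch(f -> !f.isVoid());
        }
    }

    // Mirrors the branch in RESTSessionCatalog.createChanges: anything that is
    // not "partitioned" (including null) is replaced by the empty unpartitioned spec.
    static Spec specSentToCatalog(Spec requested) {
        if (requested != null && requested.isPartitioned()) {
            return requested;
        }
        return Spec.unpartitioned();
    }

    public static void main(String[] args) {
        Spec voidOnly = new Spec(List.of(new Field("region", "void")));
        Spec mixed = new Spec(List.of(new Field("region", "void"), new Field("ts", "day")));

        System.out.println(specSentToCatalog(voidOnly).fields()); // []
        System.out.println(specSentToCatalog(mixed).fields().size()); // 2
    }
}
```

Under that assumption, a spec that mixes void and non-void fields is passed through unchanged (void fields included), so only an all-void spec loses its partitioning information.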
When working with SQL, we often deal with machine-generated queries and tests built on top of them. Having different semantics across catalogs may cause confusion and wasted effort when migrating from one catalog to another, so the more aligned the catalogs are, the better. The behavior in question was added in https://github.com/apache/iceberg/pull/5235.

Regards,
Vladimir.

On Fri, Nov 1, 2024 at 1:25 AM rdb...@gmail.com <rdb...@gmail.com> wrote:

> Vladimir, what is the context in which you want to maintain a partition
> spec with only void transforms? Is this in a v2 table? In a v2 table, the
> catalog should be free to remove void transforms. They are required for v1.
>
> On Wed, Oct 30, 2024 at 5:00 AM Vladimir Ozerov <voze...@querifylabs.com>
> wrote:
>
>> Hi,
>>
>> When a user creates a table with a void() transform on a single column,
>> the REST catalog appears to ignore it and ends up with a table with no
>> partitioning information. The relevant code is in
>> RESTSessionCatalog.createChanges:
>>
>>     PartitionSpec spec = meta.spec();
>>     if (spec != null && spec.isPartitioned()) {
>>       changes.add(new MetadataUpdate.AddPartitionSpec(spec));
>>     } else {
>>       changes.add(new
>>           MetadataUpdate.AddPartitionSpec(PartitionSpec.unpartitioned()));
>>     }
>>
>> My question is whether this is by design or not. From the user's
>> perspective, this appears to be OK, because the table is not partitioned
>> anyway. However, some engines, such as Trino, currently retain void()
>> partitioning info for non-REST catalogs. What would be the proper
>> expectation for an Iceberg user in this case: should they observe void()
>> in the table schema or not?
>>
>> Regards,
>> --
>> *Vladimir Ozerov*
>> Founder
>> querifylabs.com