JerAguilon opened a new issue, #14914:
URL: https://github.com/apache/iceberg/issues/14914

   ### Query engine
   
   Trino and ClickHouse, though this question is more about the spec itself
   
   ### Question
   
   I've been reading [this 
document](https://apache.github.io/iceberg/docs/1.4.3/partitioning/#icebergs-hidden-partitioning)
 and I am unsure what to conclude for this particular use case, and I am seeing 
divergent behavior across some table engines.
   
   Suppose I have a table with a schema of:
   
   ```
   sale_id: BIGINT
   price: BIGINT
   date: DATE
   ```
   
   And it is partitioned on `date`.
   
   If we guarantee that each Parquet data file contains data for one and only 
one `date`, are we allowed to just have `sale_id, price` in the physical files, 
while marking `date` values in the manifest layer?
   
   Given that there's a "separation between physical and logical," this seems 
to be analogous to a transformation like `bucket`, except with some constant 
value. But the spec doesn't explicitly discuss this.
   
   I have a table with this format. Trino's implementation seems to 
handle this gracefully, substituting partition values as you'd expect. However, 
some other engines (https://github.com/ClickHouse/ClickHouse/issues/92732) 
instead return `NULL` values.
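   
To make the divergence concrete, here is a minimal sketch of the two read behaviors in plain Python. It does not use any Iceberg library, and it is not Trino's or ClickHouse's actual code: `DataFile`, `read_rows`, and the `fill_from_partition` flag are hypothetical names made up for illustration, and the sample rows are invented.

```python
from dataclasses import dataclass
from datetime import date
from typing import Any

# Logical table schema and the columns partitioned with the identity transform.
TABLE_COLUMNS = ["sale_id", "price", "date"]
IDENTITY_PARTITION_COLUMNS = {"date"}


@dataclass
class DataFile:
    """Hypothetical stand-in for a manifest entry plus the rows of its Parquet file."""
    partition: dict[str, Any]    # partition tuple recorded in the manifest
    rows: list[dict[str, Any]]   # physical rows; the `date` column is absent


def read_rows(data_file: DataFile, fill_from_partition: bool) -> list[dict[str, Any]]:
    """Project every table column, optionally filling missing ones from partition metadata."""
    result = []
    for row in data_file.rows:
        projected = {}
        for col in TABLE_COLUMNS:
            if col in row:
                # Column is physically present in the file.
                projected[col] = row[col]
            elif (
                fill_from_partition
                and col in IDENTITY_PARTITION_COLUMNS
                and col in data_file.partition
            ):
                # Column is missing from the file: use the manifest's partition value.
                projected[col] = data_file.partition[col]
            else:
                # Column is missing and not substituted: surface it as NULL.
                projected[col] = None
        result.append(projected)
    return result


data_file = DataFile(
    partition={"date": date(2024, 1, 15)},
    rows=[{"sale_id": 1, "price": 100}, {"sale_id": 2, "price": 250}],
)

substituted = read_rows(data_file, fill_from_partition=True)   # behavior I see in Trino
nulls = read_rows(data_file, fill_from_partition=False)        # behavior I see elsewhere
```

With `fill_from_partition=True` every row carries the file's single `date` value taken from the manifest; with `False` the `date` column comes back as `None` for every row.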
   
   It'd be very helpful to explicitly clarify if `identity` partitions can also 
be hidden, in the same way that `bucket` transformations are.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

