Fokko commented on issue #6853:
URL: https://github.com/apache/iceberg/issues/6853#issuecomment-1433701127

   > But wouldn't it be a costly operation to cast the source column? We would 
also have to explicitly tell the end user about this, which dissolves the 
advantage that the end user doesn't need to know anything about partitioning.
   
   Iceberg should be able to handle this for you, but that is currently not the case.
   
   <img width="1289" alt="image" 
src="https://user-images.githubusercontent.com/1134248/219471680-947c1055-40c7-41b0-aa6d-04c5e2afb0c7.png">
   
   We have two rows, in two distinct partitions:
   
   <img width="1289" alt="image" 
src="https://user-images.githubusercontent.com/1134248/219471881-48554aa8-70e8-4acd-937c-d89d408e49a8.png">
   
   When I enable tracing, we can see that the query reads both data files:
   
   ```
   2023-02-16T19:50:59.520 [206 Partial Content] s3.GetObject 
minio:9000/warehouse/wh/default/iceberg_table/metadata/00002-b153fc69-e69b-489d-aff3-49ffede57be9.metadata.json
 172.18.0.3        1.45ms       ↑ 169 B ↓ 3.4 KiB
   2023-02-16T19:50:59.552 [200 OK] s3.HeadObject 
minio:9000/warehouse/wh/default/iceberg_table/metadata/snap-8884861716966779118-1-a38366f2-1636-497f-bbaf-d7a81b27d026.avro
 172.18.0.5        486µs       ↑ 133 B ↓ 0 B
   2023-02-16T19:50:59.556 [206 Partial Content] s3.GetObject 
minio:9000/warehouse/wh/default/iceberg_table/metadata/snap-8884861716966779118-1-a38366f2-1636-497f-bbaf-d7a81b27d026.avro
 172.18.0.5        900µs       ↑ 148 B ↓ 4.2 KiB
   2023-02-16T19:50:59.565 [206 Partial Content] s3.GetObject 
minio:9000/warehouse/wh/default/iceberg_table/metadata/a38366f2-1636-497f-bbaf-d7a81b27d026-m0.avro
 172.18.0.5        895µs       ↑ 148 B ↓ 7.0 KiB
   2023-02-16T19:50:59.570 [206 Partial Content] s3.GetObject 
minio:9000/warehouse/wh/default/iceberg_table/metadata/a8e7390e-d67e-42bb-accf-dfe2f8df9885-m0.avro
 172.18.0.5        1.241ms      ↑ 148 B ↓ 7.0 KiB
   2023-02-16T19:50:59.643 [200 OK] s3.HeadObject 
minio:9000/warehouse/wh/default/iceberg_table/data/trans_ts_hour%3D2019-06-13-13/00000-0-f46a696b-d858-49cd-bb18-c4d39b3578ab-00001.parquet
 172.18.0.5        413µs       ↑ 133 B ↓ 0 B
   2023-02-16T19:50:59.646 [206 Partial Content] s3.GetObject 
minio:9000/warehouse/wh/default/iceberg_table/data/trans_ts_hour%3D2019-06-13-13/00000-0-f46a696b-d858-49cd-bb18-c4d39b3578ab-00001.parquet
 172.18.0.5        662µs       ↑ 148 B ↓ 8 B
   2023-02-16T19:50:59.649 [206 Partial Content] s3.GetObject 
minio:9000/warehouse/wh/default/iceberg_table/data/trans_ts_hour%3D2019-06-13-13/00000-0-f46a696b-d858-49cd-bb18-c4d39b3578ab-00001.parquet
 172.18.0.5        1.309ms      ↑ 148 B ↓ 1.1 KiB
   2023-02-16T19:50:59.654 [206 Partial Content] s3.GetObject 
minio:9000/warehouse/wh/default/iceberg_table/data/trans_ts_hour%3D2019-06-13-13/00000-0-f46a696b-d858-49cd-bb18-c4d39b3578ab-00001.parquet
 172.18.0.5        1.074ms      ↑ 148 B ↓ 1.5 KiB
   2023-02-16T19:50:59.684 [200 OK] s3.HeadObject 
minio:9000/warehouse/wh/default/iceberg_table/data/trans_ts_hour%3D2019-06-14-13/00000-1-07373866-4c83-4e5a-8577-e9aa24acbfc4-00001.parquet
 172.18.0.5        552µs       ↑ 133 B ↓ 0 B
   2023-02-16T19:50:59.687 [206 Partial Content] s3.GetObject 
minio:9000/warehouse/wh/default/iceberg_table/data/trans_ts_hour%3D2019-06-14-13/00000-1-07373866-4c83-4e5a-8577-e9aa24acbfc4-00001.parquet
 172.18.0.5        648µs       ↑ 148 B ↓ 8 B
   2023-02-16T19:50:59.690 [206 Partial Content] s3.GetObject 
minio:9000/warehouse/wh/default/iceberg_table/data/trans_ts_hour%3D2019-06-14-13/00000-1-07373866-4c83-4e5a-8577-e9aa24acbfc4-00001.parquet
 172.18.0.5        750µs       ↑ 148 B ↓ 1.1 KiB
   2023-02-16T19:50:59.694 [206 Partial Content] s3.GetObject 
minio:9000/warehouse/wh/default/iceberg_table/data/trans_ts_hour%3D2019-06-14-13/00000-1-07373866-4c83-4e5a-8577-e9aa24acbfc4-00001.parquet
 172.18.0.5        756µs       ↑ 148 B ↓ 1.5 KiB
   ```
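
To make the trace easier to read, here is a minimal sketch that URL-decodes the data-file keys and lists the distinct partition directories they belong to. The two keys are copied from the trace above; the helper name `partitions_touched` is made up for illustration and is not part of Iceberg or MinIO.

```python
from urllib.parse import unquote

# Data-file object keys copied from the trace (one per distinct Parquet file).
DATA_FILE_KEYS = [
    "minio:9000/warehouse/wh/default/iceberg_table/data/"
    "trans_ts_hour%3D2019-06-13-13/"
    "00000-0-f46a696b-d858-49cd-bb18-c4d39b3578ab-00001.parquet",
    "minio:9000/warehouse/wh/default/iceberg_table/data/"
    "trans_ts_hour%3D2019-06-14-13/"
    "00000-1-07373866-4c83-4e5a-8577-e9aa24acbfc4-00001.parquet",
]


def partitions_touched(keys):
    """Return the distinct, URL-decoded partition directories, in first-seen order."""
    found = []
    for key in keys:
        # "%3D" decodes to "="; the partition directory follows the "data" segment.
        segments = unquote(key).split("/")
        partition = segments[segments.index("data") + 1]
        if partition not in found:
            found.append(partition)
    return found


print(partitions_touched(DATA_FILE_KEYS))
# ['trans_ts_hour=2019-06-13-13', 'trans_ts_hour=2019-06-14-13']
```

Both hourly partitions show up, which matches the observation that no partition was pruned.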
   
   > My question is: Iceberg creates those partition folders with the exact 
value of the date when we specify a date partition. In that case, how hard is 
it for the framework to handle this gracefully, rather than expecting the end 
user to cast the source column?
   
   Again, this is not up to Iceberg; it is up to the engine (Spark, Trino, 
etc.) how the comparison is done. See below, where the behavior is the same 
against a plain Spark table. If you want to change this behavior, you should 
discuss it in the Trino/Spark community.
   
   <img width="1289" alt="image" 
src="https://user-images.githubusercontent.com/1134248/219484147-20a06e45-3787-41f8-882a-8f1278192ecf.png">
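
To illustrate why the engine's handling of the comparison matters, here is a toy model in plain Python. It is not Iceberg's or Spark's actual planning code. It assumes the engine hands the table either a typed timestamp range on the source column or nothing usable (represented by `None`, for example when the column is wrapped in a cast). The names `hour_transform`, `FILES` and `plan_scan` are hypothetical; the only fact borrowed from Iceberg is that the `hour` transform counts hours since 1970-01-01 00:00 UTC.

```python
from datetime import datetime, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def hour_transform(ts):
    """Hours since 1970-01-01 00:00 UTC (how Iceberg's hour transform is defined)."""
    return int((ts - EPOCH).total_seconds() // 3600)


# Toy file listing: partition directory -> partition value (hours since epoch).
# The directory names come from the trace; everything else is illustrative.
FILES = {
    "trans_ts_hour=2019-06-13-13": hour_transform(
        datetime(2019, 6, 13, 13, tzinfo=timezone.utc)),
    "trans_ts_hour=2019-06-14-13": hour_transform(
        datetime(2019, 6, 14, 13, tzinfo=timezone.utc)),
}


def plan_scan(files, ts_range=None):
    """Return the partition directories that must be read.

    ts_range is a (lower, upper) bound on the timestamp column, or None when
    no usable predicate on the source column was pushed down (assumption).
    """
    if ts_range is None:
        # Nothing to project onto the partition values: read everything.
        return sorted(files)
    lower, upper = ts_range
    # Project the timestamp bounds onto hour values (inclusive, so conservative).
    low_hour, high_hour = hour_transform(lower), hour_transform(upper)
    return sorted(name for name, hour in files.items()
                  if low_hour <= hour <= high_hour)


day = (datetime(2019, 6, 13, tzinfo=timezone.utc),
       datetime(2019, 6, 14, tzinfo=timezone.utc))
print(plan_scan(FILES, day))   # ['trans_ts_hour=2019-06-13-13']
print(plan_scan(FILES, None))  # both partitions
```

In this model the table can only prune when it receives a predicate on the source column in a form it can project onto the partition values, which is why the choice of comparison made by the engine decides how many files are read.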
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

