Fokko commented on issue #6853: URL: https://github.com/apache/iceberg/issues/6853#issuecomment-1433701127
> But wouldn't it be a costly operation to cast the source column, and we would also have to explicitly let the end user know about this, which dissolves the advantage that the end user doesn't need to know anything about partitioning.

Iceberg should be able to handle this for you, but that is currently not the case.

<img width="1289" alt="image" src="https://user-images.githubusercontent.com/1134248/219471680-947c1055-40c7-41b0-aa6d-04c5e2afb0c7.png">

We have two rows, in two distinct partitions:

<img width="1289" alt="image" src="https://user-images.githubusercontent.com/1134248/219471881-48554aa8-70e8-4acd-937c-d89d408e49a8.png">

When I fire up the tracing, we can see that it queries both of the files:

```
2023-02-16T19:50:59.520 [206 Partial Content] s3.GetObject minio:9000/warehouse/wh/default/iceberg_table/metadata/00002-b153fc69-e69b-489d-aff3-49ffede57be9.metadata.json 172.18.0.3 1.45ms ↑ 169 B ↓ 3.4 KiB
2023-02-16T19:50:59.552 [200 OK] s3.HeadObject minio:9000/warehouse/wh/default/iceberg_table/metadata/snap-8884861716966779118-1-a38366f2-1636-497f-bbaf-d7a81b27d026.avro 172.18.0.5 486µs ↑ 133 B ↓ 0 B
2023-02-16T19:50:59.556 [206 Partial Content] s3.GetObject minio:9000/warehouse/wh/default/iceberg_table/metadata/snap-8884861716966779118-1-a38366f2-1636-497f-bbaf-d7a81b27d026.avro 172.18.0.5 900µs ↑ 148 B ↓ 4.2 KiB
2023-02-16T19:50:59.565 [206 Partial Content] s3.GetObject minio:9000/warehouse/wh/default/iceberg_table/metadata/a38366f2-1636-497f-bbaf-d7a81b27d026-m0.avro 172.18.0.5 895µs ↑ 148 B ↓ 7.0 KiB
2023-02-16T19:50:59.570 [206 Partial Content] s3.GetObject minio:9000/warehouse/wh/default/iceberg_table/metadata/a8e7390e-d67e-42bb-accf-dfe2f8df9885-m0.avro 172.18.0.5 1.241ms ↑ 148 B ↓ 7.0 KiB
2023-02-16T19:50:59.643 [200 OK] s3.HeadObject minio:9000/warehouse/wh/default/iceberg_table/data/trans_ts_hour%3D2019-06-13-13/00000-0-f46a696b-d858-49cd-bb18-c4d39b3578ab-00001.parquet 172.18.0.5 413µs ↑ 133 B ↓ 0 B
2023-02-16T19:50:59.646 [206 Partial Content] s3.GetObject minio:9000/warehouse/wh/default/iceberg_table/data/trans_ts_hour%3D2019-06-13-13/00000-0-f46a696b-d858-49cd-bb18-c4d39b3578ab-00001.parquet 172.18.0.5 662µs ↑ 148 B ↓ 8 B
2023-02-16T19:50:59.649 [206 Partial Content] s3.GetObject minio:9000/warehouse/wh/default/iceberg_table/data/trans_ts_hour%3D2019-06-13-13/00000-0-f46a696b-d858-49cd-bb18-c4d39b3578ab-00001.parquet 172.18.0.5 1.309ms ↑ 148 B ↓ 1.1 KiB
2023-02-16T19:50:59.654 [206 Partial Content] s3.GetObject minio:9000/warehouse/wh/default/iceberg_table/data/trans_ts_hour%3D2019-06-13-13/00000-0-f46a696b-d858-49cd-bb18-c4d39b3578ab-00001.parquet 172.18.0.5 1.074ms ↑ 148 B ↓ 1.5 KiB
2023-02-16T19:50:59.684 [200 OK] s3.HeadObject minio:9000/warehouse/wh/default/iceberg_table/data/trans_ts_hour%3D2019-06-14-13/00000-1-07373866-4c83-4e5a-8577-e9aa24acbfc4-00001.parquet 172.18.0.5 552µs ↑ 133 B ↓ 0 B
2023-02-16T19:50:59.687 [206 Partial Content] s3.GetObject minio:9000/warehouse/wh/default/iceberg_table/data/trans_ts_hour%3D2019-06-14-13/00000-1-07373866-4c83-4e5a-8577-e9aa24acbfc4-00001.parquet 172.18.0.5 648µs ↑ 148 B ↓ 8 B
2023-02-16T19:50:59.690 [206 Partial Content] s3.GetObject minio:9000/warehouse/wh/default/iceberg_table/data/trans_ts_hour%3D2019-06-14-13/00000-1-07373866-4c83-4e5a-8577-e9aa24acbfc4-00001.parquet 172.18.0.5 750µs ↑ 148 B ↓ 1.1 KiB
2023-02-16T19:50:59.694 [206 Partial Content] s3.GetObject minio:9000/warehouse/wh/default/iceberg_table/data/trans_ts_hour%3D2019-06-14-13/00000-1-07373866-4c83-4e5a-8577-e9aa24acbfc4-00001.parquet 172.18.0.5 756µs ↑ 148 B ↓ 1.5 KiB
```

> My question is, Iceberg creates those partition folders with the exact value of the date when we specify a date partition. In that case, how hard is it for the framework to handle it gracefully rather than expecting the end user to cast it on the source column?

Again, this is not up to Iceberg; it is up to Spark/Trino/etc. to decide how to do the comparison. See below where the behavior is the same against a plain Spark table.
If you want to change this behavior, you should discuss this in the Trino/Spark community.

<img width="1289" alt="image" src="https://user-images.githubusercontent.com/1134248/219484147-20a06e45-3787-41f8-882a-8f1278192ecf.png">
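
To make the pruning question concrete, here is a minimal Python sketch; it is not Iceberg or Spark code. It assumes the hour transform stores hours since the Unix epoch, as the Iceberg spec describes, and takes the two partition values from the paths in the trace above. The helper names `hour_ordinal` and `files_to_scan` are hypothetical, and the column name `trans_ts` is inferred from the partition folder name. Whether the engine hands Iceberg a usable timestamp bound is modeled by passing a `datetime` or `None`; how Spark or Trino actually rewrites a given comparison depends on the engine.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List, Optional

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def hour_ordinal(ts: datetime) -> int:
    """Hours since the Unix epoch (the value an hour transform would store)."""
    return (ts - EPOCH) // timedelta(hours=1)


# The two hour partitions that appear in the traced data file paths.
PARTITIONS: Dict[str, int] = {
    "trans_ts_hour=2019-06-13-13": hour_ordinal(
        datetime(2019, 6, 13, 13, tzinfo=timezone.utc)
    ),
    "trans_ts_hour=2019-06-14-13": hour_ordinal(
        datetime(2019, 6, 14, 13, tzinfo=timezone.utc)
    ),
}


def files_to_scan(
    partitions: Dict[str, int], lower_bound: Optional[datetime]
) -> List[str]:
    """Partitions that may hold rows with trans_ts >= lower_bound.

    lower_bound=None models the case where the engine does not push down a
    timestamp bound (for example because it compares a cast of the column
    instead), so nothing can be pruned.
    """
    if lower_bound is None:
        return list(partitions)
    bound = hour_ordinal(lower_bound)
    return [name for name, hour in partitions.items() if hour >= bound]


# A timestamp bound reaches the table: only the 2019-06-14 partition remains.
pruned = files_to_scan(PARTITIONS, datetime(2019, 6, 14, tzinfo=timezone.utc))

# No usable bound reaches the table: both partitions are scanned.
unpruned = files_to_scan(PARTITIONS, None)
```

In this model the second case corresponds to the trace above, where both Parquet files are read. The difference between the two cases comes entirely from what the engine passes down, which is why the comparison semantics are a Spark/Trino matter.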
