szlta commented on PR #4868:
URL: https://github.com/apache/iceberg/pull/4868#issuecomment-1142142111

Thanks for taking a look, @rdblue. This is a follow-up on an issue I found while implementing https://github.com/apache/iceberg/pull/4662. There I had a failing test that I could have just amended, but I thought it was worth taking a deeper look and doing some cleanup.

We currently have two separate naming conventions for partition fields, which I think is not only technical debt but could also be confusing. Consider the following example:

```
PartitionSpec initialSpec = PartitionSpec.builderFor(SCHEMA).bucket("data", 8).build();
TestTables.TestTable table = TestTables.create(tableDir, "testnames", SCHEMA, initialSpec, 2);
table.updateSpec().removeField(bucket("data", 8)).commit();
table.updateSpec().addField(bucket("data", 8)).commit();
Partitioning.partitionType(table);
```

We'd end up with

```
struct<1000: data_bucket: optional int, 1001: data_bucket_8: optional int>
```

I guess that means that metadata queries have to specify different column names even though they actually mean the same thing? I recall you also found a similar case weird in @aokolnychyi's example https://github.com/apache/iceberg/pull/3411#discussion_r823186537, where field names didn't match partition names.

On the other hand, you're right of course: this cleanup requires changes in lots of files. These are mostly test files, so I'm more worried about the expected API compatibility / behaviour of the PartitionSpec class. But then again, maybe this is something that could be fixed before the first major version is GA?

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
