rdblue commented on pull request #1612: URL: https://github.com/apache/iceberg/pull/1612#issuecomment-715592301
I'm not quite convinced that not supporting the Hive `PARTITIONED BY` clause is the right way to go, but I think it is a reasonable step to get this patch done. We don't need to support it to support the Schema DDL, so it would be fine with me to throw an exception and reject its use for now.

In the long term, I think we do want Iceberg partitioning to be exposed in the normal way for Hive because it would be confusing for a partitioned Iceberg table to show up as unpartitioned. That said, there are significant differences between the two partitioning approaches:

1. Iceberg partitioning never changes the table schema, but Hive partition columns are always at the end
2. Hive partition columns can't be changed
3. Iceberg supports hidden partitions that can't be shown in Hive

The differences may be significant enough that it would cause problems to expose even Iceberg identity partitions to Hive. For example, if Hive expects to get a partition key and fill in data values, then that would be a problem.

What are the chances of integrating Iceberg into Hive itself and solving some of these limitations?
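
To make the first and third differences listed above concrete, here is a minimal, self-contained Java sketch. It is only an illustrative model: `PartitionSketch` and its methods are hypothetical names, not Hive or Iceberg APIs, and the UTC day derivation is an assumption chosen to stand in for a hidden partition transform.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneOffset;
import java.util.ArrayList;
import java.util.List;

// Illustrative model only: none of these names are Hive or Iceberg APIs.
final class PartitionSketch {

    // Hive-style layout: partition columns are taken out of their original
    // position and appended after the data columns.
    static List<String> hiveColumnOrder(List<String> columns, List<String> partitionColumns) {
        List<String> result = new ArrayList<>();
        for (String column : columns) {
            if (!partitionColumns.contains(column)) {
                result.add(column);
            }
        }
        result.addAll(partitionColumns);
        return result;
    }

    // Iceberg-style layout: partitioning is table metadata, so the schema
    // column order is returned unchanged.
    static List<String> icebergColumnOrder(List<String> columns) {
        return new ArrayList<>(columns);
    }

    // Hidden partition value: derived from a data column (assumed here to be
    // the UTC day of a timestamp) instead of being stored as its own column.
    static LocalDate dayPartition(Instant timestamp) {
        return timestamp.atZone(ZoneOffset.UTC).toLocalDate();
    }

    public static void main(String[] args) {
        List<String> columns = List.of("id", "category", "ts", "payload");
        System.out.println(hiveColumnOrder(columns, List.of("category")));
        System.out.println(icebergColumnOrder(columns));
        System.out.println(dayPartition(Instant.parse("2020-10-23T18:30:00Z")));
    }
}
```

In this model, partitioning by `category` moves that column to the end under the Hive-style layout, while the Iceberg-style layout leaves the order alone. The derived day has no column of its own, which is the kind of partition Hive would have nothing to display for.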
