Hi ZC C, Adding partitions to Iceberg tables is easy, and changing them, later on, is easy as well. The existing data will continue to exist with the partition that it was initially written with, new data will be written according to the active partitioning. When you rewrite the data (for example using the Spark procedure rewrite_data_files <https://iceberg.apache.org/docs/latest/spark-procedures/#rewrite_data_files>) it will use the new partitioning strategy. Getting back to your question; the harm is that you need to rewrite the data to benefit from the new partitioning strategy.
Choosing the right partitioning strategy is something that often evolves over time. You don't want to be too granular because that will create a lot of files, which will cause more IO overhead. But also not too coarse since that will read in a lot of data that you're not interested in. It helps to look at your queries and see which fields are filtered on, and which are your candidate partitions (one or a combination of). Hope this helps, Kind regards, Fokko Op ma 24 apr 2023 om 19:49 schreef ZC C <[email protected]>: > We now are create a row data table, and my colleague want to add org_id as > the partition, What is the harm of adding partition to iceberg table?
