Hi Ori,

Iceberg doesn't impose a maximum number of partitions. We have a table with 1.2 million partitions that works quite well.
Iceberg stores a tuple of partition information with each file, to identify the partition that file belongs to. This cost is constant, so it doesn't matter whether you have one file per partition or 10,000 files per partition. That means you're no longer limited by the number of partitions; you're limited by the number of files you're managing.

Iceberg also keeps an index on top of the metadata it keeps for files. Files are tracked in manifests, and each table version has a manifest list that holds an index of the partitions in each manifest. As long as your manifests contain mostly separate partitions, queries tend to read only the manifests that track the partitions needed for the query. That's why we can scale the number of partitions far beyond what is feasible with the Hive metastore.

However, you might find that the data files for a partition are distributed across many different manifest files, causing your job planning to take a long time. That can happen when your write pattern (which determines how files are grouped into manifests) doesn't match your read pattern. To fix this, there is a tool to rewrite manifest metadata to cluster the data files into manifests by how they will be read; a sketch of running it from Spark follows below the quoted message.

I hope that helps,

rb

On Wed, Jan 1, 2020 at 11:20 AM Ori Popowski <ori....@gmail.com> wrote:

> In Hive there's a limit on the number of partitions (about 10K, I think).
>
> What's the max number of partitions Iceberg supports (without severe
> degradation in performance)?
>
> Thanks.

--
Ryan Blue
Software Engineer
Netflix
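A minimal sketch of the manifest rewrite using the Spark actions API (org.apache.iceberg.spark.actions.SparkActions). The catalog type, warehouse path, table name, and size threshold here are illustrative assumptions, not part of the original message, and the action's entry point has moved between Iceberg releases:

    import org.apache.iceberg.Table;
    import org.apache.iceberg.catalog.TableIdentifier;
    import org.apache.iceberg.hadoop.HadoopCatalog;
    import org.apache.iceberg.spark.actions.SparkActions;
    import org.apache.spark.sql.SparkSession;

    public class RewriteManifestsExample {
      public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
            .appName("rewrite-manifests")
            .getOrCreate();

        // Load the table through whatever catalog you actually use; a
        // Hadoop catalog and this warehouse path are assumed for the example.
        Table table = new HadoopCatalog(
                spark.sessionState().newHadoopConf(), "hdfs://warehouse/path")
            .loadTable(TableIdentifier.of("db", "events"));

        // Rewrite manifests so data files are re-clustered by partition.
        // The predicate limits the rewrite to small manifests (< 10 MB here)
        // so manifests that are already well organized are left alone.
        SparkActions.get(spark)
            .rewriteManifests(table)
            .rewriteIf(manifest -> manifest.length() < 10 * 1024 * 1024)
            .execute();
      }
    }

In releases that ship the Spark SQL procedures, the same rewrite can be invoked without Java, e.g. CALL my_catalog.system.rewrite_manifests('db.events').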