Hi Ori,

Iceberg doesn't impose a maximum number of partitions. We have a table with
1.2 million partitions that works quite well.

Iceberg stores a tuple of partition values with each data file to identify
the partition that file belongs to. That cost is constant per file, so it
doesn't matter whether you have one file per partition or 10,000 files per
partition. In other words, you're no longer limited by the number of
partitions; you're limited by the number of files you're managing.
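
To make that concrete, here is a rough Java sketch of the per-file
metadata. The catalog, path, sizes, and the event_date partition are all
made up for illustration; DataFiles.Builder is the low-level API for
describing a file:

    import org.apache.iceberg.DataFile;
    import org.apache.iceberg.DataFiles;
    import org.apache.iceberg.Table;
    import org.apache.iceberg.catalog.TableIdentifier;

    // load the table from whichever catalog you use
    Table table = catalog.loadTable(TableIdentifier.of("db", "events"));

    // Every DataFile entry carries its own partition tuple, so the
    // per-file metadata cost is the same whether the table has 10
    // partitions or 1.2 million.
    DataFile file = DataFiles.builder(table.spec())
        .withPath("s3://bucket/warehouse/events/00000-0-data.parquet")
        .withPartitionPath("event_date=2020-01-01")  // the partition tuple
        .withFileSizeInBytes(64 * 1024 * 1024)
        .withRecordCount(1_000_000)
        .build();

    table.newAppend().appendFile(file).commit();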

Iceberg also keeps an index on top of that file metadata. Files are
tracked in manifests, and each table version has a manifest list with a
summary of the partitions in each manifest. As long as your manifests
contain mostly separate partitions, queries tend to read only the
manifests that track the partitions they need. That's why we can scale
the number of partitions far beyond what is feasible with the Hive
metastore.
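
For example, planning a scan with a partition filter (the event_date
column is just an example) consults the manifest list first, so manifests
whose partition summaries can't match the filter are never opened:

    import org.apache.iceberg.FileScanTask;
    import org.apache.iceberg.Table;
    import org.apache.iceberg.expressions.Expressions;
    import org.apache.iceberg.io.CloseableIterable;

    // Only manifests whose partition summaries could contain
    // event_date=2020-01-01 are actually read during planning.
    CloseableIterable<FileScanTask> tasks = table.newScan()
        .filter(Expressions.equal("event_date", "2020-01-01"))
        .planFiles();

    tasks.forEach(task -> System.out.println(task.file().path()));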

However, you might find that the data files for a partition are spread
across many different manifest files, causing job planning to take a long
time. That can happen when your write pattern (which determines how files
are grouped into manifests) doesn't match your read pattern. To fix this,
there is a tool that rewrites manifest metadata to cluster the data files
into manifests by how they will be read; a sketch is below.
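
Using the table API it looks roughly like this (I'm assuming here that
the first partition field is an int, e.g. the value of a days transform;
cluster by whatever your reads filter on):

    import org.apache.iceberg.Table;

    // Rewrites manifest metadata only; data files are untouched.
    // clusterBy controls how data files are grouped into new manifests.
    table.rewriteManifests()
        .clusterBy(file -> file.partition().get(0, Integer.class))
        .commit();

There is also a Spark action in the iceberg-spark module that does the
rewrite in parallel, which helps for very large tables.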

I hope that helps,

rb

On Wed, Jan 1, 2020 at 11:20 AM Ori Popowski <ori....@gmail.com> wrote:

> In Hive there's a limit on number of partitions (about 10K I think).
>
> What's the max number of partitions Iceberg supports (without severe
> degradation in performance)?
>
> Thanks.
>


-- 
Ryan Blue
Software Engineer
Netflix
