Thanks Vivekanand!

I made some comments on the doc. Overall, I think a partition index is a
good idea. We've thought about adding sketches that contain skew estimates
for certain columns in a partition so that we can do better join
estimation. Getting a start on how we would store data like this is a good
step.

I'm a bit more skeptical about locality information, since it would go out
of date, and keeping it current would require rewriting old, large
manifests.

On Fri, Nov 20, 2020 at 1:44 AM Vivekanand Vellanki <[email protected]>
wrote:

> Hi,
>
> I would like to propose additional fields in Iceberg manifest files
> <https://docs.google.com/document/d/1G6GeOXkGSiSTcu0lDS6VA1FtJ_uz9FO4tF2Pffmx9LU/edit#>
> to support the following scenarios:
>
>    - Partition index to include per-partition stats to help support
>    planning
>    - Data locality information to support split assignment in distributed
>    query engines
>
> Comments are welcome.
>
> --
> Thanks
> Vivek
>
>

-- 
Ryan Blue
Software Engineer
Netflix
