Re: V4: Block-level Pruning for Inlined Metadata (Adaptive Metadata Tree)

Amogh Jahagirdar Tue, 30 Dec 2025 10:47:15 -0800

Hey Viquar,

There shouldn't be a read regression here, since the data files would have
columnar stats that cover pruning on partitions (essentially all the
partition transforms are derivations of a source data column). There have
been discussions in the sync on whether we should keep the partition tuple
for manifests, and there are nuances around writer requirements if we were
to rely completely on column stats. Regardless of whether the partition
tuple is kept, from a pruning perspective we certainly want to keep the
same level of pruning as we had before; that's a critical property to
preserve.
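
To illustrate why column stats on the source column are enough, here is a
minimal Python sketch (not Iceberg code; the function names are
hypothetical). It assumes the day transform is days since the Unix epoch and
relies on that transform being order-preserving, so lower/upper bounds on ts
translate directly into bounds on days(ts).

```python
from datetime import datetime, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def days(ts: datetime) -> int:
    """Whole days since the Unix epoch (floored); non-decreasing in ts."""
    return (ts - EPOCH).days


def may_contain_day(ts_lower: datetime, ts_upper: datetime, target_day: int) -> bool:
    """Decide whether a file with column bounds [ts_lower, ts_upper] can hold
    rows whose days(ts) equals target_day.

    Because days() is monotonic, applying it to the column bounds yields
    valid bounds for the derived partition value.
    """
    return days(ts_lower) <= target_day <= days(ts_upper)


# Hypothetical file whose ts values all fall on 2025-12-30 (UTC).
file_lower = datetime(2025, 12, 30, 0, 0, tzinfo=timezone.utc)
file_upper = datetime(2025, 12, 30, 23, 59, tzinfo=timezone.utc)

same_day = days(datetime(2025, 12, 30, 12, 0, tzinfo=timezone.utc))
next_day = days(datetime(2025, 12, 31, 12, 0, tzinfo=timezone.utc))

keep_file = may_contain_day(file_lower, file_upper, same_day)
skip_file = not may_contain_day(file_lower, file_upper, next_day)
```

This only shows the reader side; it does not address the writer
requirements mentioned above.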

If we model the partition transform as an expression with its own ID, we
could then have stats on that expression. For example, if you have a column
ts and partition by days(ts), there'd be an expression in metadata
representing days(ts), and the stats for the data file would contain an
entry with lower(days(ts)) and upper(days(ts)). For a partitioned file, the
lower and upper bounds would have to be equal. For a leaf manifest in the
root, we'd have the aggregated lower/upper stats, which is effectively the
same as the partition field summary that exists today. In short, a reader
could just run data filters and get the same level of pruning as before.
Notice that this modeling avoids having to tie a manifest to a given
partition spec, as happens today.
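
As a rough sketch of that modeling (Python, purely illustrative: the
expression ID, the Bounds type, and the file names are all assumptions, not
anything from the spec or the proposal docs), stats are keyed by expression
ID, data file bounds are rolled up into a manifest-level summary, and the
same filter is evaluated at both levels:

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class Bounds:
    lower: int
    upper: int


# Hypothetical ID assigned to the expression days(ts) in table metadata.
DAYS_TS_EXPR_ID = 1000

# Per-file stats keyed by expression ID. For a partitioned file the lower
# and upper bounds of the partition expression are equal.
data_files: Dict[str, Dict[int, Bounds]] = {
    "a.parquet": {DAYS_TS_EXPR_ID: Bounds(20450, 20450)},
    "b.parquet": {DAYS_TS_EXPR_ID: Bounds(20451, 20451)},
    "c.parquet": {DAYS_TS_EXPR_ID: Bounds(20452, 20452)},
}


def aggregate(bounds: Iterable[Bounds]) -> Bounds:
    """Roll file-level bounds up into a manifest-level summary."""
    items = list(bounds)
    return Bounds(min(b.lower for b in items), max(b.upper for b in items))


def may_match_eq(b: Bounds, value: int) -> bool:
    """True if an equality filter on the expression could match."""
    return b.lower <= value <= b.upper


def plan(files: Dict[str, Dict[int, Bounds]], expr_id: int, value: int) -> List[str]:
    """Prune at the manifest level first, then per data file."""
    summary = aggregate(stats[expr_id] for stats in files.values())
    if not may_match_eq(summary, value):
        return []  # the whole manifest is skipped
    return [name for name, stats in files.items()
            if may_match_eq(stats[expr_id], value)]


manifest_summary = aggregate(s[DAYS_TS_EXPR_ID] for s in data_files.values())
matching = plan(data_files, DAYS_TS_EXPR_ID, 20451)
skipped_all = plan(data_files, DAYS_TS_EXPR_ID, 20460)
```

Nothing in plan() refers to a partition spec; it only needs the expression
ID and its bounds.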

I do think the aspect to reach more of a conclusion on is whether we should
keep the partition tuple or rely completely on stats on expressions. For
reference, here is the discussion on this topic from a past v4 sync (the
link starts at the point where the discussion begins):
<https://drive.google.com/file/d/1gv8TrR5xzqqNxek7_sTZkpbwQx1M3dhK/view?usp=sharing&t=2327>
Let me know if that makes sense!

On Tue, Dec 30, 2025 at 10:47 AM vaquar khan <[email protected]> wrote:

> Hi everyone,
>
> I’ve been following the recent discussions and design documents regarding
> the Adaptive Metadata Tree and Single-File Commits for the V4 Spec.
>
> While moving to a Root Manifest structure solves the write amplification
> issue on S3/GCS, I am concerned about a potential regression in Partition
> Pruning efficiency for readers. Specifically, when Data Files are inlined
> into the Root Manifest, we lose the explicit partition summary bounds that
> existed in the V3 Manifest List.
>
> Without a standardized way to store lightweight partition stats for these
> inlined entries, query planners may be forced to scan significantly more
> metadata bytes to perform the same pruning we get for free today.
>
> *Proposal*: I propose we explicitly standardize a "Compact Partition
> Summary" (possibly using Bloom Filters or compressed min/max tuples) within
> the Root Manifest entry schema. This would ensure that V4 maintains the
> "File Skipping" performance of V3 while gaining the write throughput of the
> new tree structure.
>
> I am drafting a short design doc outlining the schema changes and backward
> compatibility implications for this.
>
> Before I circulate the doc, has there been any consensus on how to handle
> partition stats for inlined files in the combined Spitzer/Jahagirdar
> proposal?
>
> Regards,
> Viquar Khan
> Sr. Data Architect
> https://www.linkedin.com/in/vaquar-khan-b695577/
>
