I watched the recording. Ryan's arguments make sense (especially on where we spend the effort). I am on board with keeping the partition tuple for now.
I also agree with Russell's point about limiting the partition tuple to equality deletes in v4 and extending the stats approach to cover non-monotonic transforms such as bucketing, as well as multi-arg transforms, for pruning.

On Mon, May 4, 2026 at 2:51 PM Russell Spitzer <[email protected]> wrote:

> As we discussed in the community sync, I recommend we keep the partition tuple for now. It's the simplest way to maintain the guarantees needed for equality deletes.
>
> Going forward, we shouldn't rely on these values for filtering (imho) and should instead work to extend the stats struct approach to cover bucketing, non-range-preserving, and multi-arg transforms. To this end, I would try to make sure none of our v4 planning code interacts with the tuple directly, except when falling back for v3-based logic. Isolating tuple access this way means we can cleanly remove it later without reworking v4 planning paths.
>
> In my ideal world we drop the tuple and equality deletes, but this seems like the way to make progress now while leaving the door open to remove the tuple before v4 is finalized.
>
> On Mon, May 4, 2026 at 10:00 AM Anoop Johnson <[email protected]> wrote:
>
>> Amogh,
>>
>> That is a good point. But the partition and stats-based evaluation paths are typically separate. For partition evaluation, we compare against an exact value; for stats-based pruning, we look at the range of values in the column stats.
>>
>> Even if we store partition values in the content stats, they would follow the partition evaluation path. The new v4 manifest reader would just need to look at the partition value's lower_bound in the content stats instead of an explicit partition tuple field. The partition evaluator itself would be unchanged.
>>
>> This is conceptually no different from the current partition tuple. Storing it in content_stats with only lower_bound preserves the same semantics, but aligns with how the rest of the column stats are stored.
>>
>> But let's discuss the tradeoffs of the various options. Looking forward to the discussion in an hour.
>>
>> Best,
>> Anoop
>>
>> On Sun, May 3, 2026 at 6:45 PM Amogh Jahagirdar <[email protected]> wrote:
>>
>>> I realized I gave a poor example of the semantic issue with removing the upper bound for partition outputs, but the crux is that in that model, stats on partition outputs would be treated specially: a null upper bound would mean "partitioned" rather than "unknown", which is inconsistent with the other stats.
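To make the disagreement between Anoop's and Amogh's points concrete, here is a minimal Python sketch of the two evaluation paths. It assumes a hypothetical `FieldStats` record with `lower_bound`/`upper_bound` fields standing in for a content_stats entry; the names and helpers are illustrative only and are not Iceberg's actual API. It shows that the same entry (a bucket value stored as `lower_bound` with a null `upper_bound`) prunes under an exact-match partition evaluator but cannot prune under a generic range evaluator that reads a null bound as "unknown".

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class FieldStats:
    # Hypothetical stand-in for one field's entry in a content_stats struct.
    lower_bound: Optional[Any]
    upper_bound: Optional[Any]


def partition_value_from_stats(stats: FieldStats) -> Any:
    # Assumed encoding from the proposal: the partition value is stored
    # only as lower_bound, and upper_bound is left null.
    return stats.lower_bound


def partition_matches(value: Any, literal: Any) -> bool:
    # Partition path: exact comparison against the file's single value.
    return value == literal


def stats_may_match(stats: FieldStats, literal: Any) -> bool:
    # Stats path: a file can be skipped only if the literal falls outside
    # [lower_bound, upper_bound]. A null bound means "unknown", so that
    # side cannot be used to prune.
    if stats.lower_bound is not None and literal < stats.lower_bound:
        return False
    if stats.upper_bound is not None and literal > stats.upper_bound:
        return False
    return True


# A file whose hypothetical bucket(id) partition value is 3.
bucket_entry = FieldStats(lower_bound=3, upper_bound=None)
print(partition_matches(partition_value_from_stats(bucket_entry), 5))  # False: file pruned
print(stats_may_match(bucket_entry, 5))  # True: null upper bound read as "unknown"
```

Anoop's point holds as long as only the partition evaluator reads these entries; Amogh's concern is that any code reading them as ordinary column stats gets a weaker (though still safe) answer, so the meaning of a null `upper_bound` depends on which reader sees it.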
