Re: [DISCUSS] SPIP: DSV2 Enhanced Partition Stats Filtering

Szehon Ho Tue, 24 Feb 2026 15:03:14 -0800

Thanks all. I also made a POC PR (for the proposed first phase), in case it
helps readers understand the proposal: https://github.com/apache/spark/pull/54459


Thanks,
Szehon

On Sat, Feb 21, 2026 at 7:04 PM huaxin gao <[email protected]> wrote:

> +1 from me. This is a great direction to close the DSv2 partition pruning
> gap by reusing Spark’s existing Catalyst partition-filter logic. Looking
> forward to the implementation.
>
> Huaxin
>
> On Thu, Feb 19, 2026 at 7:58 PM Tathagata Das <[email protected]>
> wrote:
>
>> Massive +1 from me.
>> Delta is starting to transition to DSv2 as well, and this solves a major
>> gap we were concerned about.
>> THANK YOU.
>>
>> On Thu, Feb 19, 2026 at 8:08 PM Anton Okolnychyi <[email protected]>
>> wrote:
>>
>>> Thanks, Szehon!
>>>
>>> This will help address one of the long-standing limitations in DSv2 that
>>> is a common cause of regressions or even blockers for DSv2 adoption. I am
>>> looking forward to implementation.
>>>
>>> - Anton
>>>
>>> On Wed, Feb 18, 2026 at 14:30 Szehon Ho <[email protected]> wrote:
>>>
>>>> Hi all,
>>>>
>>>> I would like to propose enhancements to partition filter pushdown for
>>>> DSV2 data sources that support partitioning (i.e., those with partition
>>>> stats).
>>>>
>>>> Some DSV2 data sources, for example table formats like Apache Iceberg,
>>>> lack partition filtering in many queries, compared to Spark-native data
>>>> sources that directly use Catalyst (like Parquet).  This proposal can
>>>> bridge that gap while simplifying the data source logic.
>>>>
>>>> JIRA: https://issues.apache.org/jira/browse/SPARK-55596
>>>> SPIP doc:
>>>> https://docs.google.com/document/d/17vcw411PxSRLWoK-BiLI56UiNdokLWtovF8JZUlDTOo
>>>>
>>>> Look forward to comments and feedback.
>>>>
>>>> Thanks,
>>>> Szehon
>>>>
>>>
