Re: Spark Merge On Read Support

Puneet Zaroo Wed, 17 Nov 2021 22:56:24 -0800

Perhaps a newbie question, but if the requirement is to just read v2 tables
with equality and/or position delete files, does that also require Spark
3.2, or is that supported in Spark 2.4 as well (even if in a sub-optimal
way)?


Thanks,
- Puneet


On Wed, Nov 17, 2021 at 10:07 AM Ryan Blue <[email protected]> wrote:

> The plan is to support it in 3.2. I think that we're very close but Anton
> is the expert there.
>
> On Tue, Nov 16, 2021 at 6:22 AM Sreeram Garlapati <[email protected]>
> wrote:
>
>> This makes sense, thanks a lot @Ryan Blue <[email protected]>.
>>
>> Are all building blocks for MOR support (features like delta-based
>> plans) fully available in Spark 3.2, or is there any reason we would need
>> Spark 3.3? Or is there more ongoing work needed to fully validate this? I
>> need this specific data point *about the Spark version* to
>> move our organization to the correct Spark version. Truly appreciate your
>> help.
>>
>> Best regards,
>> Sreeram
>>
>> On Mon, Nov 15, 2021 at 4:37 PM Ryan Blue <[email protected]> wrote:
>>
>>> Sreeram,
>>>
>>> The project tracking this is here:
>>> https://github.com/apache/iceberg/projects/11
>>>
>>> It isn’t easy to get a good picture, since most of the PRs are merged.
>>> But Anton is working on the next set of PRs for Spark. Maybe Anton can find
>>> some time to add a few notes about what's left to be done.
>>>
>>> What’s been done so far is pretty significant:
>>>
>>>    - Add new writers that can handle deletes across multiple partition
>>>    specs
>>>    - Add Spark 3.2 module and refactor Spark builds
>>>    - Add metadata columns to Spark 3.2
>>>    - Add support for required distribution and ordering in Spark 3.2
>>>    - Support Spark 3.2 dynamic filtering
>>>
>>> Many of those are the building blocks for the delta-based plans. And
>>> it’s really amazing to finally have support for some major improvements:
>>> dynamic filtering on all queries, metadata columns, and required
>>> distribution and ordering!
>>>
>>> Ryan
>>>
>>> On Thu, Nov 11, 2021 at 11:46 PM Sreeram Garlapati <
>>> [email protected]> wrote:
>>>
>>>> Hello Iceberg devs!
>>>>
>>>> After going through the mail threads (especially "Spark version support
>>>> strategy") and relevant PRs, it looks like *Merge on Read* support
>>>> (i.e., Spark writers writing equality deletes) will be available with
>>>> *Iceberg + Spark 3.2*. Is this understanding correct? Or is this
>>>> something that will be available only with Iceberg on Spark 3.3?
>>>>
>>>> I would really appreciate it if someone could point me to any place that
>>>> tracks the remaining work.
>>>>
>>>> Thanks,
>>>> Sreeram
>>>>
>>>
>>>
>>> --
>>> Ryan Blue
>>> Tabular
>>>
>>
>
> --
> Ryan Blue
> Tabular
>
