I support the idea of switching to file-scoped deletes for new tables. The absence of sync maintenance prevented us from doing that earlier. Given that Amogh recently merged that functionality into main, we should be fine. Flink connectors have been using file-scoped deletes for a while now to solve issues with concurrent compactions.
File-scoped deletes are closer to DVs and even share some code, like assignment by path and sync maintenance in Spark. This migration will allow us to test many concepts that are needed for DVs prior to completing the V3 spec. It is also easier to migrate file-scoped deletes to DVs in V3: unlike partition-scoped deletes, they can be discarded from the table state once a DV is produced for that data file.

- Anton

On Mon, Nov 11, 2024 at 9:01 PM Russell Spitzer <russell.spit...@gmail.com> wrote:

> I don't think this is a bad idea from a theoretical perspective. Do we
> have any actual numbers to back up the change? I would think for most
> folks we would recommend just going to V3 rather than changing granularity
> for their new tables. It would only affect new tables, though, so I'm not
> opposed to the change except on the grounds that we are changing a default.
>
> On Mon, Nov 11, 2024 at 1:56 PM Amogh Jahagirdar <2am...@gmail.com> wrote:
>
>> Hi all,
>>
>> I wanted to discuss changing the default position delete file
>> granularity for Spark from partition to file level for any newly created
>> V2 tables. See this PR [1].
>>
>> Context on delete file granularity:
>>
>> - Partition granularity: writers group delete files for multiple data
>> files from the same partition into the same delete file. This leads to
>> fewer files on disk, but higher read amplification, since a scan reads
>> delete information for data files that are irrelevant to it.
>> - File granularity: writers write a new delete file for every changed
>> data file. Reads of delete information are more targeted, but this can
>> lead to more files on disk.
>>
>> With the recent merge of synchronous position delete maintenance on
>> write in Spark [2], file granularity as a default is more compelling,
>> since reads would be more targeted *and* files would be maintained on
>> disk. I also recommend folks go through the deletion vector design doc
>> for more details [3].
>>
>> Note that for existing tables with high delete-to-data file ratios,
>> Iceberg's rewrite position deletes procedure can compact the table, and
>> every subsequent write would then continuously maintain the position
>> deletes. Additionally, note that in V3 at most one Puffin position delete
>> file is allowed per data file; what's being discussed here is changing
>> the default granularity for new V2 tables, since it should generally be
>> better after the sync maintenance addition.
>>
>> What are folks' thoughts on this?
>>
>> [1] https://github.com/apache/iceberg/pull/11478
>> [2] https://github.com/apache/iceberg/pull/11273
>> [3] https://docs.google.com/document/d/18Bqhr-vnzFfQk1S4AgRISkA_5_m5m32Nnc2Cw0zn2XM/edit?tab=t.0#heading=h.193fl7s89tcg
>>
>> Thanks,
>>
>> Amogh Jahagirdar
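[Editor's note] For readers who want to try this ahead of any default change, a sketch of the relevant Spark SQL follows. The table property name and procedure name match current Iceberg documentation; the catalog and table names are illustrative.

```sql
-- Opt a table into file-scoped position deletes explicitly
-- (the proposal in [1] would make this the default for new V2 tables):
ALTER TABLE db.sample SET TBLPROPERTIES ('write.delete.granularity' = 'file');

-- Compact position deletes on an existing table with a high
-- delete-to-data file ratio (catalog name is illustrative):
CALL my_catalog.system.rewrite_position_delete_files(table => 'db.sample');
```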
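[Editor's note] The read-amplification trade-off Amogh describes can be illustrated with a toy model. This is not Iceberg code; it simply assumes every data file carries the same number of delete records, and that a reader scanning one data file must load every delete file whose scope covers that data file.

```python
# Toy model (not Iceberg code) contrasting position-delete granularities.
# Assumptions: each data file in a partition has `deletes_per_file` delete
# records, and a reader scanning a single data file must load every delete
# file whose scope overlaps that data file.

def deletes_read_per_scan(files_per_partition, deletes_per_file, granularity):
    """Delete records a reader loads when scanning one data file."""
    if granularity == "partition":
        # One delete file covers the whole partition, so the reader pays
        # for every data file's deletes in that partition.
        return files_per_partition * deletes_per_file
    if granularity == "file":
        # One delete file per data file: only the relevant deletes are read.
        return deletes_per_file
    raise ValueError(f"unknown granularity: {granularity}")

# With 100 data files per partition and 1,000 deletes per file, a single
# scan reads 100x more delete records under partition granularity.
print(deletes_read_per_scan(100, 1_000, "partition"))  # 100000
print(deletes_read_per_scan(100, 1_000, "file"))       # 1000
```

The model ignores the flip side Amogh also notes (file granularity produces more files on disk), which is what the synchronous maintenance on write [2] is meant to keep in check.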