Andreas,

We haven't built a strategy in the Flink sink that uses position deletes.
The difficulty is that a position delete requires knowing where the row
you want to delete is located, which means you either need expensive
row-level indexing or have to scan candidate data files to locate the
rows to delete. Instead, the approach we've taken in Flink so far is to
write equality deletes, which don't require knowing where the affected
rows are located. You can then compact those deletes in the background to
make reads more efficient.
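
To make the trade-off concrete, here is a minimal toy model in plain
Python. It is not Iceberg code: the file names, the row layout, and every
function name are hypothetical, and data files are just in-memory lists.
It only illustrates that an equality delete records what to delete without
reading anything, while a position delete has to scan for where the rows
are.

```python
from typing import Dict, List, Set, Tuple

Row = Dict[str, object]
DataFiles = Dict[str, List[Row]]  # hypothetical: file name -> rows in file order


def write_equality_delete(key: str, values: Set[object]) -> Tuple[str, Set[object]]:
    """Record WHAT to delete. No data file is read."""
    return (key, set(values))


def find_position_deletes(files: DataFiles, key: str, values: Set[object]) -> Set[Tuple[str, int]]:
    """Record WHERE the rows are. Every candidate file has to be scanned."""
    return {
        (name, pos)
        for name, rows in files.items()
        for pos, row in enumerate(rows)
        if row[key] in values
    }


def read_with_equality_delete(files: DataFiles, eq_delete: Tuple[str, Set[object]]) -> List[Row]:
    """Readers pay the cost: every row is checked against the delete values."""
    key, values = eq_delete
    return [row for rows in files.values() for row in rows if row[key] not in values]


def read_with_position_deletes(files: DataFiles, pos_deletes: Set[Tuple[str, int]]) -> List[Row]:
    """Readers only skip the recorded (file, position) pairs."""
    return [
        row
        for name, rows in files.items()
        for pos, row in enumerate(rows)
        if (name, pos) not in pos_deletes
    ]


def compact(files: DataFiles, eq_delete: Tuple[str, Set[object]]) -> DataFiles:
    """Background compaction: rewrite each data file with the deletes applied."""
    key, values = eq_delete
    return {name: [r for r in rows if r[key] not in values] for name, rows in files.items()}


files = {
    "data-a": [{"id": 1}, {"id": 2}],
    "data-b": [{"id": 3}, {"id": 2}],
}
eq = write_equality_delete("id", {2})          # cheap to write
pos = find_position_deletes(files, "id", {2})  # needs a scan of both files
compacted = compact(files, eq)                 # deletes no longer needed afterwards
```

Both read paths return the same rows; the difference is whether the cost
of locating them is paid by the writer, by every reader, or once by a
background compaction.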

Another alternative is to use Spark, which has MERGE, UPDATE, and DELETE
plans. Those plans already need to find the affected rows, so some of them
write position deletes (others eagerly rewrite the affected data files,
the "copy-on-write" strategy). You can run those plans in micro-batches to
produce the results you're looking for. If you want to use position
deletes, I'd recommend testing this first to make sure you get the
performance you're looking for. It might be that fast equality deletes in
Flink, combined with an aggressive background compaction policy to apply
them, work out better in the long term.
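
As a rough illustration of the two kinds of Spark plan, the sketch below
contrasts an eager copy-on-write delete with one that only records
position deletes. It is again a toy in plain Python, not Spark or Iceberg
code, and all names in it are hypothetical. As far as I know, the choice
between the two in Spark is made through the `write.delete.mode`,
`write.update.mode`, and `write.merge.mode` table properties; check the
documentation for your version.

```python
from typing import Callable, Dict, FrozenSet, List, Set, Tuple

Row = Dict[str, int]
DataFiles = Dict[str, List[Row]]  # hypothetical: file name -> rows in file order


def delete_copy_on_write(files: DataFiles, matches: Callable[[Row], bool]) -> Tuple[DataFiles, int]:
    """Eager strategy: rewrite every file that contains a matching row.

    Returns the new files and the number of surviving rows that had to be
    rewritten, as a stand-in for write cost.
    """
    new_files: DataFiles = {}
    rows_rewritten = 0
    for name, rows in files.items():
        kept = [row for row in rows if not matches(row)]
        if len(kept) < len(rows):  # file was touched, so it is rewritten
            rows_rewritten += len(kept)
        new_files[name] = kept
    return new_files, rows_rewritten


def delete_with_positions(files: DataFiles, matches: Callable[[Row], bool]) -> Set[Tuple[str, int]]:
    """Lazy strategy: leave data files alone, record (file, position) pairs."""
    return {
        (name, pos)
        for name, rows in files.items()
        for pos, row in enumerate(rows)
        if matches(row)
    }


def read(files: DataFiles, position_deletes: FrozenSet[Tuple[str, int]] = frozenset()) -> List[Row]:
    """Read all rows, skipping any recorded position deletes."""
    return [
        row
        for name, rows in files.items()
        for pos, row in enumerate(rows)
        if (name, pos) not in position_deletes
    ]


def is_id_2(row: Row) -> bool:
    return row["id"] == 2


files = {
    "data-a": [{"id": 1}, {"id": 2}, {"id": 3}],
    "data-b": [{"id": 4}, {"id": 5}],
}
cow_files, cow_rows_rewritten = delete_copy_on_write(files, is_id_2)
pos_deletes = delete_with_positions(files, is_id_2)
```

Both strategies scan to find the affected rows. Copy-on-write then
rewrites the surviving rows of each touched file, while the position
delete variant writes only one small entry per deleted row and leaves the
merge to readers, which is why it is worth measuring both on your data.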

Ryan

On Mon, Jun 6, 2022 at 5:54 AM Hailu, Andreas <andreas.ha...@gs.com> wrote:

> Hi folks, I’m processing data from an Iceberg table with Flink and had a
> question about positional deletes.
>
> I batch process a source Table to create a DataStream of Records that I’d
> like to delete from it. I initially created equality delete files with all
> the values from these Records, but for performance purposes I’d like to try
> out positional deletes. Given a Record from a Table, how can I go about
> identifying its position? A fellow from the Slack mentioned in Spark
> there’s a “_pos” metadata field but I haven’t found the equivalent in Java
> or Flink.
>
> best,
>
> ah


-- 
Ryan Blue
Tabular
