Hello Iceberg devs!

Has anyone solved "low-latency writes to Iceberg"? Overall, it boils
down to 2 questions:
1. Is there a way to add indexes to an Iceberg table to support
equality-based filters? (Please see point #2 below for more explanation.)
2. Is there a workstream to support writing deltas of metadata changes?
(Please see point #3 below.)

Best,
Sreeram

https://github.com/apache/iceberg/issues/2723

*Truly appreciate any inputs.*
Supporting low-latency writes to an Iceberg table entails the following
sub-problems:

   1. Optimizing the data payload: minimizing the data that has to be
   written to the table.
      - There is an optimization that is specific to the write pattern:
         1. Appends: this is already solved, as Iceberg always writes
         new inserts as new files.
         2. Deletes (/Upserts): for Deletes (or Upserts, which are broken
         down into Insert + Delete in 2.0), this problem is solved as
         well.
      - File format: there is another optimization knob at the file
      format level. It might not make sense to generate data in a columnar
      format here: all the time and space spent on encoding, storing stats,
      etc. can be saved (assuming each write is a small number of rows,
      e.g. < 5). So, thankfully, the AVRO file format can be used for
      these low-latency writes to an Iceberg table.
   2. Locating the records that need to be updated with low latency: in
   the case of Upserts, locating the records that need to be updated is
   the key problem to be solved.
      - One popular solution for this is to maintain indexes that support
      the equality filters used for upserts. Do you know if there is any
      ongoing effort for this?
   3. Optimizing the metadata payload: for every write to an Iceberg table,
   the table metadata file (which holds the schema) and the manifest list
   file are rewritten. To push the payload down further, we could
   potentially write only the "change set" here. Is this the current
   direction of thought? *If so, pointers to any workstream in this regard
   would be truly appreciated.*
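
To make the index idea in point 2 concrete, here is a minimal sketch in
Python. It assumes an in-memory map from the equality key to the row's current
location (data file path plus row position). Every name in it (RowLocation,
EqualityIndex, plan_upserts) is hypothetical and only for illustration; none
of this is an existing Iceberg API. With such a map, an upsert could look up
the old row directly and record a position delete for it, instead of scanning
data files to apply the equality filter.

```python
from dataclasses import dataclass
from typing import Dict, Hashable, List, Optional, Tuple


@dataclass(frozen=True)
class RowLocation:
    """Hypothetical pointer to a row: data file path plus row position."""
    file_path: str
    position: int


class EqualityIndex:
    """Hypothetical in-memory index: equality key -> current row location."""

    def __init__(self) -> None:
        self._locations: Dict[Hashable, RowLocation] = {}

    def lookup(self, key: Hashable) -> Optional[RowLocation]:
        # O(1) lookup instead of scanning data files for the key.
        return self._locations.get(key)

    def upsert(self, key: Hashable, new_location: RowLocation) -> Optional[RowLocation]:
        # Point the key at its new location and return the previous one,
        # so the caller can mark the old row as deleted.
        old = self._locations.get(key)
        self._locations[key] = new_location
        return old


def plan_upserts(
    index: EqualityIndex, keys: List[Hashable], new_file: str
) -> List[Tuple[str, int]]:
    """Return the (file_path, position) deletes implied by writing `keys`
    as consecutive rows of `new_file`."""
    deletes: List[Tuple[str, int]] = []
    for pos, key in enumerate(keys):
        old = index.upsert(key, RowLocation(new_file, pos))
        if old is not None:
            deletes.append((old.file_path, old.position))
    return deletes
```

A real index would also have to be persisted and kept consistent with table
snapshots, which is the hard part and is not covered by this sketch.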
