SreeramGarlapati opened a new issue #2723:
URL: https://github.com/apache/iceberg/issues/2723


   I am trying to understand how to configure an Iceberg table for `low-latency`
writes. Here are my thoughts on how to configure it. I would truly appreciate
any input.
   
   ### Supporting low-latency writes to an Iceberg table entails the sub-problems below:
   
   1. **Optimizing the data payload:** reducing the work needed to produce the
data that is written to the table.
       - **Optimization specific to the write pattern**
           1. **Appends:** in case of Appends - this is already solved - as
Iceberg always writes new inserts as new files.
           2. **Deletes (/Upserts):** in case of Deletes (or Upserts - which
are broken down into Insert + Delete in the v2 table format) - this problem is
solved as well.
       - **File Format**: There is another optimization knob at the file
format level. It might not make sense to generate data in a columnar format
here - all the time and space spent on encoding, storing stats, etc. can be
saved. So, thankfully, for these low-latency writes to an Iceberg table - the
`AVRO` file format can be used.
   2. **Locating, with low latency, the records that need to be updated:** In
case of Upserts - locating the records that need to be updated is the key
problem to be solved.
       - One popular solution for this is to maintain indexes to support the
**equality filters** used for upserts. **Do you know if there is any ongoing
effort for this?**
   3. **Optimizing the Metadata payload**: for every write to an Iceberg table -
the **table metadata file** (which carries the schema) & the **manifest list**
file are rewritten. To further push the payload down - we can potentially write
only the "change set" here. Is this the current direction of thought? If so,
pointers to any work stream in this regard are truly appreciated.
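
To make the file format point concrete: Iceberg exposes the default write format as the table property `write.format.default`. Below is a minimal sketch, in Python, of building the Spark SQL statement that would set it to `avro`. The helper `set_properties_sql` and the table name `db.events` are hypothetical, used only for illustration; the sketch only builds the statement text and does not talk to a catalog.

```python
from __future__ import annotations


def set_properties_sql(table: str, props: dict[str, str]) -> str:
    """Build an ALTER TABLE statement that sets the given table properties.

    Hypothetical helper for illustration: it only formats the SQL text,
    it does not execute anything.
    """
    assignments = ", ".join(f"'{key}'='{value}'" for key, value in sorted(props.items()))
    return f"ALTER TABLE {table} SET TBLPROPERTIES ({assignments})"


# Row-oriented Avro as the default write format, as suggested in the
# "File Format" bullet. Whether this helps is workload-dependent.
LOW_LATENCY_PROPS = {"write.format.default": "avro"}

if __name__ == "__main__":
    # "db.events" is a made-up table name.
    print(set_properties_sql("db.events", LOW_LATENCY_PROPS))
```

The resulting statement could then be run through a Spark session that has the Iceberg catalog configured. Note that this trades read performance for write latency, so a later compaction into a columnar format may still be desirable.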

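To illustrate what I mean by an index for equality filters in point 2, here is a toy sketch in Python. It is not an Iceberg API: `KeyIndex` and `RowLocation` are hypothetical names, and the index lives purely in memory. The idea is that an upsert looks up the key, learns where the previous version of the row lives, and can then emit a delete for exactly that file and position instead of scanning for it.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class RowLocation:
    """Where the live version of a row is stored (hypothetical model)."""
    data_file: str
    position: int


class KeyIndex:
    """Toy in-memory index: equality key -> location of the live row."""

    def __init__(self) -> None:
        self._locations: Dict[str, RowLocation] = {}

    def lookup(self, key: str) -> Optional[RowLocation]:
        """Return the location of the live row for `key`, or None."""
        return self._locations.get(key)

    def upsert(self, key: str, new_location: RowLocation) -> Optional[RowLocation]:
        """Point `key` at its new row and return the old location, if any.

        The returned location is the row a writer would have to delete.
        """
        previous = self._locations.get(key)
        self._locations[key] = new_location
        return previous


if __name__ == "__main__":
    index = KeyIndex()
    # First write of the key: nothing to delete.
    print(index.upsert("user-1", RowLocation("data-0001.avro", 0)))
    # Second write: the old row location comes back and can be deleted.
    print(index.upsert("user-1", RowLocation("data-0002.avro", 3)))
```

Without such an index, the writer has to either scan for the matching rows or defer the matching work to readers, which is the cost I am trying to avoid.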



