[GitHub] [iceberg] ddcprg commented on issue #7192: OPTIMIZE operation prunes data not visible to the table

via GitHub Fri, 24 Mar 2023 10:59:09 -0700


ddcprg commented on issue #7192:
URL: https://github.com/apache/iceberg/issues/7192#issuecomment-1483202805


   I work for a company that builds software for financial institutions. We 
receive data on Kafka and deliver it to the data lake. We have built this 
connector to sink the data to Iceberg tables in Parquet format, specifically 
for AWS Athena:
   
   https://github.com/10xfuturetechnologies/kafka-connect-iceberg
   
   We sink the data to the blob store, keeping the record schema for 
auditability, and we create Iceberg tables exposing a subset of the data. 
However, now and then we have to expose additional columns in the table 
retroactively, i.e. data from records older than the date when the column was 
added also needs to show up in the table. Backfilling would be a very expensive 
option for us. We usually add columns; so far we haven't had to drop columns 
from the table, but this could become necessary with future requirements.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


