SamWheating opened a new issue, #18858:
URL: https://github.com/apache/druid/issues/18858
If an Iceberg table that uses merge-on-read updates or deletes is ingested into
Druid, the deleted (superseded) rows are ingested as well, because the ingestion
reads the data files but ignores the accompanying delete files.
As a simple example, we can create a quick Iceberg table using Spark:
```scala
// In spark-shell, spark.implicits._ is already in scope; otherwise:
// import spark.implicits._
import org.apache.spark.sql.functions.{current_timestamp, hours}

val df = Seq(
  ("store_a", 1, 100),
  ("store_a", 2, 200),
  ("store_b", 3, 300),
  ("store_b", 4, 400)
).toDF("store_id", "item_count", "price_total")

df.withColumn("ts", current_timestamp())
  .writeTo("demo.test_database.checkouts")
  .using("iceberg")
  .partitionedBy(hours($"ts"))
  .tableProperty("write.update.mode", "merge-on-read")
  .create()
```
Then update the table:
```sql
UPDATE demo.test_database.checkouts SET price_total = 0 WHERE store_id = 'store_a'
```
Ingesting the table into Druid then yields 6 rows instead of 4, because both the
pre-update and post-update versions of store_a's records are ingested. With
rollup enabled, the duplication shows up in the aggregates: store_a has count=4
rather than 2, and sum_price_total=300 because the two original prices (100 and
200) are counted alongside the two updated prices (0 and 0):
```sql
SELECT * FROM "checkouts"
{"__time":"2025-12-19T00:00:00.000Z","store_id":"store_a","count":4,"sum_item_count":6,"sum_price_total":300}
{"__time":"2025-12-19T00:00:00.000Z","store_id":"store_b","count":2,"sum_item_count":7,"sum_price_total":700}
```
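The double-counting can be reproduced without Spark. A plain-Scala sketch, using the row values from the example above, shows how ignoring the delete files produces exactly the aggregates seen in the Druid output:

```scala
// Model the four original rows plus the two rewritten store_a rows that a
// merge-on-read UPDATE produces. Ignoring the delete files means both
// versions of store_a's rows are ingested.
case class Checkout(storeId: String, itemCount: Int, priceTotal: Int)

val original = Seq(
  Checkout("store_a", 1, 100),
  Checkout("store_a", 2, 200),
  Checkout("store_b", 3, 300),
  Checkout("store_b", 4, 400)
)

// UPDATE ... SET price_total = 0 WHERE store_id = 'store_a'
val rewritten = original.filter(_.storeId == "store_a").map(_.copy(priceTotal = 0))

val ingested = original ++ rewritten // 6 rows: the delete files were ignored
val storeA   = ingested.filter(_.storeId == "store_a")
// storeA.size == 4 and storeA.map(_.priceTotal).sum == 300,
// matching the rolled-up Druid row for store_a
```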
This feels like a potential hazard that isn't explicitly called out in [the
documentation](https://druid.apache.org/docs/latest/development/extensions-contrib/iceberg/).
Ideally we would apply the delete files and materialize the current table
state, but that's a fairly large overhaul. As a shorter-term mitigation, should
we fail the ingestion when delete files are present in the target partitions?
Happy to help with the implementation, or at least with updating the
documentation to make this behavior clear; let me know what you think is the
best path forward.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]