wombatu-kun commented on issue #16776:
URL: https://github.com/apache/iceberg/issues/16776#issuecomment-4686883917

   Thanks for the report. A few questions to narrow this down. For context, the Kafka 
Connect sink commits on a fixed interval and, on recent versions, the 
coordinator *skips* the commit entirely when an interval produced no data and 
no delete files (it logs `found nothing to commit to table <name>, skipping`). 
So new metadata files appearing for a table usually means *something* is being 
committed each interval. The details below would help pin down what.
   
   **Versions / catalog**
   
   1. Which Iceberg version of the `iceberg-kafka-connect` connector are you 
running, and which Kafka Connect runtime version?
   2. Glue catalog config - can you share the `iceberg.catalog.*` properties 
(redacted)?
   
   **What is actually in the new snapshots** (most useful)
   
   3. For one affected table, what do the new snapshots contain? Easiest is to 
query the metadata tables, e.g. `SELECT committed_at, operation, summary FROM 
<db>.<table>.snapshots ORDER BY committed_at DESC LIMIT 10`. Specifically: is 
`operation` = `append`, `delete`, or `overwrite`, and what are 
`added-data-files`, `added-records`, `added-delete-files` in the summary?
   4. Do the affected tables have id-columns configured (upsert/CDC), e.g. 
`iceberg.tables.default-id-columns` or a per-table 
`iceberg.table.<name>.id-columns`? A stream that is mostly updates/deletes 
produces delete files and new snapshots but few or no new data files.
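
   To illustrate how the answers to Q3 would be read, here is a minimal Python sketch that buckets one row of the `snapshots` metadata table by its summary counters. The function name and the bucket labels are hypothetical; only the summary keys `added-data-files` and `added-delete-files` come from the table itself. It assumes summary values arrive as strings and that a missing key means zero.

```python
def classify_snapshot(summary: dict) -> str:
    """Bucket a snapshot by what its summary says was added (hypothetical helper)."""

    def count(key: str) -> int:
        # Assumption: a counter missing from the summary means nothing was added.
        return int(summary.get(key, 0))

    data_files = count("added-data-files")
    delete_files = count("added-delete-files")

    if data_files > 0 and delete_files > 0:
        return "data+deletes"
    if data_files > 0:
        return "data-only"
    if delete_files > 0:
        return "delete-only"
    # Neither data nor delete files were added, e.g. a metadata-only commit.
    return "nothing-added"
```

   A table whose recent snapshots all land in `delete-only` points at upsert/CDC traffic (Q4); one that lands in `nothing-added` points at metadata-only commits.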
   
   **Routing**
   
   5. Are you using dynamic routing (`iceberg.tables.dynamic-enabled=true`) or 
a static `iceberg.tables` list, and what is `iceberg.tables.route-field` / any 
per-table `route-regex`?
   6. With dynamic routing the route-field value is lower-cased and used 
directly as the table name. Could records for the affected tables be routing to 
a differently-named or differently-namespaced Glue table than the one you are 
inspecting? Worth checking whether a similarly-named table is receiving the 
data files.
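
   A quick way to check Q6 against a sample of records is sketched below in Python. It assumes exactly the behaviour described above (the route-field value is lower-cased and otherwise used unchanged); both function names are hypothetical and this is not the connector's own code.

```python
def routed_table_name(route_value: str) -> str:
    # Assumption from the description above: lower-case only, no other rewriting.
    return route_value.lower()


def routes_elsewhere(route_values, inspected_table: str):
    """Return the distinct routed table names that differ from the inspected one."""
    routed = {routed_table_name(v) for v in route_values}
    return sorted(routed - {inspected_table})
```

   Any name this returns is a candidate for the "similarly-named table" that may be receiving the data files.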
   
   **Records / config / logs**
   
   7. A sample record (key + value shape) for one affected table - in 
particular, is the route field always present, and are any records tombstones 
(null value)? Null-value records are skipped by the writer.
   8. The full connector config (redacted), especially the `iceberg.tables*` 
and `iceberg.control.commit.*` settings.
   9. From the connect worker logs at INFO, for an affected vs a working table, 
do you see `completed commit to table <name>, snapshot ...` or `found nothing 
to commit to table <name>, skipping`?
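
   For Q9, a small Python sketch like the following can tally the two messages per table from a worker log. The patterns are built from the message fragments quoted above; the exact wording in your connector version may differ, so treat them as assumptions to adjust. The helper name is hypothetical.

```python
import re
from collections import Counter

# Patterns assumed from the log fragments quoted in Q9; adjust to your version.
COMMITTED = re.compile(r"completed commit to table ([^,\s]+), snapshot", re.IGNORECASE)
SKIPPED = re.compile(r"found nothing to commit to table ([^,\s]+), skipping", re.IGNORECASE)


def tally_commit_logs(lines):
    """Count committed vs skipped commit intervals per table name."""
    committed, skipped = Counter(), Counter()
    for line in lines:
        match = COMMITTED.search(line)
        if match:
            committed[match.group(1)] += 1
            continue
        match = SKIPPED.search(line)
        if match:
            skipped[match.group(1)] += 1
    return committed, skipped
```

   Usage would be along the lines of `tally_commit_logs(open("connect.log"))`, where the file name is a placeholder; comparing the two counters for an affected and a working table answers the question directly.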
   
   **Workload**
   
   10. What is the approximate record rate to an affected table, and is the 
traffic mostly inserts or mostly updates/deletes?
   
   With the snapshot summaries (Q3) and the routing config (Q5/Q6) we should be 
able to tell whether this is delete-only upsert commits, schema/metadata-only 
commits, or data being written to a different table than expected.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

