trolle4 opened a new issue, #11796:
URL: https://github.com/apache/iceberg/issues/11796
### Apache Iceberg version
1.7.1 (latest release)
### Query engine
Kafka Connect
### Please describe the bug 🐞
We experience unexplainable large delays on the commits from time to time.
The setup is using the kafka-connect connector. Tested both with the latest
release (1.7.1) and with main. (1.8.0-SNAPSHOT)
These are the important settings from the connector config:
```json
"connector.class": "org.apache.iceberg.connect.IcebergSinkConnector",
"tasks.max": "1",
"topics": "iceberg_1000300255_long",
"iceberg.control.commit.interval-ms": "10000",
"iceberg.control.commit.timeout-ms": "20000",
```
So we commit every 10 seconds and timeout after 20 seconds. This work well
approximately 75% of the time. But occasionally (every 4-5 commits or so) we
experience a delay that cannot be explained.
When this happens, a message is received. The connector goes through the
commit logic and times out as if there is no message received. This happens a
couple of times until it is finally committed (no other messages are received
in this time period.
Here's an example:

These messages are from the input topic (The timestamp is in GMT+1)
This is the kafka offset and consumer offset for the source topic, showing
the delay in offset commit (matches the iceberg commit)

This is the same for the iceberg connector control topic, which does not
have any delays (so that's not the issue)

At 07:40:15 two messages are received. One of them is committed immediately
after, but the other one is not. No more messages are received and the commit
loop goes through several cyles (4 commits and timeouts), before it is actually
committed at 07:41:44.
The logs can be seen below, as well as a filtered part of the log, which
only shows when a commit interval starts, times out and completes. Log files
are in GMT
Any idea what could be causing this? Another thing I notice, is that when no
messages have been received since the last commit, it still has to wait for the
timeout. This does not seem like the most feasible solution. Wouldn't it be
better to track that and then simply return?
Filtered log:
```
Event Analysis:
Timestamp Event Type Since Last Event Since
Same Event
------------------------------------------------------------------------------------------
2024-12-16 06:40:07,056 initiated 0.03s 20.04s
2024-12-16 06:40:16,186 committed_to_1 9.13s 32.18s
2024-12-16 06:40:17,194 initiated 1.01s 10.14s
2024-12-16 06:40:37,202 timeout 20.01s 30.18s
2024-12-16 06:40:37,233 initiated 0.03s 20.04s
2024-12-16 06:40:57,243 timeout 20.01s 20.04s
2024-12-16 06:40:57,274 initiated 0.03s 20.04s
2024-12-16 06:41:17,281 timeout 20.01s 20.04s
2024-12-16 06:41:17,314 initiated 0.03s 20.04s
2024-12-16 06:41:37,323 timeout 20.01s 20.04s
2024-12-16 06:41:37,375 initiated 0.05s 20.06s
2024-12-16 06:41:44,282 committed_to_1 6.91s 88.10s
```
Complete log:
```log
2024-12-16 06:40:07,031 INFO Sending event of type: START_COMMIT
[org.apache.iceberg.connect.channel.Channel]
2024-12-16 06:40:07,056 INFO Commit e2f65021-851f-47a6-b5ca-463d54fccd85
initiated [org.apache.iceberg.connect.channel.Coordinator]
2024-12-16 06:40:15,146 INFO Refreshing table metadata from new version:
abfss://[email protected]/iceberg/evaliceberg.db/kc_ingestion_testing_1000300255_long_ingest_delay/metadata/08740-ba0f43df-a77f-4e5f-a921-550595523a36.metadata.json
[org.apache.iceberg.BaseMetastoreTableOperations]
2024-12-16 06:40:15,378 INFO Table loaded by catalog:
iceberg.evaliceberg.kc_ingestion_testing_1000300255_long_ingest_delay
[org.apache.iceberg.BaseMetastoreCatalog]
2024-12-16 06:40:15,390 INFO Saving 1 records. Last offset 151629
[org.apache.iceberg.connect.channel.CommitterImpl]
2024-12-16 06:40:15,429 INFO Sending event of type: DATA_WRITTEN
[org.apache.iceberg.connect.channel.Channel]
2024-12-16 06:40:15,429 INFO Sending event of type: DATA_COMPLETE
[org.apache.iceberg.connect.channel.Channel]
2024-12-16 06:40:15,438 INFO Handled event of type: START_COMMIT
[org.apache.iceberg.connect.channel.Channel]
2024-12-16 06:40:15,438 INFO Sending event of type: DATA_COMPLETE
[org.apache.iceberg.connect.channel.Channel]
2024-12-16 06:40:15,440 INFO Handled event of type: DATA_WRITTEN
[org.apache.iceberg.connect.channel.Channel]
2024-12-16 06:40:15,440 INFO Commit e2f65021-851f-47a6-b5ca-463d54fccd85
not ready, received responses for 0 of 1 partitions, waiting for more
[org.apache.iceberg.connect.channel.CommitState]
2024-12-16 06:40:15,440 INFO Handled event of type: DATA_COMPLETE
[org.apache.iceberg.connect.channel.Channel]
2024-12-16 06:40:15,463 INFO Handled event of type: START_COMMIT
[org.apache.iceberg.connect.channel.Channel]
2024-12-16 06:40:15,464 INFO Commit e2f65021-851f-47a6-b5ca-463d54fccd85
ready, received responses for all 1 partitions
[org.apache.iceberg.connect.channel.CommitState]
2024-12-16 06:40:15,468 INFO Refreshing table metadata from new version:
abfss://[email protected]/iceberg/evaliceberg.db/kc_ingestion_testing_1000300255_long_ingest_delay/metadata/08740-ba0f43df-a77f-4e5f-a921-550595523a36.metadata.json
[org.apache.iceberg.BaseMetastoreTableOperations]
2024-12-16 06:40:15,473 INFO Refreshing table metadata from new version:
abfss://[email protected]/iceberg/evaliceberg.db/kc_ingestion_testing_1000300255_long_ingest_delay/metadata/08740-ba0f43df-a77f-4e5f-a921-550595523a36.metadata.json
[org.apache.iceberg.BaseMetastoreTableOperations]
2024-12-16 06:40:15,614 INFO Table loaded by catalog:
iceberg.evaliceberg.kc_ingestion_testing_1000300255_long_ingest_delay
[org.apache.iceberg.BaseMetastoreCatalog]
2024-12-16 06:40:15,650 INFO Table loaded by catalog:
iceberg.evaliceberg.kc_ingestion_testing_1000300255_long_ingest_delay
[org.apache.iceberg.BaseMetastoreCatalog]
2024-12-16 06:40:15,661 INFO Saving 1 records. Last offset 151630
[org.apache.iceberg.connect.channel.CommitterImpl]
2024-12-16 06:40:15,948 INFO Committed to table
iceberg.evaliceberg.kc_ingestion_testing_1000300255_long_ingest_delay with the
new metadata location
abfss://[email protected]/iceberg/evaliceberg.db/kc_ingestion_testing_1000300255_long_ingest_delay/metadata/08741-8579a9ff-8d13-4089-abef-087b19d19f97.metadata.json
[org.apache.iceberg.hive.HiveTableOperations]
2024-12-16 06:40:15,948 INFO Successfully committed to table
iceberg.evaliceberg.kc_ingestion_testing_1000300255_long_ingest_delay in 206 ms
[org.apache.iceberg.BaseMetastoreTableOperations]
2024-12-16 06:40:15,948 INFO Committed snapshot 409307506258743637
(MergeAppend) [org.apache.iceberg.SnapshotProducer]
2024-12-16 06:40:15,976 INFO Refreshing table metadata from new version:
abfss://[email protected]/iceberg/evaliceberg.db/kc_ingestion_testing_1000300255_long_ingest_delay/metadata/08741-8579a9ff-8d13-4089-abef-087b19d19f97.metadata.json
[org.apache.iceberg.BaseMetastoreTableOperations]
2024-12-16 06:40:16,147 INFO Received metrics report:
CommitReport{tableName=iceberg.evaliceberg.kc_ingestion_testing_1000300255_long_ingest_delay,
snapshotId=409307506258743637, sequenceNumber=8740, operation=append,
commitMetrics=CommitMetricsResult{totalDuration=TimerResult{timeUnit=NANOSECONDS,
totalDuration=PT0.357961945S, count=1}, attempts=CounterResult{unit=COUNT,
value=1}, addedDataFiles=CounterResult{unit=COUNT, value=1},
removedDataFiles=null, totalDataFiles=CounterResult{unit=COUNT, value=8763},
addedDeleteFiles=null, addedEqualityDeleteFiles=null,
addedPositionalDeleteFiles=null, addedDVs=null, removedDeleteFiles=null,
removedEqualityDeleteFiles=null, removedPositionalDeleteFiles=null,
removedDVs=null, totalDeleteFiles=CounterResult{unit=COUNT, value=0},
addedRecords=CounterResult{unit=COUNT, value=1}, removedRecords=null,
totalRecords=CounterResult{unit=COUNT, value=80408},
addedFilesSizeInBytes=CounterResult{unit=BYTES, value=17446},
removedFilesSizeInBytes=null, t
otalFilesSizeInBytes=CounterResult{unit=BYTES, value=625872320},
addedPositionalDeletes=null, removedPositionalDeletes=null,
totalPositionalDeletes=CounterResult{unit=COUNT, value=0},
addedEqualityDeletes=null, removedEqualityDeletes=null,
totalEqualityDeletes=CounterResult{unit=COUNT, value=0}},
metadata={iceberg-version=Apache Iceberg 1.8.0-SNAPSHOT (commit
ff813445916bfd6ec1cc30a02b02f8bade7a26f6)}}
[org.apache.iceberg.metrics.LoggingMetricsReporter]
2024-12-16 06:40:16,147 INFO Sending event of type: COMMIT_TO_TABLE
[org.apache.iceberg.connect.channel.Channel]
2024-12-16 06:40:16,153 INFO Commit complete to table
evaliceberg.kc_ingestion_testing_1000300255_long_ingest_delay, snapshot
409307506258743637, commit ID e2f65021-851f-47a6-b5ca-463d54fccd85,
valid-through null [org.apache.iceberg.connect.channel.Coordinator]
2024-12-16 06:40:16,162 INFO Sending event of type: COMMIT_COMPLETE
[org.apache.iceberg.connect.channel.Channel]
2024-12-16 06:40:16,186 INFO Commit e2f65021-851f-47a6-b5ca-463d54fccd85
complete, committed to 1 table(s), valid-through null
[org.apache.iceberg.connect.channel.Coordinator]
2024-12-16 06:40:16,187 INFO Handled event of type: DATA_COMPLETE
[org.apache.iceberg.connect.channel.Channel]
2024-12-16 06:40:17,189 INFO Sending event of type: START_COMMIT
[org.apache.iceberg.connect.channel.Channel]
2024-12-16 06:40:17,194 INFO Commit 931c9e74-e4c9-4549-81df-5f5e2ef9001c
initiated [org.apache.iceberg.connect.channel.Coordinator]
2024-12-16 06:40:37,202 INFO Commit timeout reached. Commit ID:
931c9e74-e4c9-4549-81df-5f5e2ef9001c
[org.apache.iceberg.connect.channel.CommitState]
2024-12-16 06:40:37,204 INFO Sending event of type: COMMIT_COMPLETE
[org.apache.iceberg.connect.channel.Channel]
2024-12-16 06:40:37,209 INFO Commit 931c9e74-e4c9-4549-81df-5f5e2ef9001c
complete, committed to 0 table(s), valid-through null
[org.apache.iceberg.connect.channel.Coordinator]
2024-12-16 06:40:37,209 INFO Sending event of type: START_COMMIT
[org.apache.iceberg.connect.channel.Channel]
2024-12-16 06:40:37,233 INFO Commit 44b08251-add2-4ccc-9b6a-2732fbe7677f
initiated [org.apache.iceberg.connect.channel.Coordinator]
2024-12-16 06:40:56,818 INFO [AdminClient clientId=adminclient-30] Node 2
disconnected. [org.apache.kafka.clients.NetworkClient]
2024-12-16 06:40:56,834 INFO [AdminClient clientId=adminclient-31] Node 1
disconnected. [org.apache.kafka.clients.NetworkClient]
2024-12-16 06:40:57,243 INFO Commit timeout reached. Commit ID:
44b08251-add2-4ccc-9b6a-2732fbe7677f
[org.apache.iceberg.connect.channel.CommitState]
2024-12-16 06:40:57,245 INFO Sending event of type: COMMIT_COMPLETE
[org.apache.iceberg.connect.channel.Channel]
2024-12-16 06:40:57,250 INFO Commit 44b08251-add2-4ccc-9b6a-2732fbe7677f
complete, committed to 0 table(s), valid-through null
[org.apache.iceberg.connect.channel.Coordinator]
2024-12-16 06:40:57,250 INFO Sending event of type: START_COMMIT
[org.apache.iceberg.connect.channel.Channel]
2024-12-16 06:40:57,274 INFO Commit e9000dcd-6e69-4ed0-add1-9dea50e95cba
initiated [org.apache.iceberg.connect.channel.Coordinator]
2024-12-16 06:41:17,281 INFO Commit timeout reached. Commit ID:
e9000dcd-6e69-4ed0-add1-9dea50e95cba
[org.apache.iceberg.connect.channel.CommitState]
2024-12-16 06:41:17,284 INFO Sending event of type: COMMIT_COMPLETE
[org.apache.iceberg.connect.channel.Channel]
2024-12-16 06:41:17,289 INFO Commit e9000dcd-6e69-4ed0-add1-9dea50e95cba
complete, committed to 0 table(s), valid-through null
[org.apache.iceberg.connect.channel.Coordinator]
2024-12-16 06:41:17,290 INFO Sending event of type: START_COMMIT
[org.apache.iceberg.connect.channel.Channel]
2024-12-16 06:41:17,314 INFO Commit a1ab811b-0ce6-48b8-9bb4-80d2e8a084cf
initiated [org.apache.iceberg.connect.channel.Coordinator]
2024-12-16 06:41:37,323 INFO Commit timeout reached. Commit ID:
a1ab811b-0ce6-48b8-9bb4-80d2e8a084cf
[org.apache.iceberg.connect.channel.CommitState]
2024-12-16 06:41:37,325 INFO Sending event of type: COMMIT_COMPLETE
[org.apache.iceberg.connect.channel.Channel]
2024-12-16 06:41:37,329 INFO Commit a1ab811b-0ce6-48b8-9bb4-80d2e8a084cf
complete, committed to 0 table(s), valid-through null
[org.apache.iceberg.connect.channel.Coordinator]
2024-12-16 06:41:37,330 INFO Sending event of type: START_COMMIT
[org.apache.iceberg.connect.channel.Channel]
2024-12-16 06:41:37,375 INFO Commit 14c96a1b-5a1a-4a66-ae2a-b227aeb61981
initiated [org.apache.iceberg.connect.channel.Coordinator]
2024-12-16 06:41:43,372 INFO Sending event of type: DATA_WRITTEN
[org.apache.iceberg.connect.channel.Channel]
2024-12-16 06:41:43,372 INFO Sending event of type: DATA_COMPLETE
[org.apache.iceberg.connect.channel.Channel]
2024-12-16 06:41:43,380 INFO Handled event of type: START_COMMIT
[org.apache.iceberg.connect.channel.Channel]
2024-12-16 06:41:43,380 INFO Sending event of type: DATA_COMPLETE
[org.apache.iceberg.connect.channel.Channel]
2024-12-16 06:41:43,382 INFO Handled event of type: DATA_WRITTEN
[org.apache.iceberg.connect.channel.Channel]
2024-12-16 06:41:43,382 INFO Commit 14c96a1b-5a1a-4a66-ae2a-b227aeb61981
not ready, received responses for 0 of 1 partitions, waiting for more
[org.apache.iceberg.connect.channel.CommitState]
2024-12-16 06:41:43,382 INFO Handled event of type: DATA_COMPLETE
[org.apache.iceberg.connect.channel.Channel]
2024-12-16 06:41:43,426 INFO Handled event of type: START_COMMIT
[org.apache.iceberg.connect.channel.Channel]
2024-12-16 06:41:43,426 INFO Sending event of type: DATA_COMPLETE
[org.apache.iceberg.connect.channel.Channel]
2024-12-16 06:41:43,427 INFO Commit 14c96a1b-5a1a-4a66-ae2a-b227aeb61981
not ready, received responses for 0 of 1 partitions, waiting for more
[org.apache.iceberg.connect.channel.CommitState]
2024-12-16 06:41:43,427 INFO Handled event of type: DATA_COMPLETE
[org.apache.iceberg.connect.channel.Channel]
2024-12-16 06:41:43,472 INFO Handled event of type: START_COMMIT
[org.apache.iceberg.connect.channel.Channel]
2024-12-16 06:41:43,472 INFO Sending event of type: DATA_COMPLETE
[org.apache.iceberg.connect.channel.Channel]
2024-12-16 06:41:43,474 INFO Commit 14c96a1b-5a1a-4a66-ae2a-b227aeb61981
not ready, received responses for 0 of 1 partitions, waiting for more
[org.apache.iceberg.connect.channel.CommitState]
2024-12-16 06:41:43,474 INFO Handled event of type: DATA_COMPLETE
[org.apache.iceberg.connect.channel.Channel]
2024-12-16 06:41:43,498 INFO Handled event of type: START_COMMIT
[org.apache.iceberg.connect.channel.Channel]
2024-12-16 06:41:43,498 INFO Sending event of type: DATA_COMPLETE
[org.apache.iceberg.connect.channel.Channel]
2024-12-16 06:41:43,499 INFO Commit 14c96a1b-5a1a-4a66-ae2a-b227aeb61981
not ready, received responses for 0 of 1 partitions, waiting for more
[org.apache.iceberg.connect.channel.CommitState]
2024-12-16 06:41:43,499 INFO Handled event of type: DATA_COMPLETE
[org.apache.iceberg.connect.channel.Channel]
2024-12-16 06:41:43,523 INFO Handled event of type: START_COMMIT
[org.apache.iceberg.connect.channel.Channel]
2024-12-16 06:41:43,525 INFO Commit 14c96a1b-5a1a-4a66-ae2a-b227aeb61981
ready, received responses for all 1 partitions
[org.apache.iceberg.connect.channel.CommitState]
2024-12-16 06:41:43,532 INFO Refreshing table metadata from new version:
abfss://[email protected]/iceberg/evaliceberg.db/kc_ingestion_testing_1000300255_long_ingest_delay/metadata/08741-8579a9ff-8d13-4089-abef-087b19d19f97.metadata.json
[org.apache.iceberg.BaseMetastoreTableOperations]
2024-12-16 06:41:43,707 INFO Table loaded by catalog:
iceberg.evaliceberg.kc_ingestion_testing_1000300255_long_ingest_delay
[org.apache.iceberg.BaseMetastoreCatalog]
2024-12-16 06:41:44,094 INFO Committed to table
iceberg.evaliceberg.kc_ingestion_testing_1000300255_long_ingest_delay with the
new metadata location
abfss://[email protected]/iceberg/evaliceberg.db/kc_ingestion_testing_1000300255_long_ingest_delay/metadata/08742-42fb5ca7-3714-40c7-b511-427dc6bf1612.metadata.json
[org.apache.iceberg.hive.HiveTableOperations]
2024-12-16 06:41:44,094 INFO Successfully committed to table
iceberg.evaliceberg.kc_ingestion_testing_1000300255_long_ingest_delay in 266 ms
[org.apache.iceberg.BaseMetastoreTableOperations]
2024-12-16 06:41:44,094 INFO Committed snapshot 6751585210189296099
(MergeAppend) [org.apache.iceberg.SnapshotProducer]
2024-12-16 06:41:44,116 INFO Refreshing table metadata from new version:
abfss://[email protected]/iceberg/evaliceberg.db/kc_ingestion_testing_1000300255_long_ingest_delay/metadata/08742-42fb5ca7-3714-40c7-b511-427dc6bf1612.metadata.json
[org.apache.iceberg.BaseMetastoreTableOperations]
2024-12-16 06:41:44,261 INFO Received metrics report:
CommitReport{tableName=iceberg.evaliceberg.kc_ingestion_testing_1000300255_long_ingest_delay,
snapshotId=6751585210189296099, sequenceNumber=8741, operation=append,
commitMetrics=CommitMetricsResult{totalDuration=TimerResult{timeUnit=NANOSECONDS,
totalDuration=PT0.404934664S, count=1}, attempts=CounterResult{unit=COUNT,
value=1}, addedDataFiles=CounterResult{unit=COUNT, value=1},
removedDataFiles=null, totalDataFiles=CounterResult{unit=COUNT, value=8764},
addedDeleteFiles=null, addedEqualityDeleteFiles=null,
addedPositionalDeleteFiles=null, addedDVs=null, removedDeleteFiles=null,
removedEqualityDeleteFiles=null, removedPositionalDeleteFiles=null,
removedDVs=null, totalDeleteFiles=CounterResult{unit=COUNT, value=0},
addedRecords=CounterResult{unit=COUNT, value=1}, removedRecords=null,
totalRecords=CounterResult{unit=COUNT, value=80409},
addedFilesSizeInBytes=CounterResult{unit=BYTES, value=29788},
removedFilesSizeInBytes=null,
totalFilesSizeInBytes=CounterResult{unit=BYTES, value=625902108},
addedPositionalDeletes=null, removedPositionalDeletes=null,
totalPositionalDeletes=CounterResult{unit=COUNT, value=0},
addedEqualityDeletes=null, removedEqualityDeletes=null,
totalEqualityDeletes=CounterResult{unit=COUNT, value=0}},
metadata={iceberg-version=Apache Iceberg 1.8.0-SNAPSHOT (commit
ff813445916bfd6ec1cc30a02b02f8bade7a26f6)}}
[org.apache.iceberg.metrics.LoggingMetricsReporter]
2024-12-16 06:41:44,261 INFO Sending event of type: COMMIT_TO_TABLE
[org.apache.iceberg.connect.channel.Channel]
2024-12-16 06:41:44,268 INFO Commit complete to table
evaliceberg.kc_ingestion_testing_1000300255_long_ingest_delay, snapshot
6751585210189296099, commit ID 14c96a1b-5a1a-4a66-ae2a-b227aeb61981,
valid-through null [org.apache.iceberg.connect.channel.Coordinator]
2024-12-16 06:41:44,278 INFO Sending event of type: COMMIT_COMPLETE
[org.apache.iceberg.connect.channel.Channel]
2024-12-16 06:41:44,282 INFO Commit 14c96a1b-5a1a-4a66-ae2a-b227aeb61981
complete, committed to 1 table(s), valid-through null
[org.apache.iceberg.connect.channel.Coordinator]
2024-12-16 06:41:44,283 INFO Handled event of type: DATA_COMPLETE
[org.apache.iceberg.connect.channel.Channel]
```
### Willingness to contribute
- [ ] I can contribute a fix for this bug independently
- [X] I would be willing to contribute a fix for this bug with guidance from
the Iceberg community
- [ ] I cannot contribute a fix for this bug at this time
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]