michalantkowicz opened a new issue, #13752: URL: https://github.com/apache/iceberg/issues/13752
### Apache Iceberg version 1.9.2 (latest release) ### Query engine Trino ### Please describe the bug 🐞 I have running instance in k8s that is putting Kafka's messages to my Iceberg tables. From time to time, after the restart I have the following, as I suppose Glue-related, issues in logs, preceded by some commit related warning: ``` 2025-08-06T09:18:47.140Z INFO [iceberg-coord ] [o.a.i.connect.channel.Coordinator ] {}: Commit 7ad8812b-6591-4b4d-abcc-9d6b0f6067a6 initiated ... 2025-08-06T09:18:48.397Z WARN [iceberg-coord ] [o.a.i.connect.channel.CommitState ] {trace_id=08ba83bedcc8c610aee5a11fd0ee6474, trace_flags=01, span_id=bdf57fa8fd74331c}: Received commit response when no commit in progress, this can happen during recovery. Commit ID: 7ad8812b-6591-4b4d-abcc-9d6b0f6067a6 ^^ the log above is repeating a few times 2025-08-06T09:19:17.565Z INFO [iceberg-coord ] [o.a.i.connect.channel.CommitState ] {}: Commit timeout reached. Commit ID: 7ad8812b-6591-4b4d-abcc-9d6b0f6067a6 ... 2025-08-06T09:19:17.575Z WARN [iceberg-coord ] [o.a.i.connect.channel.Coordinator ] {}: Commit failed, will try again next cycle java.lang.IllegalStateException: Connection pool shut down at org.apache.http.util.Asserts.check(Asserts.java:34) at org.apache.http.impl.conn.PoolingHttpClientConnectionManager.requestConnection(PoolingHttpClientConnectionManager.java:269) at software.amazon.awssdk.http.apache.internal.conn.ClientConnectionManagerFactory$DelegatingHttpClientConnectionManager.requestConnection(ClientConnectionManagerFactory.java:75) at software.amazon.awssdk.http.apache.internal.conn.ClientConnectionManagerFactory$InstrumentedHttpClientConnectionManager.requestConnection(ClientConnectionManagerFactory.java:57) at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:176) at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:186) at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:185) at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:83) at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:56) at software.amazon.awssdk.http.apache.internal.impl.ApacheSdkHttpClient.execute(ApacheSdkHttpClient.java:72) at software.amazon.awssdk.http.apache.ApacheHttpClient.execute(ApacheHttpClient.java:259) at software.amazon.awssdk.http.apache.ApacheHttpClient.access$600(ApacheHttpClient.java:104) at software.amazon.awssdk.http.apache.ApacheHttpClient$1.call(ApacheHttpClient.java:236) at software.amazon.awssdk.http.apache.ApacheHttpClient$1.call(ApacheHttpClient.java:233) at software.amazon.awssdk.core.internal.util.MetricUtils.measureDurationUnsafe(MetricUtils.java:103) at software.amazon.awssdk.core.internal.http.pipeline.stages.MakeHttpRequestStage.executeHttpRequest(MakeHttpRequestStage.java:88) at software.amazon.awssdk.core.internal.http.pipeline.stages.MakeHttpRequestStage.execute(MakeHttpRequestStage.java:64) at software.amazon.awssdk.core.internal.http.pipeline.stages.MakeHttpRequestStage.execute(MakeHttpRequestStage.java:46) at software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206) at software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206) at software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206) at software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206) at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallAttemptTimeoutTrackingStage.execute(ApiCallAttemptTimeoutTrackingStage.java:74) at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallAttemptTimeoutTrackingStage.execute(ApiCallAttemptTimeoutTrackingStage.java:43) at software.amazon.awssdk.core.internal.http.pipeline.stages.TimeoutExceptionHandlingStage.execute(TimeoutExceptionHandlingStage.java:79) at software.amazon.awssdk.core.internal.http.pipeline.stages.TimeoutExceptionHandlingStage.execute(TimeoutExceptionHandlingStage.java:41) at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallAttemptMetricCollectionStage.execute(ApiCallAttemptMetricCollectionStage.java:55) at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallAttemptMetricCollectionStage.execute(ApiCallAttemptMetricCollectionStage.java:39) at software.amazon.awssdk.core.internal.http.pipeline.stages.RetryableStage.executeRequest(RetryableStage.java:93) at software.amazon.awssdk.core.internal.http.pipeline.stages.RetryableStage.execute(RetryableStage.java:56) at software.amazon.awssdk.core.internal.http.pipeline.stages.RetryableStage.execute(RetryableStage.java:36) at software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206) at software.amazon.awssdk.core.internal.http.StreamManagingStage.execute(StreamManagingStage.java:53) at software.amazon.awssdk.core.internal.http.StreamManagingStage.execute(StreamManagingStage.java:35) at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallTimeoutTrackingStage.executeWithTimer(ApiCallTimeoutTrackingStage.java:82) at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallTimeoutTrackingStage.execute(ApiCallTimeoutTrackingStage.java:62) at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallTimeoutTrackingStage.execute(ApiCallTimeoutTrackingStage.java:43) at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallMetricCollectionStage.execute(ApiCallMetricCollectionStage.java:50) at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallMetricCollectionStage.execute(ApiCallMetricCollectionStage.java:32) at software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206) at software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206) at software.amazon.awssdk.core.internal.http.pipeline.stages.ExecutionFailureExceptionReportingStage.execute(ExecutionFailureExceptionReportingStage.java:37) at software.amazon.awssdk.core.internal.http.pipeline.stages.ExecutionFailureExceptionReportingStage.execute(ExecutionFailureExceptionReportingStage.java:26) at software.amazon.awssdk.core.internal.http.AmazonSyncHttpClient$RequestExecutionBuilderImpl.execute(AmazonSyncHttpClient.java:210) at software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.invoke(BaseSyncClientHandler.java:103) at software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.doExecute(BaseSyncClientHandler.java:173) at software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.lambda$execute$1(BaseSyncClientHandler.java:80) at software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.measureApiCallSuccess(BaseSyncClientHandler.java:182) at software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.execute(BaseSyncClientHandler.java:74) at software.amazon.awssdk.core.client.handler.SdkSyncClientHandler.execute(SdkSyncClientHandler.java:45) at software.amazon.awssdk.awscore.client.handler.AwsSyncClientHandler.execute(AwsSyncClientHandler.java:53) at software.amazon.awssdk.services.sts.DefaultStsClient.assumeRoleWithWebIdentity(DefaultStsClient.java:757) at software.amazon.awssdk.services.sts.auth.StsAssumeRoleWithWebIdentityCredentialsProvider.getUpdatedCredentials(StsAssumeRoleWithWebIdentityCredentialsProvider.java:76) at software.amazon.awssdk.services.sts.auth.StsCredentialsProvider.updateSessionCredentials(StsCredentialsProvider.java:93) at software.amazon.awssdk.utils.cache.CachedSupplier.lambda$jitteredPrefetchValueSupplier$8(CachedSupplier.java:300) at software.amazon.awssdk.utils.cache.CachedSupplier$PrefetchStrategy.fetch(CachedSupplier.java:448) at software.amazon.awssdk.utils.cache.CachedSupplier.refreshCache(CachedSupplier.java:208) at software.amazon.awssdk.utils.cache.CachedSupplier.get(CachedSupplier.java:135) at software.amazon.awssdk.services.sts.auth.StsCredentialsProvider.resolveCredentials(StsCredentialsProvider.java:106) at software.amazon.awssdk.services.sts.internal.StsWebIdentityCredentialsProviderFactory$StsWebIdentityCredentialsProvider.resolveCredentials(StsWebIdentityCredentialsProviderFactory.java:109) at software.amazon.awssdk.auth.credentials.WebIdentityTokenFileCredentialsProvider.resolveCredentials(WebIdentityTokenFileCredentialsProvider.java:141) at software.amazon.awssdk.auth.credentials.AwsCredentialsProvider.resolveIdentity(AwsCredentialsProvider.java:54) at software.amazon.awssdk.identity.spi.IdentityProvider.resolveIdentity(IdentityProvider.java:60) at software.amazon.awssdk.auth.credentials.AwsCredentialsProviderChain.resolveCredentials(AwsCredentialsProviderChain.java:103) at software.amazon.awssdk.auth.credentials.internal.LazyAwsCredentialsProvider.resolveCredentials(LazyAwsCredentialsProvider.java:45) at software.amazon.awssdk.auth.credentials.DefaultCredentialsProvider.resolveCredentials(DefaultCredentialsProvider.java:134) at software.amazon.awssdk.auth.credentials.AwsCredentialsProvider.resolveIdentity(AwsCredentialsProvider.java:54) at software.amazon.awssdk.services.glue.auth.scheme.internal.GlueAuthSchemeInterceptor.lambda$trySelectAuthScheme$4(GlueAuthSchemeInterceptor.java:134) at software.amazon.awssdk.core.internal.util.MetricUtils.reportDuration(MetricUtils.java:81) at software.amazon.awssdk.services.glue.auth.scheme.internal.GlueAuthSchemeInterceptor.trySelectAuthScheme(GlueAuthSchemeInterceptor.java:134) at software.amazon.awssdk.services.glue.auth.scheme.internal.GlueAuthSchemeInterceptor.selectAuthScheme(GlueAuthSchemeInterceptor.java:81) at software.amazon.awssdk.services.glue.auth.scheme.internal.GlueAuthSchemeInterceptor.beforeExecution(GlueAuthSchemeInterceptor.java:61) at software.amazon.awssdk.core.interceptor.ExecutionInterceptorChain.lambda$beforeExecution$1(ExecutionInterceptorChain.java:59) at java.base/java.util.ArrayList.forEach(ArrayList.java:1596) at software.amazon.awssdk.core.interceptor.ExecutionInterceptorChain.beforeExecution(ExecutionInterceptorChain.java:59) at software.amazon.awssdk.awscore.internal.AwsExecutionContextBuilder.runInitialInterceptors(AwsExecutionContextBuilder.java:315) at software.amazon.awssdk.awscore.internal.AwsExecutionContextBuilder.invokeInterceptorsAndCreateExecutionContext(AwsExecutionContextBuilder.java:151) at software.amazon.awssdk.awscore.client.handler.AwsSyncClientHandler.invokeInterceptorsAndCreateExecutionContext(AwsSyncClientHandler.java:67) at software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.lambda$execute$1(BaseSyncClientHandler.java:76) at software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.measureApiCallSuccess(BaseSyncClientHandler.java:182) at software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.execute(BaseSyncClientHandler.java:74) at software.amazon.awssdk.core.client.handler.SdkSyncClientHandler.execute(SdkSyncClientHandler.java:45) at software.amazon.awssdk.awscore.client.handler.AwsSyncClientHandler.execute(AwsSyncClientHandler.java:53) at software.amazon.awssdk.services.glue.DefaultGlueClient.getTable(DefaultGlueClient.java:30890) at org.apache.iceberg.aws.glue.GlueTableOperations.getGlueTable(GlueTableOperations.java:279) at org.apache.iceberg.aws.glue.GlueTableOperations.doRefresh(GlueTableOperations.java:128) at org.apache.iceberg.BaseMetastoreTableOperations.refresh(BaseMetastoreTableOperations.java:88) at org.apache.iceberg.BaseMetastoreTableOperations.current(BaseMetastoreTableOperations.java:71) at org.apache.iceberg.BaseMetastoreCatalog.loadTable(BaseMetastoreCatalog.java:49) at org.apache.iceberg.connect.channel.Coordinator.commitToTable(Coordinator.java:188) at org.apache.iceberg.connect.channel.Coordinator.lambda$doCommit$1(Coordinator.java:152) at org.apache.iceberg.util.Tasks$Builder.runTaskWithRetry(Tasks.java:413) at org.apache.iceberg.util.Tasks$Builder$1.run(Tasks.java:315) at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:572) at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:317) at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144) at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642) at java.base/java.lang.Thread.run(Thread.java:1583) ``` sometimes, before timeout, I see the following messages for the commit: ``` 2025-08-06T09:19:08.091Z INFO [iceberg-coord ] [o.a.i.connect.channel.CommitState ] {trace_id=5c79db77f8193583564f9c0a6b71897a, trace_flags=01, span_id=e2c0fcfd749a670f}: Commit fb45dd59-3c9e-4cca-a075-37264d5b5a3a not ready, received responses for 1 of 3 partitions, waiting for more ``` This is definitely some issue I should take a look into, but what is worrying me most is the fact that Kafka Connect is loosing data - it's not in Iceberg anymore and never appear there. The problems I described above usually are being solved with the next restart of my instance but the data is never reprocessed I've checked source topic offsets and they're just getting bigger, probably never reverts. I've checked my configuration and I see that `errors.tolerance` is not overwritten so I suppose it should be `none` but all workers are in `RUNNING` state I've checked control topic `connect-offsets` but it is empty - I did read somewhere that it's expected because it's not being used for Sinks. Did anyone had this issue? ### Willingness to contribute - [ ] I can contribute a fix for this bug independently - [x] I would be willing to contribute a fix for this bug with guidance from the Iceberg community - [ ] I cannot contribute a fix for this bug at this time -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org