michalantkowicz opened a new issue, #13752:
URL: https://github.com/apache/iceberg/issues/13752

   ### Apache Iceberg version
   
   1.9.2 (latest release)
   
   ### Query engine
   
   Trino
   
   ### Please describe the bug 🐞
   
   I run a Kafka Connect instance in k8s that writes Kafka messages to my 
Iceberg tables. From time to time, after a restart, I see the following errors 
in the logs. I suspect they are Glue-related, and they are preceded by 
commit-related warnings:
   ```
   2025-08-06T09:18:47.140Z INFO  [iceberg-coord                      ] [o.a.i.connect.channel.Coordinator   ] {}: Commit 7ad8812b-6591-4b4d-abcc-9d6b0f6067a6 initiated
   
   ...
   
   2025-08-06T09:18:48.397Z WARN  [iceberg-coord                      ] [o.a.i.connect.channel.CommitState   ] {trace_id=08ba83bedcc8c610aee5a11fd0ee6474, trace_flags=01, span_id=bdf57fa8fd74331c}: Received commit response when no commit in progress, this can happen during recovery. Commit ID: 7ad8812b-6591-4b4d-abcc-9d6b0f6067a6
   
   ^^ the warning above repeats a few times
   
   2025-08-06T09:19:17.565Z INFO  [iceberg-coord                      ] [o.a.i.connect.channel.CommitState   ] {}: Commit timeout reached. Commit ID: 7ad8812b-6591-4b4d-abcc-9d6b0f6067a6
   
   ...
   
   2025-08-06T09:19:17.575Z WARN  [iceberg-coord                      ] [o.a.i.connect.channel.Coordinator   ] {}: Commit failed, will try again next cycle
   java.lang.IllegalStateException: Connection pool shut down
        at org.apache.http.util.Asserts.check(Asserts.java:34)
        at org.apache.http.impl.conn.PoolingHttpClientConnectionManager.requestConnection(PoolingHttpClientConnectionManager.java:269)
        at software.amazon.awssdk.http.apache.internal.conn.ClientConnectionManagerFactory$DelegatingHttpClientConnectionManager.requestConnection(ClientConnectionManagerFactory.java:75)
        at software.amazon.awssdk.http.apache.internal.conn.ClientConnectionManagerFactory$InstrumentedHttpClientConnectionManager.requestConnection(ClientConnectionManagerFactory.java:57)
        at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:176)
        at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:186)
        at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:185)
        at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:83)
        at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:56)
        at software.amazon.awssdk.http.apache.internal.impl.ApacheSdkHttpClient.execute(ApacheSdkHttpClient.java:72)
        at software.amazon.awssdk.http.apache.ApacheHttpClient.execute(ApacheHttpClient.java:259)
        at software.amazon.awssdk.http.apache.ApacheHttpClient.access$600(ApacheHttpClient.java:104)
        at software.amazon.awssdk.http.apache.ApacheHttpClient$1.call(ApacheHttpClient.java:236)
        at software.amazon.awssdk.http.apache.ApacheHttpClient$1.call(ApacheHttpClient.java:233)
        at software.amazon.awssdk.core.internal.util.MetricUtils.measureDurationUnsafe(MetricUtils.java:103)
        at software.amazon.awssdk.core.internal.http.pipeline.stages.MakeHttpRequestStage.executeHttpRequest(MakeHttpRequestStage.java:88)
        at software.amazon.awssdk.core.internal.http.pipeline.stages.MakeHttpRequestStage.execute(MakeHttpRequestStage.java:64)
        at software.amazon.awssdk.core.internal.http.pipeline.stages.MakeHttpRequestStage.execute(MakeHttpRequestStage.java:46)
        at software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206)
        at software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206)
        at software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206)
        at software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206)
        at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallAttemptTimeoutTrackingStage.execute(ApiCallAttemptTimeoutTrackingStage.java:74)
        at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallAttemptTimeoutTrackingStage.execute(ApiCallAttemptTimeoutTrackingStage.java:43)
        at software.amazon.awssdk.core.internal.http.pipeline.stages.TimeoutExceptionHandlingStage.execute(TimeoutExceptionHandlingStage.java:79)
        at software.amazon.awssdk.core.internal.http.pipeline.stages.TimeoutExceptionHandlingStage.execute(TimeoutExceptionHandlingStage.java:41)
        at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallAttemptMetricCollectionStage.execute(ApiCallAttemptMetricCollectionStage.java:55)
        at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallAttemptMetricCollectionStage.execute(ApiCallAttemptMetricCollectionStage.java:39)
        at software.amazon.awssdk.core.internal.http.pipeline.stages.RetryableStage.executeRequest(RetryableStage.java:93)
        at software.amazon.awssdk.core.internal.http.pipeline.stages.RetryableStage.execute(RetryableStage.java:56)
        at software.amazon.awssdk.core.internal.http.pipeline.stages.RetryableStage.execute(RetryableStage.java:36)
        at software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206)
        at software.amazon.awssdk.core.internal.http.StreamManagingStage.execute(StreamManagingStage.java:53)
        at software.amazon.awssdk.core.internal.http.StreamManagingStage.execute(StreamManagingStage.java:35)
        at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallTimeoutTrackingStage.executeWithTimer(ApiCallTimeoutTrackingStage.java:82)
        at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallTimeoutTrackingStage.execute(ApiCallTimeoutTrackingStage.java:62)
        at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallTimeoutTrackingStage.execute(ApiCallTimeoutTrackingStage.java:43)
        at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallMetricCollectionStage.execute(ApiCallMetricCollectionStage.java:50)
        at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallMetricCollectionStage.execute(ApiCallMetricCollectionStage.java:32)
        at software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206)
        at software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206)
        at software.amazon.awssdk.core.internal.http.pipeline.stages.ExecutionFailureExceptionReportingStage.execute(ExecutionFailureExceptionReportingStage.java:37)
        at software.amazon.awssdk.core.internal.http.pipeline.stages.ExecutionFailureExceptionReportingStage.execute(ExecutionFailureExceptionReportingStage.java:26)
        at software.amazon.awssdk.core.internal.http.AmazonSyncHttpClient$RequestExecutionBuilderImpl.execute(AmazonSyncHttpClient.java:210)
        at software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.invoke(BaseSyncClientHandler.java:103)
        at software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.doExecute(BaseSyncClientHandler.java:173)
        at software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.lambda$execute$1(BaseSyncClientHandler.java:80)
        at software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.measureApiCallSuccess(BaseSyncClientHandler.java:182)
        at software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.execute(BaseSyncClientHandler.java:74)
        at software.amazon.awssdk.core.client.handler.SdkSyncClientHandler.execute(SdkSyncClientHandler.java:45)
        at software.amazon.awssdk.awscore.client.handler.AwsSyncClientHandler.execute(AwsSyncClientHandler.java:53)
        at software.amazon.awssdk.services.sts.DefaultStsClient.assumeRoleWithWebIdentity(DefaultStsClient.java:757)
        at software.amazon.awssdk.services.sts.auth.StsAssumeRoleWithWebIdentityCredentialsProvider.getUpdatedCredentials(StsAssumeRoleWithWebIdentityCredentialsProvider.java:76)
        at software.amazon.awssdk.services.sts.auth.StsCredentialsProvider.updateSessionCredentials(StsCredentialsProvider.java:93)
        at software.amazon.awssdk.utils.cache.CachedSupplier.lambda$jitteredPrefetchValueSupplier$8(CachedSupplier.java:300)
        at software.amazon.awssdk.utils.cache.CachedSupplier$PrefetchStrategy.fetch(CachedSupplier.java:448)
        at software.amazon.awssdk.utils.cache.CachedSupplier.refreshCache(CachedSupplier.java:208)
        at software.amazon.awssdk.utils.cache.CachedSupplier.get(CachedSupplier.java:135)
        at software.amazon.awssdk.services.sts.auth.StsCredentialsProvider.resolveCredentials(StsCredentialsProvider.java:106)
        at software.amazon.awssdk.services.sts.internal.StsWebIdentityCredentialsProviderFactory$StsWebIdentityCredentialsProvider.resolveCredentials(StsWebIdentityCredentialsProviderFactory.java:109)
        at software.amazon.awssdk.auth.credentials.WebIdentityTokenFileCredentialsProvider.resolveCredentials(WebIdentityTokenFileCredentialsProvider.java:141)
        at software.amazon.awssdk.auth.credentials.AwsCredentialsProvider.resolveIdentity(AwsCredentialsProvider.java:54)
        at software.amazon.awssdk.identity.spi.IdentityProvider.resolveIdentity(IdentityProvider.java:60)
        at software.amazon.awssdk.auth.credentials.AwsCredentialsProviderChain.resolveCredentials(AwsCredentialsProviderChain.java:103)
        at software.amazon.awssdk.auth.credentials.internal.LazyAwsCredentialsProvider.resolveCredentials(LazyAwsCredentialsProvider.java:45)
        at software.amazon.awssdk.auth.credentials.DefaultCredentialsProvider.resolveCredentials(DefaultCredentialsProvider.java:134)
        at software.amazon.awssdk.auth.credentials.AwsCredentialsProvider.resolveIdentity(AwsCredentialsProvider.java:54)
        at software.amazon.awssdk.services.glue.auth.scheme.internal.GlueAuthSchemeInterceptor.lambda$trySelectAuthScheme$4(GlueAuthSchemeInterceptor.java:134)
        at software.amazon.awssdk.core.internal.util.MetricUtils.reportDuration(MetricUtils.java:81)
        at software.amazon.awssdk.services.glue.auth.scheme.internal.GlueAuthSchemeInterceptor.trySelectAuthScheme(GlueAuthSchemeInterceptor.java:134)
        at software.amazon.awssdk.services.glue.auth.scheme.internal.GlueAuthSchemeInterceptor.selectAuthScheme(GlueAuthSchemeInterceptor.java:81)
        at software.amazon.awssdk.services.glue.auth.scheme.internal.GlueAuthSchemeInterceptor.beforeExecution(GlueAuthSchemeInterceptor.java:61)
        at software.amazon.awssdk.core.interceptor.ExecutionInterceptorChain.lambda$beforeExecution$1(ExecutionInterceptorChain.java:59)
        at java.base/java.util.ArrayList.forEach(ArrayList.java:1596)
        at software.amazon.awssdk.core.interceptor.ExecutionInterceptorChain.beforeExecution(ExecutionInterceptorChain.java:59)
        at software.amazon.awssdk.awscore.internal.AwsExecutionContextBuilder.runInitialInterceptors(AwsExecutionContextBuilder.java:315)
        at software.amazon.awssdk.awscore.internal.AwsExecutionContextBuilder.invokeInterceptorsAndCreateExecutionContext(AwsExecutionContextBuilder.java:151)
        at software.amazon.awssdk.awscore.client.handler.AwsSyncClientHandler.invokeInterceptorsAndCreateExecutionContext(AwsSyncClientHandler.java:67)
        at software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.lambda$execute$1(BaseSyncClientHandler.java:76)
        at software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.measureApiCallSuccess(BaseSyncClientHandler.java:182)
        at software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.execute(BaseSyncClientHandler.java:74)
        at software.amazon.awssdk.core.client.handler.SdkSyncClientHandler.execute(SdkSyncClientHandler.java:45)
        at software.amazon.awssdk.awscore.client.handler.AwsSyncClientHandler.execute(AwsSyncClientHandler.java:53)
        at software.amazon.awssdk.services.glue.DefaultGlueClient.getTable(DefaultGlueClient.java:30890)
        at org.apache.iceberg.aws.glue.GlueTableOperations.getGlueTable(GlueTableOperations.java:279)
        at org.apache.iceberg.aws.glue.GlueTableOperations.doRefresh(GlueTableOperations.java:128)
        at org.apache.iceberg.BaseMetastoreTableOperations.refresh(BaseMetastoreTableOperations.java:88)
        at org.apache.iceberg.BaseMetastoreTableOperations.current(BaseMetastoreTableOperations.java:71)
        at org.apache.iceberg.BaseMetastoreCatalog.loadTable(BaseMetastoreCatalog.java:49)
        at org.apache.iceberg.connect.channel.Coordinator.commitToTable(Coordinator.java:188)
        at org.apache.iceberg.connect.channel.Coordinator.lambda$doCommit$1(Coordinator.java:152)
        at org.apache.iceberg.util.Tasks$Builder.runTaskWithRetry(Tasks.java:413)
        at org.apache.iceberg.util.Tasks$Builder$1.run(Tasks.java:315)
        at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:572)
        at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:317)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
        at java.base/java.lang.Thread.run(Thread.java:1583)
   ```
   
   Sometimes, before the timeout, I see the following message for a commit:
   
   ```
   2025-08-06T09:19:08.091Z INFO  [iceberg-coord                      ] [o.a.i.connect.channel.CommitState   ] {trace_id=5c79db77f8193583564f9c0a6b71897a, trace_flags=01, span_id=e2c0fcfd749a670f}: Commit fb45dd59-3c9e-4cca-a075-37264d5b5a3a not ready, received responses for 1 of 3 partitions, waiting for more
   ```
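
   To follow a single commit through these logs, the entries can be grouped by 
commit ID. This is a minimal sketch in Python, assuming each log entry has 
first been joined onto one line and that the message wording matches the 
excerpts above. The `summarize_commits` helper and its event labels are 
illustrative names, not part of Iceberg:

```python
import re
from collections import defaultdict

# Matches UUID-style commit IDs; the dash-free trace_id values do not match.
COMMIT_ID = re.compile(
    r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}"
)

# Message fragments copied from the log excerpts, mapped to illustrative labels.
EVENTS = [
    ("Received commit response when no commit in progress", "orphan_response"),
    ("Commit timeout reached", "timeout"),
    ("not ready, received responses", "partial_responses"),
    ("initiated", "initiated"),
]


def summarize_commits(lines):
    """Return {commit_id: [event labels in order of appearance]}."""
    events = defaultdict(list)
    for line in lines:
        match = COMMIT_ID.search(line)
        if match is None:
            continue  # e.g. "Commit failed, will try again next cycle" has no ID
        for fragment, label in EVENTS:
            if fragment in line:
                events[match.group(0)].append(label)
                break
    return dict(events)
```

   A commit whose list contains `initiated` and `timeout` but nothing after 
that is one worth checking against the table, since the stack trace above is 
logged right after such a timeout.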
   
   This is definitely an issue I should look into, but what worries me most is 
that Kafka Connect is losing data: the affected records are not in Iceberg and 
never appear there. The problems described above usually go away after the next 
restart of my instance, but the data is never reprocessed.
   
   I've checked the source topic offsets and they only keep increasing; they 
never seem to be rewound. I've checked my configuration and `errors.tolerance` 
is not overridden, so I assume it is `none`, yet all workers are in the 
`RUNNING` state.
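
   One way to quantify the loss is to compare the offsets that actually 
reached the table with the committed consumer offset for each partition. The 
sketch below assumes the source offset of every row is available (for example 
because the pipeline copies Kafka metadata into a column, which is an 
assumption about the setup and not something the connector is confirmed to do 
here). `missing_offset_ranges` is a hypothetical helper:

```python
def missing_offset_ranges(present, committed, start=0):
    """Return inclusive (first, last) offset ranges in [start, committed)
    that do not appear in `present`.

    `present` holds the offsets found in the table for ONE topic partition;
    `committed` is that partition's committed offset, i.e. the next offset
    the consumer would read. Gaps can also be legitimate (compacted topics,
    transaction markers), so treat the result as a list of candidates.
    """
    ranges = []
    expected = start
    for offset in sorted(set(present)):
        if offset < start or offset >= committed:
            continue  # outside the window being checked
        if offset > expected:
            ranges.append((expected, offset - 1))
        expected = offset + 1
    if expected < committed:
        ranges.append((expected, committed - 1))
    return ranges
```

   Running this per partition over a window around the restart would show 
whether the committed offset moved past records that never landed in Iceberg.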
   
   I've checked the `connect-offsets` topic but it is empty. I read somewhere 
that this is expected because it is not used for sinks.
   
   Has anyone else run into this issue?
   
   ### Willingness to contribute
   
   - [ ] I can contribute a fix for this bug independently
   - [x] I would be willing to contribute a fix for this bug with guidance from 
the Iceberg community
   - [ ] I cannot contribute a fix for this bug at this time


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
