pomazanov opened a new issue #10481: URL: https://github.com/apache/pulsar/issues/10481
**Describe the bug**

The S3 offloader stops working after running on an EC2 instance for more than 24 hours. The following exception appears in the broker logs:

java.util.concurrent.CompletionException: org.jclouds.aws.AWSResponseException: request POST https://pubsub-offload-test.s3.amazonaws.com/3157f608-6d4f-4689-a3e8-00b8066a9881-ledger-41659?uploads HTTP/1.1 failed with code 400, error: AWSError{requestId='H2MQGY4EFJ0APS90', requestToken='Z+RmgsU0RQM924upPBY+ysWL3lBmTdyx+4GcmQDrXrqjxu91x/b1Qej/ax+II59IGhQLC7HA0/4=', code='InvalidToken', message='The provided token is malformed or otherwise invalid.', context='{Token-0=IQoJb3JpZ2luX2VjEC8aCXVzLWVhc3QtMSJIMEYCIQCqbpsZ4AxBTsPOHmefZtVX1B7wKL06no4xtDR/7KclqwIhAJTMknlmeN45x0qvLxkxJZypS5OCmm1kcAz0rdWixazNKr0DCJf//////////wEQAhoMMTA5NTI1MTM0NTY2Igywk+W1brRxd3ExM9wqkQN7UFvuU2XM4L2naA9g2NuuXZTyjD3WWHMTzH1Qhqtzq79DvgLS5ckFBwYpdrc6GZiQvgf6BAG6tIsp9ch5FxHjCsGw5k0hpG2GtlOA66agfgDUKQJySwR/lBZgn0KvYO/VIG1ge8wDX5CvyMsI9PSft2XpH3oHq9RSvLcONBRwjPf96ygb8UayiqTVkS+NoM1PtMw/RHaKuDDsDsXGr98HkJ5Ink8AjhgCGX+q1tkgJg7zv0FxOCL3qYaTs3+GqfMCaLlyaT1FFytlNFgdFWDNwIVHWJ54lXexfkY1x78xqBczaRrfCttxUVRM0D+TeaVbM98MpGCuMe6WiYBRnbhL/hWhTgY/rgmHijNRaDw1hnSv2xBE50ZKIjTD6B+RyAZijNaSBHzQDHxGgoZs8oZeukGo6HaEhYTKOxhwobcoHF8Zzi0QDbvDdmF72BJzramto5ZYIaXECHMJO5WOKt81qnsMJekCI5MlR0QWl+n+3JSAhibEUD3Gzmc3z4Cn0lZpvJwFzGfROIdTVdyJj3gB5jDWpv2DBjrqAZRCr6nC0NUui5+znlCpu4MdoEP/2kpizzyQoRMw26b0HJVerdQh8nD7w5jhr+mwt5PZmrzcmOYgJmEf6WKgFWw75GqmJkYP1sHEF5c2mVr/pqoWBkeGpTaDvC0RR5kWdK5h8Efii6CYfynGAiCE7ivgyKpRDSBZX8P+0XNiP0jwnXAx1o9C4/LwcFRT9eRFKO0gfEEVJZ7WzWt66N/bXX9b/211QGW1IP2TDWZKtLfOUIaImn/YHOml57jN+x1z8A4UKCc5CpSJN1L9NX1Cj5fX0Uyp9IaWt8WPs91A6aoajsy24gELeZrQMg==, HostId=Z+RmgsU0RQM924upPBY+ysWL3lBmTdyx+4GcmQDrXrqjxu91x/b1Qej/ax+II59IGhQLC7HA0/4=}'}
    at java.util.concurrent.CompletableFuture.encodeRelay(CompletableFuture.java:367) ~[?:?]
    at java.util.concurrent.CompletableFuture.completeRelay(CompletableFuture.java:376) ~[?:?]
    at java.util.concurrent.CompletableFuture$UniRelay.tryFire(CompletableFuture.java:1093) ~[?:?]
    at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506) [?:?]
    at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2152) [?:?]
    at org.apache.bookkeeper.mledger.offload.jcloud.impl.BlobStoreManagedLedgerOffloader.lambda$offload$0(BlobStoreManagedLedgerOffloader.java:151) [tiered-storage-jcloud-2.7.0.nar-unpacked/:?]
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) [?:?]
    at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:125) [com.google.guava-guava-30.0-jre.jar:?]
    at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:69) [com.google.guava-guava-30.0-jre.jar:?]
    at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:78) [com.google.guava-guava-30.0-jre.jar:?]
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) [?:?]
    at java.util.concurrent.FutureTask.run(FutureTask.java:264) [?:?]
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304) [?:?]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130) [?:?]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:630) [?:?]
    at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) [io.netty-netty-common-4.1.51.Final.jar:4.1.51.Final]
    at java.lang.Thread.run(Thread.java:832) [?:?]
Caused by: org.jclouds.aws.AWSResponseException: request POST https://pubsub-offload-test.s3.amazonaws.com/3157f608-6d4f-4689-a3e8-00b8066a9881-ledger-41659?uploads HTTP/1.1 failed with code 400, error: AWSError{requestId='H2MQGY4EFJ0APS90', requestToken='Z+RmgsU0RQM924upPBY+ysWL3lBmTdyx+4GcmQDrXrqjxu91x/b1Qej/ax+II59IGhQLC7HA0/4=', code='InvalidToken', message='The provided token is malformed or otherwise invalid.', context='{Token-0=IQoJb3JpZ2luX2VjEC8aCXVzLWVhc3QtMSJIMEYCIQCqbpsZ4AxBTsPOHmefZtVX1B7wKL06no4xtDR/7KclqwIhAJTMknlmeN45x0qvLxkxJZypS5OCmm1kcAz0rdWixazNKr0DCJf//////////wEQAhoMMTA5NTI1MTM0NTY2Igywk+W1brRxd3ExM9wqkQN7UFvuU2XM4L2naA9g2NuuXZTyjD3WWHMTzH1Qhqtzq79DvgLS5ckFBwYpdrc6GZiQvgf6BAG6tIsp9ch5FxHjCsGw5k0hpG2GtlOA66agfgDUKQJySwR/lBZgn0KvYO/VIG1ge8wDX5CvyMsI9PSft2XpH3oHq9RSvLcONBRwjPf96ygb8UayiqTVkS+NoM1PtMw/RHaKuDDsDsXGr98HkJ5Ink8AjhgCGX+q1tkgJg7zv0FxOCL3qYaTs3+GqfMCaLlyaT1FFytlNFgdFWDNwIVHWJ54lXexfkY1x78xqBczaRrfCttxUVRM0D+TeaVbM98MpGCuMe6WiYBRnbhL/hWhTgY/rgmHijNRaDw1hnSv2xBE50ZKIjTD6B+RyAZijNaSBHzQDHxGgoZs8oZeukGo6HaEhYTKOxhwobcoHF8Zzi0QDbvDdmF72BJzramto5ZYIaXECHMJO5WOKt81qnsMJekCI5MlR0QWl+n+3JSAhibEUD3Gzmc3z4Cn0lZpvJwFzGfROIdTVdyJj3gB5jDWpv2DBjrqAZRCr6nC0NUui5+znlCpu4MdoEP/2kpizzyQoRMw26b0HJVerdQh8nD7w5jhr+mwt5PZmrzcmOYgJmEf6WKgFWw75GqmJkYP1sHEF5c2mVr/pqoWBkeGpTaDvC0RR5kWdK5h8Efii6CYfynGAiCE7ivgyKpRDSBZX8P+0XNiP0jwnXAx1o9C4/LwcFRT9eRFKO0gfEEVJZ7WzWt66N/bXX9b/211QGW1IP2TDWZKtLfOUIaImn/YHOml57jN+x1z8A4UKCc5CpSJN1L9NX1Cj5fX0Uyp9IaWt8WPs91A6aoajsy24gELeZrQMg==, HostId=Z+RmgsU0RQM924upPBY+ysWL3lBmTdyx+4GcmQDrXrqjxu91x/b1Qej/ax+II59IGhQLC7HA0/4=}'}
    at org.jclouds.aws.handlers.ParseAWSErrorFromXmlContent.handleError(ParseAWSErrorFromXmlContent.java:76) ~[?:?]
    at org.jclouds.http.handlers.DelegatingErrorHandler.handleError(DelegatingErrorHandler.java:65) ~[?:?]
    at org.jclouds.http.internal.BaseHttpCommandExecutorService.shouldContinue(BaseHttpCommandExecutorService.java:138) ~[?:?]
    at org.jclouds.http.internal.BaseHttpCommandExecutorService.invoke(BaseHttpCommandExecutorService.java:107) ~[?:?]
    at org.jclouds.rest.internal.InvokeHttpMethod.invoke(InvokeHttpMethod.java:91) ~[?:?]
    at org.jclouds.rest.internal.InvokeHttpMethod.apply(InvokeHttpMethod.java:74) ~[?:?]
    at org.jclouds.rest.internal.InvokeHttpMethod.apply(InvokeHttpMethod.java:45) ~[?:?]
    at org.jclouds.rest.internal.DelegatesToInvocationFunction.handle(DelegatesToInvocationFunction.java:156) ~[?:?]
    at org.jclouds.rest.internal.DelegatesToInvocationFunction.invoke(DelegatesToInvocationFunction.java:123) ~[?:?]
    at com.sun.proxy.$Proxy214.initiateMultipartUpload(Unknown Source) ~[?:?]
    at org.jclouds.s3.blobstore.S3BlobStore.initiateMultipartUpload(S3BlobStore.java:370) ~[?:?]
    at org.apache.bookkeeper.mledger.offload.jcloud.impl.BlobStoreManagedLedgerOffloader.lambda$offload$0(BlobStoreManagedLedgerOffloader.java:149) ~[?:?]
    ... 11 more

**To Reproduce**

Steps to reproduce the behavior:
1. Deploy Pulsar in AWS using the bare metal method (https://pulsar.apache.org/docs/en/deploy-bare-metal/) and enable offload to S3 by following the instructions in the documentation: https://pulsar.apache.org/docs/en/tiered-storage-aws/
2. Generate some load, for example using the tools described at https://pulsar.apache.org/docs/en/develop-tools/
3. Verify that offload is working and that new files are created in the S3 bucket.
4. Wait roughly 24 hours or more, then check the broker log file for the exception above.
5. Verify that new offload files are no longer created in the S3 bucket.

**Expected behavior**

S3 offload should keep working until it is explicitly stopped.

**Additional context**

Removing the following if statement seems to fix the issue: https://github.com/apache/pulsar/blob/master/tiered-storage/jcloud/src/main/java/org/apache/bookkeeper/mledger/offload/jcloud/provider/JCloudBlobStoreProvider.java#L3074

It looks like token caching and refresh might already be handled inside the AWS SDK code, so this additional check might not be required.
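To illustrate the suspected failure mode, here is a minimal, self-contained sketch. It is not the Pulsar, jclouds, or AWS SDK code: every type and method in it (`SessionToken`, `TokenSource`, `RotatingProvider`, `snapshotAt`) is hypothetical, and the 6-hour token lifetime and 25-hour wait are assumed values chosen only for the demonstration. It contrasts resolving a temporary token once at start-up with asking the provider again on every request.

```java
import java.time.Duration;
import java.time.Instant;

/** Hypothetical illustration; none of these types exist in Pulsar, jclouds, or the AWS SDK. */
final class CredentialRefreshSketch {

    /** A temporary credential: a token value plus the instant at which it expires. */
    static final class SessionToken {
        final String value;
        final Instant expiresAt;

        SessionToken(String value, Instant expiresAt) {
            this.value = value;
            this.expiresAt = expiresAt;
        }

        boolean isValidAt(Instant now) {
            return now.isBefore(expiresAt);
        }
    }

    /** Anything that can hand out a token for a request made at a given time. */
    interface TokenSource {
        SessionToken at(Instant now);
    }

    /** Stand-in for a provider chain: issues a new token once the current one has expired. */
    static final class RotatingProvider implements TokenSource {
        private final Duration lifetime; // assumed token lifetime
        private SessionToken current;
        private int issued;

        RotatingProvider(Duration lifetime) {
            this.lifetime = lifetime;
        }

        @Override
        public SessionToken at(Instant now) {
            if (current == null || !current.isValidAt(now)) {
                issued++;
                current = new SessionToken("token-" + issued, now.plus(lifetime));
            }
            return current;
        }
    }

    /** Resolves the token once at start-up and keeps returning that same object. */
    static TokenSource snapshotAt(TokenSource provider, Instant startup) {
        SessionToken fixed = provider.at(startup);
        return now -> fixed;
    }

    public static void main(String[] args) {
        Instant start = Instant.parse("2021-05-01T00:00:00Z");
        Instant later = start.plus(Duration.ofHours(25));
        Duration lifetime = Duration.ofHours(6);

        TokenSource cached = snapshotAt(new RotatingProvider(lifetime), start);
        TokenSource live = new RotatingProvider(lifetime);

        System.out.println("cached token valid after 25h: " + cached.at(later).isValidAt(later));
        System.out.println("live token valid after 25h:   " + live.at(later).isValidAt(later));
    }
}
```

Under these assumptions the snapshot source still returns the start-up token after 25 hours, which has long expired, while the live source returns a freshly issued one. If the offloader behaves like the snapshot case, that would be consistent with offload working at first and failing with `InvalidToken` later.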
