frankjkelly opened a new issue #9266:
URL: https://github.com/apache/pulsar/issues/9266
**Describe the bug**
Reading data back from tiered storage (S3) fails on the first attempt: the
broker logs an "Unknown exception", and the read only succeeds after a
15-second retry. As a result, the minimum response time is 15 seconds.
I can reproduce this consistently.
**To Reproduce**
Steps to reproduce the behavior:
1. Configure tiered storage offload to S3
2. Upload data to a topic
3. Wait for managed ledger rollover
4. Offload the topic
5. Attempt to retrieve data
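The offload step above can be sketched as `pulsar-admin` invocations. This is a rough sketch, not the exact commands used for this report: the `offload_commands` helper is hypothetical, and the `10M` size threshold is only an illustrative value. The admin URL and topic name are taken from the logs and settings in this issue.

```python
import shlex

# Values taken from this report; adjust for your own cluster.
ADMIN_URL = "http://platform-pulsar-broker:8080"
TOPIC = "persistent://cogito-dialog/wav/57f5ea00-a7a2-49c3-95d3-c8237d893d31"


def offload_commands(admin_url, topic, size_threshold="10M"):
    """Build argv lists that trigger an offload and then wait on its status.

    size_threshold is an illustrative value (assumption), not one from the report.
    """
    base = ["pulsar-admin", "--admin-url", admin_url, "topics"]
    return [
        # Trigger the offload of closed ledgers to tiered storage.
        base + ["offload", "--size-threshold", size_threshold, topic],
        # Block until the offload has completed (-w waits for completion).
        base + ["offload-status", "-w", topic],
    ]


if __name__ == "__main__":
    # Print the commands instead of running them, so they can be reviewed first.
    for argv in offload_commands(ADMIN_URL, TOPIC):
        print(shlex.join(argv))
```

The commands are printed rather than executed so they can be checked against the local `pulsar-admin` version before running them.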
**Expected behavior**
I expect some delay, but not an exception followed by a 15-second retry.
**Screenshots**
Logs
```
17:50:18.735 [BookKeeperClientWorker-OrderedExecutor-2-0] INFO
org.apache.bookkeeper.mledger.impl.MetaStoreImpl -
[cogito-dialog/wav/persistent/57f5ea00-a7a2-49c3-95d3-c8237d893d31]
[PulsarService-31955749-c92b-4bfe-a06a-2985d66b69c7] Updating cursor info
ledgerId=1108490 mark-delete=1107259:-1
17:50:18.739 [bookkeeper-ml-workers-OrderedExecutor-5-0] INFO
org.apache.bookkeeper.mledger.impl.ManagedCursorImpl -
[cogito-dialog/wav/persistent/57f5ea00-a7a2-49c3-95d3-c8237d893d31] Updated
cursor PulsarService-31955749-c92b-4bfe-a06a-2985d66b69c7 with ledger id
1108490 md-position=1107259:-1 rd-position=1107259:0
17:50:18.739 [bookkeeper-ml-workers-OrderedExecutor-5-0] INFO
org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl -
[cogito-dialog/wav/persistent/57f5ea00-a7a2-49c3-95d3-c8237d893d31] Opened new
cursor:
ManagedCursorImpl{ledger=cogito-dialog/wav/persistent/57f5ea00-a7a2-49c3-95d3-c8237d893d31,
name=PulsarService-31955749-c92b-4bfe-a06a-2985d66b69c7, ackPos=1107259:-1,
readPos=1107259:0}
17:50:18.739 [bookkeeper-ml-workers-OrderedExecutor-5-0] INFO
org.apache.bookkeeper.mledger.impl.ManagedCursorImpl -
[cogito-dialog/wav/persistent/57f5ea00-a7a2-49c3-95d3-c8237d893d31-PulsarService-31955749-c92b-4bfe-a06a-2985d66b69c7]
Rewind from 1047055:0 to 1047055:0
17:50:18.739 [bookkeeper-ml-workers-OrderedExecutor-5-0] INFO
org.apache.pulsar.broker.service.persistent.PersistentTopic -
[persistent://cogito-dialog/wav/57f5ea00-a7a2-49c3-95d3-c8237d893d31] There are
no replicated subscriptions on the topic
17:50:18.739 [bookkeeper-ml-workers-OrderedExecutor-5-0] INFO
org.apache.pulsar.broker.service.persistent.PersistentTopic -
[persistent://cogito-dialog/wav/57f5ea00-a7a2-49c3-95d3-c8237d893d31][PulsarService-31955749-c92b-4bfe-a06a-2985d66b69c7]
Created new subscription for 0
17:50:18.739 [bookkeeper-ml-workers-OrderedExecutor-5-0] INFO
org.apache.pulsar.broker.service.ServerCnx - [/127.0.0.1:43862] Created
subscription on topic
persistent://cogito-dialog/wav/57f5ea00-a7a2-49c3-95d3-c8237d893d31 /
PulsarService-31955749-c92b-4bfe-a06a-2985d66b69c7
17:50:18.771 [bookkeeper-ml-workers-OrderedExecutor-5-0] WARN
org.apache.bookkeeper.mledger.impl.OpReadEntry -
[cogito-dialog/wav/persistent/57f5ea00-a7a2-49c3-95d3-c8237d893d31][PulsarService-31955749-c92b-4bfe-a06a-2985d66b69c7]
read failed from ledger at position:1047055:0 : Unknown exception
17:50:18.771 [broker-topic-workers-OrderedScheduler-4-0] ERROR
org.apache.pulsar.broker.service.persistent.PersistentDispatcherSingleActiveConsumer
- [persistent://cogito-dialog/wav/57f5ea00-a7a2-49c3-95d3-c8237d893d31 /
PulsarService-31955749-c92b-4bfe-a06a-2985d66b69c7-Consumer{subscription=PersistentSubscription{topic=persistent://cogito-dialog/wav/57f5ea00-a7a2-49c3-95d3-c8237d893d31,
name=PulsarService-31955749-c92b-4bfe-a06a-2985d66b69c7}, consumerId=0,
consumerName=8f423, address=/127.0.0.1:43862}] Error reading entries at
1047055:0 : Unknown exception - Retrying to read in 15.0 seconds
```
The broker did not log a stack trace for this error.
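To check how often this retry happens in a longer broker log, a small filter along these lines could extract the failing position and the retry delay. It is a sketch only: the `find_read_retries` helper is hypothetical, and the regular expression assumes the message format shown in the excerpt above.

```python
import re

# Matches the dispatcher error as it appears in the log excerpt, e.g.
# "Error reading entries at 1047055:0 : Unknown exception - Retrying to read in 15.0 seconds"
RETRY_RE = re.compile(
    r"Error reading entries at\s+(?P<position>\d+:-?\d+)\s*:\s*(?P<reason>.*?)"
    r"\s+-\s+Retrying to read in\s+(?P<delay>[\d.]+)\s+seconds"
)


def find_read_retries(log_text):
    """Return (position, reason, delay_seconds) for each read retry in log_text."""
    # The pasted log is hard-wrapped, so collapse all whitespace before matching.
    flat = " ".join(log_text.split())
    return [
        (m["position"], m["reason"], float(m["delay"]))
        for m in RETRY_RE.finditer(flat)
    ]


if __name__ == "__main__":
    sample = """consumerName=8f423, address=/127.0.0.1:43862}] Error reading entries at
    1047055:0 : Unknown exception - Retrying to read in 15.0 seconds"""
    print(find_read_retries(sample))
    # [('1047055:0', 'Unknown exception', 15.0)]
```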
**Desktop (please complete the following information):**
- OS: Linux
- Pulsar: 2.6.1
**Additional context**
Tiered Offload Settings
```
$ ./apache-pulsar-2.6.1/bin/pulsar-admin --admin-url
http://platform-pulsar-broker:8080 namespaces get-offload-threshold
cogito-dialog/wav
0
$ ./apache-pulsar-2.6.1/bin/pulsar-admin --admin-url
http://platform-pulsar-broker:8080 namespaces get-offload-deletion-lag
cogito-dialog/wav
15 minute(s)
```
Broker overrides of the defaults
```
managedLedgerMaxLedgerRolloverTimeMinutes: "90"
```
Other Broker settings
```
# grep ffload broker.conf | grep -v "#"
managedLedgerOffloadDeletionLagMs=14400000
managedLedgerOffloadAutoTriggerSizeThresholdBytes=-1
offloadersDirectory=./offloaders
managedLedgerOffloadDriver=aws-s3
managedLedgerOffloadMaxThreads=2
managedLedgerOffloadPrefetchRounds=1
s3ManagedLedgerOffloadRegion=us-east-1
s3ManagedLedgerOffloadBucket=eks-saas-pulsar-455275ef9748510192359f2f92675b2e
s3ManagedLedgerOffloadServiceEndpoint=
s3ManagedLedgerOffloadMaxBlockSizeInBytes=67108864
s3ManagedLedgerOffloadReadBufferSizeInBytes=1048576
gcsManagedLedgerOffloadRegion=
gcsManagedLedgerOffloadBucket=
gcsManagedLedgerOffloadMaxBlockSizeInBytes=67108864
gcsManagedLedgerOffloadReadBufferSizeInBytes=1048576
gcsManagedLedgerOffloadServiceAccountKeyFile=
fileSystemProfilePath=../conf/filesystem_offload_core_site.xml
```
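For readability, the raw millisecond and byte values above can be converted to friendlier units. The helper names below are hypothetical; the input values are copied from the `broker.conf` output above.

```python
def ms_to_hours(ms):
    """Convert milliseconds to hours."""
    return ms / 3_600_000


def bytes_to_mib(n):
    """Convert bytes to mebibytes (1 MiB = 1024 * 1024 bytes)."""
    return n / (1024 * 1024)


# Values copied from the broker.conf grep output in this report.
deletion_lag_ms = 14400000
max_block_size_bytes = 67108864
read_buffer_size_bytes = 1048576

print(f"managedLedgerOffloadDeletionLagMs = {ms_to_hours(deletion_lag_ms)} h")
print(f"s3ManagedLedgerOffloadMaxBlockSizeInBytes = {bytes_to_mib(max_block_size_bytes)} MiB")
print(f"s3ManagedLedgerOffloadReadBufferSizeInBytes = {bytes_to_mib(read_buffer_size_bytes)} MiB")
# managedLedgerOffloadDeletionLagMs = 4.0 h
# s3ManagedLedgerOffloadMaxBlockSizeInBytes = 64.0 MiB
# s3ManagedLedgerOffloadReadBufferSizeInBytes = 1.0 MiB
```

Note that the broker-level deletion lag works out to 4 hours, while the namespace-level `get-offload-deletion-lag` output above reports 15 minutes.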