Hi, all:
        I drafted a PIP about a configurable data source priority for offloaded 
messages. The newest version is at 
https://gist.github.com/Renkai/e5be927404fbfd8289e7703c55812b1c 
and the current version is posted below in this mail. I hope someone can help 
review it and make it an official PIP.

Motivation

Currently, once data in Pulsar has been offloaded to the second storage layer, 
it can still exist in BookKeeper for a period of time, but the client will 
read it directly from the second layer.

This may lead to several problems:

- Reads from the second layer have different performance characteristics, 
  which may lead users to wrong estimates if they don't know which layer they 
  are reading from.
- The second layer may be managed by a team other than the Pulsar management 
  team (for example, an independent HDFS management team), which may apply its 
  own quota or authorization policies to users.
- The second layer's storage can be unbounded in theory, so if a user 
  accidentally sets the cursor to a wrong time, a lot of resources will be 
  wasted.

So it's better to make the data source configurable when data exists in both 
layers.

The options below may be enough:

- BOOKKEEPER_ONLY
- BOOKKEEPER_FIRST
- OFFLOADED_ONLY
- OFFLOADED_FIRST

Background

Currently, which layer the broker reads from is decided by 
org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl#getLedgerHandle(long 
ledgerId) 
<https://github.com/apache/pulsar/blob/a3584309017f1894a05b05c695c42e7aa8b7c3a7/managed-ledger/src/main/java/org/apache/bookkeeper/mledger/impl/ManagedLedgerImpl.java#L1521>
which has only one parameter, ledgerId, and chooses the offloaded ledger 
handle as soon as the ledger has been offloaded. If the chosen handle fails, 
the whole getLedgerHandle call fails.

Implementation

The tiered read priority should be set per namespace or per topic. The 
commands in the command line tool should look like:

    pulsar-admin namespaces --set-tiered-read-priority tenant/namespace priority-policy

    pulsar-admin topics --set-tiered-read-priority tenant/namespace/topic priority-policy

If not configured, OFFLOADED_FIRST should be used by default, which results in 
the same behavior as the current version.

Then the corresponding ManagedLedger should be aware of which priority option 
the client is using, and the signature of the getLedgerHandle method should be 
changed to:

    CompletableFuture<ReadHandle> getLedgerHandle(
            long ledgerId, TieredReadPriority priority)

For BOOKKEEPER_ONLY and OFFLOADED_ONLY, the ManagedLedger will use the 
corresponding ReadHandle directly. For BOOKKEEPER_FIRST and OFFLOADED_FIRST, 
the ManagedLedger will fall back to the other storage layer whenever the 
preferred layer cannot serve the read, whether because the ledger does not 
exist there or because of a network, disk, or authorization problem with that 
layer.
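
To make the fallback rule concrete, here is a rough sketch of how the four 
options could be dispatched. It is only an illustration under assumptions, not 
the actual ManagedLedgerImpl code: the class TieredReadSketch, the methods 
open and withFallback, and the two Supplier parameters are hypothetical names, 
and the generic type H stands in for BookKeeper's ReadHandle so the example 
compiles on its own.

```java
import java.util.concurrent.CompletableFuture;
import java.util.function.Supplier;

// Hypothetical enum mirroring the four options proposed in this PIP.
enum TieredReadPriority {
    BOOKKEEPER_ONLY, BOOKKEEPER_FIRST, OFFLOADED_ONLY, OFFLOADED_FIRST
}

// Hypothetical helper; H stands in for the real ReadHandle type.
final class TieredReadSketch {

    // Picks the handle source(s) according to the configured priority.
    static <H> CompletableFuture<H> open(
            TieredReadPriority priority,
            Supplier<CompletableFuture<H>> bookkeeper,
            Supplier<CompletableFuture<H>> offloaded) {
        switch (priority) {
            case BOOKKEEPER_ONLY:
                return bookkeeper.get();
            case OFFLOADED_ONLY:
                return offloaded.get();
            case BOOKKEEPER_FIRST:
                return withFallback(bookkeeper, offloaded);
            case OFFLOADED_FIRST:
            default:
                // Matches today's behavior when nothing is configured.
                return withFallback(offloaded, bookkeeper);
        }
    }

    // Tries the preferred layer; on any failure, tries the other layer.
    static <H> CompletableFuture<H> withFallback(
            Supplier<CompletableFuture<H>> preferred,
            Supplier<CompletableFuture<H>> other) {
        CompletableFuture<H> result = new CompletableFuture<>();
        preferred.get().whenComplete((handle, error) -> {
            if (error == null) {
                result.complete(handle);
                return;
            }
            // The preferred layer failed for whatever reason: fall back.
            other.get().whenComplete((fallbackHandle, fallbackError) -> {
                if (fallbackError == null) {
                    result.complete(fallbackHandle);
                } else {
                    result.completeExceptionally(fallbackError);
                }
            });
        });
        return result;
    }
}
```

Taking suppliers instead of already-opened handles means the second layer is 
only contacted when the first one has actually failed, which matters if that 
layer has its own quota. In this sketch, when both layers fail the caller sees 
the error from the fallback layer; whether the original error should be 
reported instead is left open.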
