wuzhanpeng opened a new issue #13526:
URL: https://github.com/apache/pulsar/issues/13526
## Motivation
Related to #13238
Corresponding logic:
`org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl#internalTrimLedgers`
Under the current ledger-trimming design, we need to collect those ledgers
that need to be deleted first, and then perform the asynchronous deletion of
the ledger concurrently, but we do not continue to pay attention to whether the
deletion operation is completed. If the meta-information update has been
successfully completed but an error occurs during the asynchronous deletion,
the ledger may not be deleted, but at the logical level we think that the
deletion has been completed, which will make this part of the data remain in
the storage layer forever (such as bk). As the usage time of the cluster
becomes longer, the residual data that cannot be deleted will gradually
increase.
In order to achieve this goal, we can separate the logic of meta-information
update and ledger deletion. In the trimming process, we only mark which ledgers
are deletable, and update the results to the metadatastore. Then another
checker thread is started to periodically check whether there are deletable
ledgers, and perform actual storage deletions on them. After the deletion is
complete, finally clear the ledger deletable mark.
## Goal
We need to modify some of the logic in
`org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl#internalTrimLedgers` to
make ledger-trimming convert to a completely synchronous method and no longer
include asynchronous delete method calls. In addition, we also need to add a
checker thread to periodically check the ledger that can be deleted, and
complete the deletion of these ledger.
## API Changes
`org.apache.bookkeeper.mledger.ManagedLedger`
```java
public interface ManagedLedger {
...
/**
* Mark deletable ledgers
*
* @param ledgers
*/
void markDeletableLedger(Iterable<LedgerInfo> ledgers);
/**
* Get all deletable ledgers
*/
void getAllDeletableLedgers();
/**
* Check and remove all deletable ledgers
*/
void removeAllDeletableLedgers();
}
```
## Implementation
This proposal aims to separate the deletion logic in ledger-trimming, so
that `ManagedLedgerImpl#internalTrimLedgers` is only responsible for updating
the meta-information and marking the deletable ledgers, and then another thread
is responsible for the actual ledger deletion.
Therefore, the entire trimming process is broken down into the following
steps:
1. update ledger metadata and mark deletable ledgers.
2. another thread will periodically check whether there are some deletable
ledgers in managed-ledger and delete them.
For step 1, we can store the marker of deletable information in
`org.apache.bookkeeper.mledger.proto.MLDataFormats.ManagedLedgerInfo`. When
retrieving the deleted ledger information, we can directly query it from
`mlInfo`. If this solution is not accepted, maybe we can create a new znode to
store these information, but this approach will not be able to reuse the
current design.
For step 2, we can set up a scheduled thread in
`org.apache.pulsar.broker.service.BrokerService` like `checkConsumedLedgers` or
`org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl` to periodically detect
whether there are some deletable ledgers according to the granularity of topic
or managed-ledger, and delete them if any.
## Reject Alternatives
None
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]