momo-jun commented on a change in pull request #14546:
URL: https://github.com/apache/pulsar/pull/14546#discussion_r825549537



##########
File path: site2/docs/cookbooks-retention-expiry.md
##########
@@ -358,11 +371,22 @@ admin.namespaces().removeNamespaceMessageTTL(namespace)
 
 ## Delete messages from namespaces
 
-If you do not have any retention period and that you never have much of a 
backlog, the upper limit for retaining messages, which are acknowledged, equals 
to the Pulsar segment rollover period + entry log rollover period + (garbage 
collection interval * garbage collection ratios).
+In terms of physical storage size, message expiry and retention are two 
sides of the same coin.
+* The backlog quota and TTL parameters prevent disk usage from growing 
indefinitely, as Pulsar’s default behaviour is to persist unacknowledged 
messages.
+* The retention policy reserves storage space for acknowledged messages 
that Pulsar would otherwise delete by default.
+
+In conclusion, the size of your physical storage should accommodate the sum 
of the backlog quota and the retention size.
+
+The message deletion rate (the rate at which disk space is released) depends 
on multiple factors.
 
 - **Segment rollover period**: the segment rollover period is how often a 
new segment is created. Once a new segment is created, the old segment can 
be deleted. By default, a rollover happens either when you have written 50,000 
entries (messages) or have waited 240 minutes. You can tune this in your broker 
configuration.
 
 - **Entry log rollover period**: multiple ledgers in BookKeeper are 
interleaved into an [entry 
log](https://bookkeeper.apache.org/docs/4.11.1/getting-started/concepts/#entry-logs).
 Before the space of a deleted ledger can be reclaimed, the entry log that 
contains it must be rolled over.
 The entry log rollover period is configurable, but is purely based on the 
entry log size. For details, see 
[here](https://bookkeeper.apache.org/docs/4.11.1/reference/config/#entry-log-settings).
 Once the entry log is rolled over, it can be garbage collected.
 
 - **Garbage collection interval**: because entry logs have interleaved 
ledgers, the entry logs need to be rewritten to free up space. The garbage 
collection interval is how often BookKeeper performs garbage collection, which 
is related to minor compaction and major compaction of entry logs. For details, 
see 
[here](https://bookkeeper.apache.org/docs/4.11.1/reference/config/#entry-log-compaction-settings).
+
+The diagram below illustrates a case in which the consumed storage size 
is larger than the given limits for backlog and retention: messages over 
the retention limit are kept because other messages in the same segment 
are still within the retention period.

Review comment:
       Updated. Thanks, Yu.
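
For reference, the sizing rule and the segment rollover trigger described in the hunk can be sketched in a few lines of Java. This is only an illustration of the doc text, not Pulsar code: the class and method names are hypothetical, the 50,000-entry and 240-minute values are the defaults quoted in the hunk, and the real broker applies additional conditions that this sketch leaves out.

```java
import java.time.Duration;

// Hypothetical helper illustrating the doc text; not part of the Pulsar API.
public class RetentionSizingSketch {
    // Defaults quoted in the doc: roll over after 50,000 entries or 240 minutes.
    static final long MAX_ENTRIES_PER_SEGMENT = 50_000;
    static final Duration MAX_SEGMENT_AGE = Duration.ofMinutes(240);

    /** Lower bound on physical storage: backlog quota plus retention size. */
    static long minimumStorageBytes(long backlogQuotaBytes, long retentionSizeBytes) {
        // addExact throws ArithmeticException on overflow instead of wrapping.
        return Math.addExact(backlogQuotaBytes, retentionSizeBytes);
    }

    /** Simplified trigger: a new segment is created when either limit is reached. */
    static boolean shouldRollOver(long entriesWritten, Duration segmentAge) {
        return entriesWritten >= MAX_ENTRIES_PER_SEGMENT
                || segmentAge.compareTo(MAX_SEGMENT_AGE) >= 0;
    }

    public static void main(String[] args) {
        long gib = 1024L * 1024 * 1024;
        // Assumed example values: 10 GiB backlog quota, 5 GiB retention size.
        System.out.println(minimumStorageBytes(10 * gib, 5 * gib) / gib + " GiB");
        System.out.println(shouldRollOver(50_000, Duration.ofMinutes(1)));
        System.out.println(shouldRollOver(10, Duration.ofMinutes(30)));
    }
}
```

As the last paragraph of the hunk notes, actual disk usage can exceed this lower bound, because a segment is kept as long as any message in it is still within retention.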





