Renkai opened a new issue #8591:
URL: https://github.com/apache/pulsar/issues/8591


   Currently, after data in Pulsar has been offloaded to the second storage 
layer, it can still exist in BookKeeper for a period of time, but the client 
always reads it directly from the second layer. 
   
   This can lead to several problems:
   
   - Reads from the second layer have different performance characteristics, 
which may lead users to wrong estimates if they do not know which layer they 
are reading from.
   - The second layer may be managed by a team other than the Pulsar 
management team (for example, an independent HDFS management team), which may 
apply its own quota or authorization policies to users.
   - The second layer's storage can be unlimited in theory, so if a user 
accidentally resets a cursor to a wrong time, a lot of resources are wasted.
   
   So it would be better to make the data source configurable when data exists 
in both layers.
   
   The following options are probably enough:
   
   - first layer only
   - first layer first
   - second layer only
   - second layer first
   
   We can make `second layer first` the default value, which keeps the same 
behavior as the current version.
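
   To make the four options concrete, here is a minimal sketch of how such a 
setting could be resolved for one piece of data. Every name in it 
(`ReadSourcePolicy`, `StorageLayer`, `ReadSourceResolver`) is hypothetical and 
only for illustration; none of them is an existing Pulsar API, and the real 
read path may need to look quite different.

```java
import java.util.Optional;

// Hypothetical names throughout: an illustration of the proposal,
// not an existing Pulsar API.
enum ReadSourcePolicy {
    FIRST_LAYER_ONLY,    // read from BookKeeper only
    FIRST_LAYER_FIRST,   // prefer BookKeeper, fall back to the second layer
    SECOND_LAYER_ONLY,   // read from offloaded storage only
    SECOND_LAYER_FIRST   // prefer offloaded storage (proposed default)
}

enum StorageLayer { FIRST, SECOND }

final class ReadSourceResolver {
    private ReadSourceResolver() {}

    /**
     * Picks the layer to read from, given where the data currently exists.
     * Returns an empty Optional when the policy allows no available layer.
     */
    static Optional<StorageLayer> resolve(ReadSourcePolicy policy,
                                          boolean inFirstLayer,
                                          boolean inSecondLayer) {
        switch (policy) {
            case FIRST_LAYER_ONLY:
                return inFirstLayer ? Optional.of(StorageLayer.FIRST) : Optional.empty();
            case FIRST_LAYER_FIRST:
                if (inFirstLayer) {
                    return Optional.of(StorageLayer.FIRST);
                }
                return inSecondLayer ? Optional.of(StorageLayer.SECOND) : Optional.empty();
            case SECOND_LAYER_ONLY:
                return inSecondLayer ? Optional.of(StorageLayer.SECOND) : Optional.empty();
            case SECOND_LAYER_FIRST:
                if (inSecondLayer) {
                    return Optional.of(StorageLayer.SECOND);
                }
                return inFirstLayer ? Optional.of(StorageLayer.FIRST) : Optional.empty();
            default:
                throw new IllegalArgumentException("Unknown policy: " + policy);
        }
    }
}
```

   With data present in both layers, `SECOND_LAYER_FIRST` resolves to the 
second layer, which matches the current behavior described above. With an 
`only` policy, the sketch returns an empty result when the data is not in the 
allowed layer, so the caller has to decide whether to fail the read or skip it.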



