sijie commented on issue #6787:
URL: https://github.com/apache/pulsar/pull/6787#issuecomment-618259435


   > The contract is clear and it's that a reader can read all the data that is 
being retained based on max time and size.
   
   This only covers 50% of the contract. If a reader is scanning a log, it 
should also receive events in order, with no data gaps. 
   
   @jerrypeng @merlimat I am not arguing about data retention. Please don't 
misread my point. My argument is about the "expected" behavior when a reader is 
attached to a "distributed log". The reader should be able to read the messages 
from its cursor onward without missing any data. This is the behavior you get 
from a storage system such as a local or distributed file system. If we are 
saying Pulsar is an event/stream storage system, that is the *correct* behavior 
we should provide.
   
   > If we allow a reader to stay connected and have the data retained, then 
when would that be the limit? and what would be the action after the limit?
   
   If you look at a POSIX file system, when you open a file to read and a 
background process deletes the file for whatever reason, the reader can still 
read the file until it closes it. The file system only reclaims the disk space 
when the last open file descriptor is closed.
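
   A minimal sketch of that file system behavior, assuming a POSIX system (on 
Windows, deleting an open file normally fails instead). The helper name 
`read_after_unlink` is made up for this illustration:

```python
import os
import tempfile
from typing import Tuple


def read_after_unlink(payload: bytes) -> Tuple[bool, bytes]:
    """Delete a file's name while a reader holds it open, then read it."""
    # Create a temporary file and write the payload to it.
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as writer:
        writer.write(payload)

    # The reader opens the file before anything is deleted.
    reader = open(path, "rb")
    try:
        # Stand-in for the background process: remove the directory entry.
        os.unlink(path)
        name_gone = not os.path.exists(path)
        # The open descriptor still reaches the data.
        data = reader.read()
    finally:
        # Only after this close can the file system reclaim the space.
        reader.close()
    return name_gone, data


if __name__ == "__main__":
    print(read_after_unlink(b"event-1\nevent-2\n"))
```

The name disappears immediately, but the data stays readable until the reader 
closes its descriptor.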
   
   The problem with this change is not about data retention. The problem is 
that it introduces uncertainty into the Reader API, so people cannot trust it. 
The Reader API is effectively a storage API that people rely on to build 
stateful applications. We should take this seriously and follow the common 
semantics that most storage systems provide. 
   
   As I said, if you want to relax this contract, it should be done via a flag. 
We should allow users to decide which behavior they want. If I am building 
stateful applications that use Pulsar as the source of truth, I don't want to 
see those uncertainties. 
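
   To make the two behaviors concrete, here is a toy in-memory model, not 
Pulsar code. `ToyLog`, `ToyReader`, and the `pin_retention` flag are 
hypothetical names invented for this sketch; it only assumes a log that trims 
to a maximum number of entries on every append:

```python
class ToyLog:
    """In-memory log with size-based retention (a model, not Pulsar)."""

    def __init__(self, max_entries):
        self.max_entries = max_entries
        self.first_offset = 0  # offset of the oldest retained entry
        self.entries = []
        self.pinned = []  # readers that asked to hold back trimming

    def append(self, event):
        self.entries.append(event)
        self.trim()

    def trim(self):
        # Never trim past the oldest offset a pinning reader still needs.
        end = self.first_offset + len(self.entries)
        floor = min((r.offset for r in self.pinned), default=end)
        while len(self.entries) > self.max_entries and self.first_offset < floor:
            self.entries.pop(0)
            self.first_offset += 1


class ToyReader:
    def __init__(self, log, pin_retention):
        self.log = log
        self.offset = log.first_offset
        if pin_retention:
            log.pinned.append(self)

    def read_next(self):
        """Return (event, skipped); event is None at the end of the log."""
        skipped = 0
        if self.offset < self.log.first_offset:
            # Data under the reader was trimmed: jump forward, report the gap.
            skipped = self.log.first_offset - self.offset
            self.offset = self.log.first_offset
        index = self.offset - self.log.first_offset
        if index >= len(self.log.entries):
            return None, skipped
        self.offset += 1
        return self.log.entries[index], skipped

    def close(self):
        # Like closing the last file descriptor: retention can now catch up.
        if self in self.log.pinned:
            self.log.pinned.remove(self)
        self.log.trim()


def scan(pin_retention):
    log = ToyLog(max_entries=2)
    log.append("e0")
    log.append("e1")
    reader = ToyReader(log, pin_retention)
    for i in range(2, 5):
        log.append(f"e{i}")  # retention runs while the reader is attached
    events, gaps = [], 0
    while True:
        event, skipped = reader.read_next()
        gaps += skipped
        if event is None:
            break
        events.append(event)
    reader.close()
    return events, gaps


if __name__ == "__main__":
    print(scan(pin_retention=False))
    print(scan(pin_retention=True))
```

With the flag off, the reader silently loses `e0` through `e2`; with it on, the 
reader sees every event and trimming resumes once it closes.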
   
   
   
   

