sijie commented on issue #6787: URL: https://github.com/apache/pulsar/pull/6787#issuecomment-618259435
> The contract is clear and it's that a reader can read all the data that is being retained based on max time and size.

This only covers 50% of the contract. If a reader is scanning a log, it should receive events in order, with no gaps in the data.

@jerrypeng @merlimat I am not arguing about data retention; please don't misread my point. My argument is about the "expected" behavior when a reader is attached to a "distributed log". The reader should be able to read the messages from its cursor onward without missing any data. This is the behavior you get from a storage system such as any local or distributed file system. If we say Pulsar is an event/stream storage system, that is the *correct* behavior we should provide.

> If we allow a reader to stay connected and have the data retained, then when would that be the limit? and what would be the action after the limit?

Look at any file system: when you open a file to read and a background process deletes that file for whatever reason, the reader can still read the file until it closes it. The file system only reclaims the disk space when the last open file descriptor is closed.

The problem with this change is not about data retention. The problem is that it introduces uncertainty into the Reader API, so people cannot trust it. The Reader API is effectively a storage API that people rely on to build stateful applications. We should take this seriously and follow the common semantics that most storage systems provide.

As I said, if you want to relax this contract, it should be done via a flag. We should let users decide which behavior they want. If I am building stateful applications that use Pulsar as the source of truth, I don't want to see those uncertainties.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [email protected]
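
To illustrate the file-system analogy in the comment above, here is a minimal Python sketch. It assumes a POSIX system (Linux or macOS); on Windows, deleting a file that is still open would typically fail instead. The helper name `read_after_unlink` is hypothetical and is not part of Pulsar or of the PR under discussion.

```python
import os
import tempfile


def read_after_unlink(payload):
    """Write payload to a temp file, delete the file's name while it is
    still open, then read the data back through the open descriptor.

    Returns (bytes_read, path_still_exists).
    Assumption: POSIX unlink semantics.
    """
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, payload)
        os.lseek(fd, 0, os.SEEK_SET)  # rewind so the "reader" starts at the beginning

        # Stand-in for the "background process" that deletes the file.
        os.unlink(path)
        path_still_exists = os.path.exists(path)

        # The name is gone, but the open descriptor still reaches the data.
        data = os.read(fd, len(payload))
    finally:
        # Only once the last descriptor is closed can the space be reclaimed.
        os.close(fd)
    return data, path_still_exists


if __name__ == "__main__":
    data, exists = read_after_unlink(b"event-1\nevent-2\n")
    print(data, exists)
```

The open descriptor plays the role of the connected reader in the analogy: deleting the name does not interrupt a read that is already in progress.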
