merlimat commented on issue #6787: URL: https://github.com/apache/pulsar/pull/6787#issuecomment-618497703
> This only covers 50% of the contract. If a reader is scanning a log, it should receive events in order and there are no data gaps.

That's not what the contract is. And it's not how the reader was designed.

Take a look at https://github.com/apache/pulsar/issues/355

```
Introduce a new low-level API that allow applications to read through all the messages available in a topic without the need of creating a subscription.

A reader is a new entity in the Pulsar client API that will only exists when connected to a broker. Reader will be able to decide at which message id to start reading, and it will read all the messages after that.

A reader is only useful in practice if the retention time is set to keep the data available for a given amount of time. The reader being connected will not prevent the broker to delete the data once the retention period expires.
```

That is to say: "The reader should receive events in order and there are no data gaps, ***while operating within the configured constraints***".

Again, I want to be absolutely clear that this change is not relaxing any guarantee over current behavior.

We should not be mixing the life-cycles of 2 different components:

 * The client `Reader` object is valid from when it's created until one calls `close()`.
 * Whenever a client `Reader` is connected to a broker, there will be a "NonDurableSubscription" associated with it. Its lifecycle is tied to the state of the TCP connection, or to an explicit close.

Right now, the data is only retained within the scope of the `NonDurableSubscription`. That does not give any meaningful guarantee to the `Reader` concept, since a TCP re-connection can happen at any time for many different reasons.

Since having the "NonDurableSubscription" retain data goes directly against the stated goal, and doesn't provide any guarantees, it should be regarded as a bug, not as an optional feature.

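To make the lifecycle mismatch concrete, here is a toy model in plain Java. It is only a sketch under stated assumptions: `ToyTopic` and `ToyReader` are hypothetical names, they are not Pulsar classes, and the "cursor" counter merely stands in for a connected "NonDurableSubscription". It shows that when retention is tied to the connection, the same reader object can silently skip data after a reconnect.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: none of these classes are Pulsar APIs.
final class ToyTopic {
    private final List<String> log = new ArrayList<>();
    private int firstRetained = 0;  // index of the oldest entry still stored
    private int attachedCursors = 0; // stand-in for connected non-durable subscriptions

    void publish(String msg) { log.add(msg); }

    void attachCursor() { attachedCursors++; }

    void detachCursor() { attachedCursors--; }

    // Retention expiry in this model: data is trimmed only when no cursor is
    // attached, i.e. retention depends on the connection state.
    void expireRetention() {
        if (attachedCursors == 0) {
            firstRetained = log.size();
        }
    }

    // Position actually served for a requested position: never older than
    // the oldest retained entry.
    int resolve(int requested) { return Math.max(requested, firstRetained); }
}

final class ToyReader {
    private final ToyTopic topic;
    private final int next; // next position the application wants to read
    private boolean connected;

    ToyReader(ToyTopic topic, int start) {
        this.topic = topic;
        this.next = start;
        reconnect();
    }

    // Simulates a dropped TCP connection: the reader object stays valid,
    // but its cursor on the topic goes away.
    void disconnect() {
        if (connected) {
            topic.detachCursor();
            connected = false;
        }
    }

    void reconnect() {
        if (!connected) {
            topic.attachCursor();
            connected = true;
        }
    }

    // Position the topic would serve next for this reader.
    int position() { return topic.resolve(next); }
}
```

In this model, a reader created at position 0 over three published messages still sees position 0 if retention expires while it is connected. If the connection drops, retention expires, and the reader reconnects, `position()` jumps to 3. The reader object never changed, yet the outcome depended entirely on whether the connection happened to be up.
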
> If you look into any file system, when you are opening a file to read and there is a background process delete the file for whatever reason, the reader can still read the file until it is closed. The file system only reclaims disk spaces when the last active open file descriptor is closed.

A file system is also operating within system constraints. When the disk (or user quota) is full, some actions will have to be taken.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]
