heesung-sn commented on PR #20044: URL: https://github.com/apache/pulsar/pull/20044#issuecomment-1500947438
To expand this discussion a little further: I think the current tableview works on a best-effort delivery basis and needs improvement in the following areas before it can be used for mission-critical data replication in production.

- Guarantee liveness (better error handling, e.g. infinite retry).
- Handle slowness (how to deal with a slow consumer, e.g. self-kill if it is too slow or barely receives messages because of bugs, network issues, etc.).
- Expose operational metrics: tableview state (retrying, active), consume latency, and highest messageId (highest logical sequence number).

Currently, it is hard or impossible for callers to check these from outside the tableview. This PR tries to expose the worst-case tableview state, where the tableview has been interrupted and has stopped consuming messages.

We could change the function name to isAlive, isConnected, isClosed, isStopped, or something else. I think isInterrupted is a good choice here, as it tells whether the tableview was "interrupted" for some unknown reason.
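To make the proposal more concrete, here is a minimal sketch of the kind of state a caller might want to observe. It does not use the Pulsar API: the class `TableViewHealth`, its `State` enum, and every method name other than `isInterrupted` are hypothetical and exist only for illustration. It assumes the tableview's read loop would report received messages and errors to such a tracker.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical health tracker for a table view; NOT part of the Pulsar API. */
class TableViewHealth {
    /** Assumed states, following the "retrying, active" wording in the comment. */
    enum State { ACTIVE, RETRYING, INTERRUPTED }

    private final AtomicReference<State> state = new AtomicReference<>(State.ACTIVE);
    private final AtomicLong lastReceiveMillis;
    private final long maxIdleMillis;

    TableViewHealth(long nowMillis, long maxIdleMillis) {
        this.lastReceiveMillis = new AtomicLong(nowMillis);
        this.maxIdleMillis = maxIdleMillis;
    }

    /** Called by the (assumed) read loop whenever a message is consumed. */
    void onMessage(long nowMillis) {
        lastReceiveMillis.set(nowMillis);
        // A successful read ends a retry phase; INTERRUPTED stays terminal.
        state.compareAndSet(State.RETRYING, State.ACTIVE);
    }

    /** Called when a read fails but will be retried. */
    void onRecoverableError() {
        state.compareAndSet(State.ACTIVE, State.RETRYING);
    }

    /** Called when the read loop gives up and stops consuming. */
    void onFatalError() {
        state.set(State.INTERRUPTED);
    }

    State state() {
        return state.get();
    }

    /** Worst-case check: the table view has stopped consuming messages. */
    boolean isInterrupted() {
        return state.get() == State.INTERRUPTED;
    }

    /** True when no message has arrived for longer than maxIdleMillis. */
    boolean isStalled(long nowMillis) {
        return nowMillis - lastReceiveMillis.get() > maxIdleMillis;
    }
}
```

A caller could poll `isInterrupted()` to decide when to recreate the tableview, and use `isStalled(...)` as a rough slowness signal. Note that an idle topic would also look stalled under this simple check, so a real implementation would likely need to compare against the topic's latest messageId instead.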
