thetumbled opened a new issue, #19780:

URL: https://github.com/apache/pulsar/issues/19780
### Search before asking

- [X] I searched in the [issues](https://github.com/apache/pulsar/issues) and found nothing similar.

### Motivation

Currently, `bin/pulsar-perf` does not support verifying exactly-once semantics. Like Kafka, Pulsar only supports exactly-once semantics in a specific working pattern, namely `consume-transform-produce`. I am trying to achieve exactly-once semantics in the transactional-producer-only working pattern. This may be impossible in theory, but I will try my best to make it work in practice. For the detailed implementation, see https://github.com/apache/pulsar/pull/19662.

This tool verifies transactional consistency in the producer-only working pattern. With it, we can generate messages with or without transactions and run the experiment in any disruptive environment; for example, we can restart brokers hundreds of times, unload topics hundreds of times, or use `tc qdisc` to drop packets. Finally, we consume the previously produced messages without interference and check data integrity, that is, that no message was lost or duplicated.

### Solution

Implement the logic in `PerformanceProducer`.

## Message Format

To make the data-integrity check convenient, **we define the message content as a number incrementing from 0 to N**. For example, if we start the perf process with `--num-messages 100000000`, we then read the content successfully produced to the broker and **verify whether it forms the range `[0, 100000000)`**, which can be implemented easily with `RangeSet`.

## Client-Side Persistence Scheme

The client may crash and restart while producing messages in a transaction, so it needs to persist the messages that have been sent in the transaction to avoid message loss or duplication. I adopt a simple approach: persist every message to a local file `tmpX.data` when the producer sends it in a transaction.
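
A minimal sketch of this per-transaction persistence, assuming each message value is a `long` (following the message format above) stored as one decimal value per line. `TxnMessageLog`, `append`, `pending`, and `clear` are hypothetical names for illustration only; they are not part of Pulsar and not necessarily how the linked PR does it. Writes use `StandardOpenOption.DSYNC` so a recorded value reaches storage before `append` returns.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper (not a Pulsar API): records the values sent in one
 * transaction slot X to tmpX.data so they can be resent later.
 */
final class TxnMessageLog {
    private final Path file;

    TxnMessageLog(Path dir, int slot) {
        this.file = dir.resolve("tmp" + slot + ".data");
        try {
            Files.createDirectories(dir);
            // An existing file means a previous run stopped mid-transaction;
            // keep its records so they can be resent.
            if (Files.notExists(file)) {
                Files.createFile(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Appends one value; DSYNC asks the OS to write it to storage synchronously. */
    synchronized void append(long value) {
        byte[] line = (value + "\n").getBytes(StandardCharsets.UTF_8);
        try {
            Files.write(file, line, StandardOpenOption.APPEND, StandardOpenOption.DSYNC);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Returns the values recorded for the current transaction, in append order. */
    synchronized List<Long> pending() {
        try {
            List<Long> values = new ArrayList<>();
            for (String s : Files.readAllLines(file, StandardCharsets.UTF_8)) {
                if (!s.isEmpty()) {
                    values.add(Long.parseLong(s));
                }
            }
            return values;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Empties the file; meant to be called once the transaction has committed. */
    synchronized void clear() {
        try {
            Files.write(file, new byte[0], StandardOpenOption.TRUNCATE_EXISTING);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

In this sketch, each concurrent transaction slot owns one instance and one file, so the `synchronized` methods only guard callers sharing that slot. A producer would call `append` for every message sent in the transaction and `clear()` after a successful commit; after an abort, a failed commit, or a client restart, it would open a new transaction and resend the values returned by `pending()`.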

**If the transaction is committed successfully, the content of the local file `tmpX.data` is cleared. If the transaction is aborted or the commit fails, we resend the messages of that transaction in a new transaction.**

**We assign each transaction its own tmp file, such as `tmp0.data`**, to persist the messages sent in that transaction. Because many transactions run concurrently, we create many tmp files for persistence and add concurrency control over these files.

### Alternatives

_No response_

### Anything else?

_No response_

### Are you willing to submit a PR?

- [X] I'm willing to submit a PR!
