thetumbled opened a new issue, #19780:
URL: https://github.com/apache/pulsar/issues/19780

   ### Search before asking
   
   - [X] I searched in the [issues](https://github.com/apache/pulsar/issues) 
and found nothing similar.
   
   
   ### Motivation
   
   Currently, `bin/pulsar-perf` does not support verifying exactly-once 
semantics. Like Kafka, Pulsar supports exactly-once semantics only in one 
specific working pattern, `consume-transform-produce`. I am trying to achieve 
exactly-once semantics in a transactional producer-only working pattern, which 
may be impossible in theory, but I will try my best to make it work in 
practice. For the detailed implementation, see 
https://github.com/apache/pulsar/pull/19662.
   This tool verifies transaction consistency in the producer-only working 
pattern. We can use it to generate messages with or without transactions and 
run the experiment in any disruptive environment: for example, we can restart 
brokers hundreds of times, unload topics hundreds of times, or use `tc qdisc` 
to drop packets. Finally, we consume the previously produced messages without 
interference and check data integrity, that is, that no message was lost or 
duplicated.
   
   
   ### Solution
   
   Implement the logic in `PerformanceProducer`.
   
   ## Message Format
   To make data integrity easy to check, **we define the message payload as a 
number incrementing from 0 up to N (exclusive)**. For example, if we start the 
perf process with `--num-messages 100000000`, we then read the content that 
was successfully produced to the broker and **verify that it constitutes 
exactly the range `[0,100000000)`**, which is easy to implement with 
`RangeSet`. 
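
   The check described above can be sketched as follows. This is only an 
illustration under stated assumptions, not the code from the PR: to stay 
dependency-free it replaces Guava's `RangeSet` with a small `TreeMap` that 
merges adjacent half-open ranges, and the class and method names 
(`SequenceRangeChecker`, `add`, `isExactly`) are hypothetical.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper: tracks consumed sequence numbers as merged half-open ranges. */
final class SequenceRangeChecker {
    // Maps the start of each disjoint range to its exclusive end: [start, end).
    private final TreeMap<Long, Long> ranges = new TreeMap<>();
    private long duplicates = 0;

    /** Records one consumed sequence number; returns false if it was already seen. */
    boolean add(long seq) {
        Map.Entry<Long, Long> floor = ranges.floorEntry(seq);
        if (floor != null && seq < floor.getValue()) {
            duplicates++; // seq lies inside an existing range: duplicated message
            return false;
        }
        long start = seq;
        long end = seq + 1;
        // Extend the range that ends exactly at seq, if there is one.
        if (floor != null && floor.getValue() == seq) {
            start = floor.getKey();
        }
        // Absorb the range that starts exactly at seq + 1, if there is one.
        Long nextEnd = ranges.remove(seq + 1);
        if (nextEnd != null) {
            end = nextEnd;
        }
        ranges.put(start, end);
        return true;
    }

    /** Number of sequence numbers that were seen more than once. */
    long duplicates() {
        return duplicates;
    }

    /** True only if each number in [0, n) was seen exactly once and nothing else was seen. */
    boolean isExactly(long n) {
        if (duplicates != 0) {
            return false;
        }
        if (n == 0) {
            return ranges.isEmpty();
        }
        if (ranges.size() != 1) {
            return false; // more than one range means a gap, i.e. a lost message
        }
        Map.Entry<Long, Long> only = ranges.firstEntry();
        return only.getKey() == 0L && only.getValue() == n;
    }
}
```

   A consumer would call `add` for every payload it reads and `isExactly(N)` 
at the end; a `false` result means at least one message was lost or 
duplicated.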
   
   ## Client-Side Persistence Scheme
   The client may crash and restart while producing messages in a 
transaction, so it needs to persist the messages that have been sent in the 
transaction to avoid message loss or duplication. 
   I adopt a simpler way to persist the messages: every message is persisted 
to a local file `tmpX.data` when the producer sends it in a transaction. **If 
the transaction is committed successfully, the content of the local file 
`tmpX.data` is cleared. If the transaction is aborted or fails to commit, we 
resend the messages of that transaction in a new transaction.**
   **We assign every transaction a tmp file, such as `tmp0.data`**, to 
persist the messages that have been sent in that transaction. As many 
transactions run concurrently, we create many tmp files for persistence and 
add concurrency control to these files.
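
   A minimal sketch of such a per-transaction file is shown below. It is an 
assumption-laden illustration, not the implementation from the PR: the class 
`TxnJournal` and its methods are hypothetical, the on-disk format (one 
sequence number per text line) is assumed, `synchronized` stands in for 
whatever concurrency control the tool actually uses, and the Pulsar 
transaction calls themselves are left out.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical journal for one transaction slot, backed by a file such as tmp0.data. */
final class TxnJournal {
    private final Path file;

    TxnJournal(Path dir, int index) {
        this.file = dir.resolve("tmp" + index + ".data");
    }

    /** Appends one sequence number when the producer sends it in the transaction. */
    synchronized void append(long seq) {
        try {
            Files.write(file, (seq + "\n").getBytes(StandardCharsets.UTF_8),
                    StandardOpenOption.CREATE, StandardOpenOption.APPEND,
                    StandardOpenOption.SYNC);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Call after a successful commit: truncates the file to zero length. */
    synchronized void clear() {
        try {
            // With no options, Files.write creates the file or truncates it.
            Files.write(file, new byte[0]);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Call after an abort, a failed commit, or a restart: messages to resend. */
    synchronized List<Long> pending() {
        List<Long> result = new ArrayList<>();
        if (!Files.exists(file)) {
            return result;
        }
        try {
            for (String line : Files.readAllLines(file, StandardCharsets.UTF_8)) {
                if (!line.isEmpty()) {
                    result.add(Long.parseLong(line));
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return result;
    }
}
```

   On startup the producer would read `pending()` from every tmp file and 
resend those messages in a new transaction before generating new ones; 
because the file outlives the process, a second `TxnJournal` opened on the 
same index sees the same pending messages.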
   
   
   
   ### Alternatives
   
   _No response_
   
   ### Anything else?
   
   _No response_
   
   ### Are you willing to submit a PR?
   
   - [X] I'm willing to submit a PR!

