wzhramc commented on code in PR #24704:
URL: https://github.com/apache/pulsar/pull/24704#discussion_r2329820186


##########
pip/pip-439.md:
##########
@@ -0,0 +1,573 @@
+# PIP-439: Adding Transaction Support to Pulsar Functions Through Auto-Transaction Wrapping
+
+# Background knowledge
+
+Apache Pulsar transactions enable atomic operations across multiple topics, allowing producers to send messages and consumers to acknowledge messages as a single unit
+of work. This provides the foundation for exactly-once processing semantics in streaming applications.
+
+## Transaction Architecture
+
+Pulsar's transaction system consists of four key components:
+
+1. **Transaction Coordinator (TC)**: A broker module that manages transaction lifecycles, allocates transaction IDs, and orchestrates the commit/abort process.
+
+2. **Transaction Log**: A persistent topic storing transaction metadata and state changes, enabling recovery after failures.
+
+3. **Transaction Buffer**: Temporarily stores messages produced within transactions, making them visible to consumers only after commit.
+
+4. **Pending Acknowledge State**: Tracks message acknowledgments within transactions, preventing conflicts between competing transactions.
+
+## Transaction Lifecycle
+
+Transactions follow a defined lifecycle:
+
+1. **OPEN**: Client obtains a transaction ID from the Transaction Coordinator.
+2. **PRODUCING/ACKNOWLEDGING**: Client registers topic partitions/subscriptions with the TC, then produces/acknowledges messages within the transaction.
+3. **COMMITTING/ABORTING**: Client requests to end the transaction, and the TC begins two-phase commit.
+4. **COMMITTED/ABORTED**: After processing all partitions, TC finalizes the transaction state.
+5. **TIMED_OUT**: Transactions exceeding their timeout are automatically aborted.
+
+## Transaction Guarantees
+
+Pulsar transactions provide:
+- Atomic writes across multiple topics
+- Conditional acknowledgment to prevent duplicate processing by "zombie" instances
+- Visibility control ensuring consumers only see committed transaction messages
+- Support for exactly-once processing in consume-transform-produce patterns
+
+# Motivation
+
+Currently, Pulsar Functions cannot publish to multiple topics transactionally, which is a significant limitation for use cases requiring atomic multi-topic
+publishing. For instance, if a function processes an input message and needs to publish related updates to several output topics, there's no guarantee that all
+operations will succeed atomically.
+
+This limitation prevents building robust stream processing applications that require exactly-once semantics across multiple input and output topics. Without
+transaction support in Functions, developers must implement their own error handling and retry mechanisms, which can be complex and error-prone.
+
+Adding transaction support to Pulsar Functions would make the processing of each message atomic: its input acknowledgment and all of its output messages would take effect together or not at all.
+
+# Goals
+
+## In Scope
+
+1. Enable automatic transaction support for Pulsar Functions through configuration
+2. Allow Functions to publish messages to multiple topics within a single transaction
+3. Support transactional acknowledgment of input messages
+4. Ensure transactions are committed only if message processing completes successfully
+5. Provide transaction timeout configuration for Functions
+
+## Out of Scope
+
+1. Exposing explicit transaction management APIs in the Functions interface
+2. Supporting multi-function transactions (transactions spanning multiple function invocations)
+3. Adding transaction support to Pulsar IO connectors
+4. Changes to the Function interface itself
+
+# High Level Design
+
+The proposed solution introduces automatic transaction wrapping for Pulsar Functions through configuration settings. When enabled, each function execution will be
+automatically wrapped in a transaction without requiring code changes to the function implementation.
+
+The general flow will be:
+1. Function is configured with `autoTransactionsEnabled: true`

Review Comment:
   Perfect suggestion! Updated the proposal accordingly.
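
For readers following the thread, the auto-wrapping behavior described in the quoted section (commit only if processing completes successfully, otherwise abort so that no output becomes visible) can be sketched roughly as below. This is an in-memory illustration only, not the Pulsar client API and not the PIP's implementation: `AutoTxnSketch`, `FakeTxn`, `invoke`, and the topic names are hypothetical, and the list passed as `visible` merely stands in for what consumers would see after commit.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiConsumer;

// Hypothetical in-memory model of auto-transaction wrapping; no Pulsar classes are used.
class AutoTxnSketch {
    // A subset of the lifecycle states named in the quoted text.
    enum State { OPEN, COMMITTED, ABORTED }

    // Stand-in for a transaction: output is buffered and hidden until commit.
    static final class FakeTxn {
        State state = State.OPEN;
        final List<String> buffered = new ArrayList<>();

        void publish(String topic, String payload) {
            if (state != State.OPEN) {
                throw new IllegalStateException("transaction is " + state);
            }
            buffered.add(topic + ":" + payload);
        }
    }

    // Wraps one function invocation in a transaction.
    // On success, buffered messages are appended to `visible` (commit).
    // On any runtime exception, they are discarded (abort).
    static State invoke(String input,
                        BiConsumer<String, FakeTxn> function,
                        List<String> visible) {
        FakeTxn txn = new FakeTxn();
        try {
            function.accept(input, txn);
            visible.addAll(txn.buffered);
            txn.state = State.COMMITTED;
        } catch (RuntimeException e) {
            txn.buffered.clear();
            txn.state = State.ABORTED;
        }
        return txn.state;
    }

    public static void main(String[] args) {
        List<String> visible = new ArrayList<>();

        // Successful invocation: both outputs become visible together.
        State first = invoke("m1", (in, txn) -> {
            txn.publish("out-a", in);
            txn.publish("out-b", in);
        }, visible);

        // Failing invocation: the partial output to out-a is never exposed.
        State second = invoke("m2", (in, txn) -> {
            txn.publish("out-a", in);
            throw new IllegalStateException("processing failed");
        }, visible);

        System.out.println(first + " " + second + " " + visible);
    }
}
```

The point of the sketch is the all-or-nothing boundary around a single invocation: user code only calls `publish`, and the wrapper alone decides between commit and abort, which matches the proposal's goal of not requiring changes to the function implementation.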



