Re: [PROPOSAL][WORKFLOW] Transaction in workflow runtime

Kris Verlaenen Mon, 19 Aug 2024 05:54:14 -0700

Enrique,

I think we are saying the same thing: if you have multiple writes and you
want to keep strong consistency, either to one data source or to different
data sources, you would use a transaction for this.
If you write to one datasource, you could potentially use a local
transaction (assuming the data source supports it) but if you have multiple
data sources (and this for example includes sending messages) you would use
a distributed transaction across datasources. I have no concerns about
optionally allowing the engine to start / commit transactions automatically.
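
As a rough illustration of keeping this optional, the sketch below wraps a
workflow operation in a transaction only when a flag is set. All names here
(OptionalTransactionSketch, Tx, RecordingTx, Executor) are hypothetical and
do not exist in the code base; a real implementation would delegate to the
transaction manager of the target runtime rather than to the recording stub.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

/** Hypothetical sketch; none of these types come from the actual engine. */
final class OptionalTransactionSketch {

    /** Stand-in for a real transaction manager. */
    interface Tx {
        void begin();
        void commit();
        void rollback();
    }

    /** Records the calls it receives so the behaviour can be observed without a database. */
    static final class RecordingTx implements Tx {
        final List<String> calls = new ArrayList<>();
        public void begin() { calls.add("begin"); }
        public void commit() { calls.add("commit"); }
        public void rollback() { calls.add("rollback"); }
    }

    /** Runs an operation, starting and committing a transaction only when enabled. */
    static final class Executor {
        private final Tx tx;
        private final boolean transactional;

        Executor(Tx tx, boolean transactional) {
            this.tx = tx;
            this.transactional = transactional;
        }

        <T> T execute(Supplier<T> operation) {
            if (!transactional) {
                // Straight-through or non-persistent use cases pay no transaction overhead.
                return operation.get();
            }
            tx.begin();
            T result;
            try {
                result = operation.get();
            } catch (RuntimeException e) {
                // Undo all writes made by the operation, then let the caller see the error.
                tx.rollback();
                throw e;
            }
            tx.commit();
            return result;
        }
    }

    public static void main(String[] args) {
        RecordingTx tx = new RecordingTx();
        new Executor(tx, true).execute(() -> "persisted");
        System.out.println("transactional calls: " + tx.calls);

        RecordingTx unused = new RecordingTx();
        new Executor(unused, false).execute(() -> "straight-through");
        System.out.println("non-transactional calls: " + unused.calls);
    }
}
```

With the flag off, the stub is never touched, which is the behaviour I would
expect for the use cases that do not need a transaction.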


My questions were mostly around use cases that might not require this.
Let's make sure we keep this optional so that we don't introduce overhead
for use cases that might not require it (straight through processes,
workflows without persistence, use cases where we are only targeting
eventual consistency, etc.)***.
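
For the eventual-consistency case, the outbox pattern raised earlier in this
thread could look roughly like the sketch below: the state change and the
outgoing event are written to the same data source together, and a separate
relay publishes the event later. The names (OutboxSketch, Store, Relay) are
hypothetical, and the in-memory collections only stand in for tables that
would share one local transaction in a real database.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.function.Consumer;

/** Hypothetical sketch of the outbox pattern; not taken from the actual engine. */
final class OutboxSketch {

    /** In-memory stand-in for one data source holding a state table and an outbox table. */
    static final class Store {
        final Map<String, String> processState = new HashMap<>();
        final Queue<String> outbox = new ArrayDeque<>();

        /** Writes the new state and the pending event together. */
        void commit(String processId, String newState, String event) {
            // Assumption: in a real database these two writes share one local transaction.
            processState.put(processId, newState);
            outbox.add(event);
        }
    }

    /** Publishes pending events after the fact; an event leaves the outbox only once it was sent. */
    static final class Relay {
        int drain(Store store, Consumer<String> publisher) {
            int sent = 0;
            String event;
            while ((event = store.outbox.peek()) != null) {
                publisher.accept(event); // if this throws, the event stays queued for a retry
                store.outbox.remove();
                sent++;
            }
            return sent;
        }
    }

    public static void main(String[] args) {
        Store store = new Store();
        store.commit("process-1", "WAITING_FOR_TIMER", "process-1 state changed");

        List<String> published = new ArrayList<>();
        int sent = new Relay().drain(store, published::add);
        System.out.println(sent + " event(s) published: " + published);
    }
}
```

Note that a retry after a failure between publishing and removing the entry
can deliver the same event twice, so this only fits consumers that tolerate
duplicates.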

Thx,
Kris

*** Note that the behavior might be quite different depending on your
runtime architecture, and we might have to revisit and/or improve the
current code base as well to better support these use cases, since often
things are not (yet) completely implemented.


On Wed, Aug 7, 2024 at 7:04 PM Enrique Gonzalez Martinez <
elguard...@gmail.com> wrote:

> Inline answers
>
> On Wed, 7 Aug 2024 at 18:35, Kris Verlaenen
> (<kris.verlae...@gmail.com>) wrote:
> >
> > I share the concern with Francisco that we should be careful that we don't
> > introduce a solution that would require a distributed transaction to
> > guarantee consistency; there are other approaches that might be sufficient
> > or that deliver eventual consistency that I think we should allow.
> >
>
> I have never talked about distributed transactions (if by that you mean
> that we need to export transactions from the runtime to the data index
> during execution in a distributed deployment).
>
> Transaction in this context means an operation execution within the
> deployment.
> We need transactions as several writes to the database are performed
> during the same workflow execution. There is no other way to guarantee
> consistency than using transactions.
> XA (two-phase commit) within the same deployment concerns the parts of
> the system that might write to different databases, for instance
> runtime + data index or runtime + data audit. Hence the two-phase
> commit transactions.
>
> > I have a few questions:
> >  * Could you clarify what you mean with subsystems?  Because depending on
> > your architecture I guess this could be different (for example you could
> > run a job service embedded or as a separate service).
>
> Subsystems are all those services, required or optional, that add
> functionality to the engine: jobs, data index, data audit, user tasks.
>
> > One could argue that
> > the work of another system does not have to be done as part of the same
> > transaction if the communication with that other system is done in a
> > guaranteed way?  Or do we consider those not a subsystem in that case?
>
> The transaction is not being exported or imported from one deployment
> to another (never talked about that). The only thing we want to
> guarantee is that an operation within the same deployment is
> consistent. If that requirement is met all systems will be eventually
> consistent.
> e.g.
>
> Runtime executes an operation and sends events to Kafka. This is a
> transaction within the same deployment and it is consistent in
> runtime.
> Data index consumes the event from Kafka to the storage. This is
> another transaction within the data index deployment.
>
> There are two different transactions. They are not the same
> transaction, but within each deployment they are consistent, and among
> systems they will be eventually consistent as the data index will end up
> consuming the event.
>
>
> >  * For some external services it might not be required to be part of a
> > transaction?  For example if you're just querying some information it might
> > be totally acceptable to do this REST invocation directly and outside a
> > transaction.
>
> Which system are you referring to? For now the only invocation we
> have within the engine is the job service, so REST cannot be outside the
> transaction as the engine requires information exchange (we send the
> data to create the timer and we get the job id).
> If you are talking about service tasks (WIH), the author will have to
> decide whether it is only querying or whether it needs to create a
> compensation mechanism.
>
> > Similarly, it might be fine to send out events directly,
> > there might be other ways to compensate or ignore events later if necessary.
>
> If you try to compensate for events in BPMN you will tie the design
> of the workflow to the environment, so there won't be any point in
> the abstraction of the process itself.
> Phantom or duplicated events are off the table in BPMN as they
> can impact the workflow execution. Serverless workflow might be more
> tolerant in those abstractions, but I would guess that if a workflow
> receives a phantom signal or event and executes it, the outcome won't
> be any good.
>
> >  * It's unclear to me how this interacts with the unit of work.  The idea
> > of the unit of work is to collect all the changes, and then apply them all
> > at once towards the end.
>
> It does not matter how the unit of work does things. The unit of work
> does not cover everything that happens in an execution (e.g. correlation
> service, messaging, jobs, index, audit...).
> The interaction is about operations invoked against the engine.
>
> > This way you could have different implementations
> > of the unit of work.  Are transactions a specific implementation of the unit
> > of work where you start/commit a transaction when the unit of work is
> > started/ended?  And another alternative would be we write everything to a
> > single data source, and use the outbox pattern or similar for further
> > processing?
>
> That won't do, because of the limitation of the unit of work
> implementation (see my previous comment).
> Even if you have a single data source you write several times to
> several tables, so you need transactions to keep consistency.
>
> >
> > Thx,
> > Kris
> >
> >
> > On Fri, Aug 2, 2024 at 8:39 AM Enrique Gonzalez Martinez <
> > egonza...@apache.org> wrote:
> >
> > > * Transactions *
> > > This document describes how to support transactions in the domain of
> > > workflow engine and subsystems.
> > >
> > > The use case for transactions in workflows is to enable consistency
> > > during workflow executions.
> > >
> > > * Constraints *
> > >
> > > The constraints for this are related to different types of transaction
> > > problems:
> > >
> > > Workflow transaction execution should be in one single transaction
> > > (until idle elements are reached or there are no more elements to
> > > process)
> > >
> > > Process state should be consistent in storage in one single
> > > transaction. In the case of a database, multiple tables should be
> > > written in an atomic transaction
> > >
> > > Reactive code should be removed as it does not behave properly with
> > > transactions.
> > >
> > > Transaction policy among workflow runtime and subsystems should be
> > > consistent in terms of configuration (no subcomponent should start a
> > > transaction if there is already one in progress, but they should
> > > require being in a transaction)
> > >
> > > Error handling should still produce an event that can be stored.
> > >
> > > Subsystems execution should be included during transactions
> > >
> > > Async execution will spawn its own transaction.
> > >
> > > * Architecture *
> > >
> > > The architecture of the solution impacts some areas:
> > >
> > > Components with reactive code that are involved in transactions need
> > > refactoring. So far, the only subsystem using reactive code is the
> > > job service.
> > >
> > > Process Code generation should change in order to reflect the
> > > transactions of the workflow engine
> > >
> > > Error handling should be modified in a way the error is captured
> > > outside the transaction and handled in a different one to avoid event
> > > loss.
> > >
> > > Exchange information among runtime and subsystems should be in a way
> > > that those elements are involved in a transaction or they can be
> > > rolled back. At the moment the communication is done with a REST
> > > call that is not part of the transaction and cannot be rolled back.
> > >
> > > Events produced within the transaction should be part of the
> > > transaction as well to avoid phantom events (events produced during
> > > workflow execution that are sent at the end of the unit of work)
> > >
> > > * Risk Assessment *
> > >
> > > The risks identified for this work are the following:
> > >
> > > Error handling can be problematic depending where we set the
> > > boundaries of the transaction. There are two different approaches:
> > >
> > > Boilerplate code for each task to start / commit / rollback the
> > > transaction and deal with errors in the REST call tier itself
> > >
> > > Use the runtime environment to install error handling for doing the
> > > operation.
> > >
> > > Exchange information among systems in a non-transactional way. There
> > > are a couple of approaches
> > >
> > > Install a transaction sync listener every time the REST call
> > > is made against the subsystem and do a compensation when it fails
> > >
> > > Wrap the REST call in an XAResource that can be enlisted in the transaction.
> > >
> > > The use of Kafka clients for streams that do not belong to the
> > > transaction
> > >
> > > Wrap with XAResource (the Kafka client supports transactions, but does
> > > not offer an XAResource)
> > >
> > > Install a transaction sync for each transaction.
> > >
> > > Performance impact with transactions.
> > >
> > > Different transaction methods in Quarkus and Spring Boot
> > >
> > >
> > > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: dev-unsubscr...@kie.apache.org
> > > For additional commands, e-mail: dev-h...@kie.apache.org
>
>
>
> --
> Saludos, Enrique González Martínez :)
>
