Re: [DISCUSSION] Performance issues with data-index persistence addon

Francisco Javier Tirado Sarti Mon, 19 Feb 2024 04:11:22 -0800

Hi Martin.
After taking a deeper look at this, I realize that the behaviour is the
expected one.
The runtimes DB does not track the completed process instance (that's what
the JDBCProcessInstances warning is telling us), but DataIndex, as expected,
is tracking it in the processes and nodes tables. And yes, they will grow
over time.
What we need is a configurable purge mechanism for DataIndex, so that it
eventually removes older completed process instances.
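
To make the idea concrete, here is a minimal sketch of what a retention-based
purge could look like. It is not the DataIndex implementation: it uses Python
with an in-memory SQLite database purely for illustration. The processes and
nodes table names, the process_instance_id column and state = 2 come from this
thread; the id and end_time (epoch seconds) columns, the simplified schema and
the purge_completed function are assumptions.

```python
import sqlite3

COMPLETED = 2  # state value reported in this thread for the finished instance

# Hypothetical, simplified schema; the real DataIndex tables have more columns.
SCHEMA = """
CREATE TABLE processes (id TEXT PRIMARY KEY, state INTEGER, end_time INTEGER);
CREATE TABLE nodes (id TEXT PRIMARY KEY,
                    process_instance_id TEXT REFERENCES processes(id));
CREATE INDEX idx_nodes_process_instance_id ON nodes (process_instance_id);
"""


def purge_completed(conn, now_epoch, retention_seconds):
    """Delete completed process instances that ended before the cutoff.

    Returns the number of rows removed from the processes table.
    """
    cutoff = now_epoch - retention_seconds
    with conn:  # single transaction: child rows first, then their parents
        conn.execute(
            "DELETE FROM nodes WHERE process_instance_id IN ("
            "SELECT id FROM processes WHERE state = ? AND end_time < ?)",
            (COMPLETED, cutoff),
        )
        cur = conn.execute(
            "DELETE FROM processes WHERE state = ? AND end_time < ?",
            (COMPLETED, cutoff),
        )
    return cur.rowcount
```

The retention period would be the configurable part. Active instances are
never touched because the filter only matches the completed state, and the
index on process_instance_id keeps the delete on nodes from scanning the
whole table.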


On Tue, Feb 13, 2024 at 12:59 PM Francisco Javier Tirado Sarti <
[email protected]> wrote:

> Hi Martin,
> Good catch! It looks like the skipping performed for process instances is
> not applied to node instances. That is something we definitely need to
> review on the runtimes side.
>
> On Mon, Feb 12, 2024 at 11:59 PM Martin Weiler <[email protected]>
> wrote:
>
>> On a somewhat related note, testing a simple workflow (start -> script
>> node -> end), I see the following messages in the logs:
>> 2024-02-12 22:49:50,493 28758dde544c WARN
>> [org.kie.kogito.persistence.jdbc.JDBCProcessInstances:-1]
>> (executor-thread-3) Skipping create of process instance id:
>> 7083088e-b899-47cb-b85c-5d9ccb0aa166, state: 2
>>
>> So far, so good. And I'd expect to see no trace of this process in the
>> database if I don't have data audit enabled.
>>
>> However, the 'processes' table contains a row with state=2, with related
>> entries in the 'nodes' table. In a load test, I see these tables grow
>> significantly over time. Am I missing something to have these entries
>> cleaned up automatically?
>>
>> ________________________________________
>> From: Martin Weiler <[email protected]>
>> Sent: Monday, February 12, 2024 3:40 PM
>> To: [email protected]
>> Subject: [EXTERNAL] RE: [DISCUSSION] Performance issues with data-index
>> persistence addon
>>
>> Thanks everyone for your input. Based on this discussion, I opened the
>> following PR:
>> https://github.com/apache/incubator-kie-kogito-apps/pull/1985
>>
>> With this change, the performance seems to be stable over time:
>>
>> https://drive.google.com/file/d/1zkullvfrJpRp7TRjxDa41ok6kEIR7Fty/view?usp=sharing
>>
>> Martin
>>
>> ________________________________________
>> From: Gonzalo Muñoz <[email protected]>
>> Sent: Friday, February 9, 2024 9:42 AM
>> To: [email protected]
>> Subject: [EXTERNAL] Re: [DISCUSSION] Performance issues with data-index
>> persistence addon
>>
>> Great work Francisco,
>> Martin, take a look at this link with some related tips (in case you find
>> it useful):
>> https://www.cybertec-postgresql.com/en/index-your-foreign-key/
>>
>> On Fri, Feb 9, 2024 at 17:20, Francisco Javier Tirado Sarti
>> (<[email protected]>) wrote:
>>
>> > For the time being, we will keep JPA until we exhaust all possibilities;
>> > let's call switching from JPA to JDBC our hidden plan B ;)
>> > I already told Martin, but so that everyone knows: just after writing
>> > the previous email, I thought "what if Postgres is not automatically
>> > indexing foreign keys like MySQL?" and, eureka.
>> > Postgres doc:
>> > https://www.postgresql.org/docs/current/ddl-constraints.html
>> > MySQL doc:
>> > https://dev.mysql.com/doc/refman/8.0/en/constraint-foreign-key.html
>> > These are the relevant excerpts:
>> >
>> > *PostgreSQL*
>> > *A foreign key must reference columns that either are a primary key or
>> > form a unique constraint, or are columns from a non-partial unique index.
>> > This means that the referenced columns always have an index to allow
>> > efficient lookups on whether a referencing row has a match. Since a DELETE
>> > of a row from the referenced table or an UPDATE of a referenced column
>> > will require a scan of the referencing table for rows matching the old
>> > value, it is often a good idea to index the referencing columns too.
>> > Because this is not always needed, and there are many choices available
>> > on how to index, the declaration of a foreign key constraint does not
>> > automatically create an index on the referencing columns.*
>> >
>> > *MySQL*
>> > *MySQL requires that foreign key columns be indexed; if you create a table
>> > with a foreign key constraint but no index on a given column, an index is
>> > created.*
>> >
>> > So I asked Martin to explicitly create an index on the
>> > process_instance_id column of the nodes table.
>> > I think that will fix the problem detected in the thread dump.
>> > The simpler process test to verify that the queries are fine still
>> > stands, though ;)
>> >
>> >
>> > On Fri, Feb 9, 2024 at 5:10 PM Tibor Zimányi <[email protected]> wrote:
>> >
>> > > I always preferred pure JDBC over Hibernate myself, just for the sake of
>> > > control of what is happening :) So I would not -1 that myself.
>> > >
>> > > Tibor
>> > >
>> > > On Fri, Feb 9, 2024 at 17:00, Francisco Javier Tirado Sarti
>> > > <[email protected]> wrote:
>> > >
>> > > > Hi,
>> > > > Usually I do not like to talk about work in progress, because
>> > > > preliminary conclusions are pretty volatile, but there are a couple of
>> > > > things that can be concluded from the really valuable information that
>> > > > Martin provided.
>> > > > 1) In order to determine whether the number of statements is larger
>> > > > than expected, I asked Martin to test with a simpler process
>> > > > definition, one with just three nodes: start, script and end. The
>> > > > script node should change just one variable. This way we can analyze
>> > > > whether the number of queries is the expected one. From the single log
>> > > > (audit was activated then), my conclusion is that the number of
>> > > > inserts/updates over processes and nodes is the expected one (there
>> > > > are a lot over task, which I would prefer to skip for now, baby steps).
>> > > > 2) Analysing the thread dump, we see around 15 threads executing this
>> > > > line at
>> > > > org.kie.kogito.index.jpa.storage.ProcessInstanceEntityStorage.indexNode(ProcessInstanceEntityStorage.java:125),
>> > > > so it's pretty clear which code needs to be optimized ;). I'm
>> > > > evaluating possibilities within JPA/Hibernate, but I'm starting to
>> > > > think that it might be better to switch to JDBC and skip Hibernate.
>> > > > Our lives will be simpler, especially with a relatively simple schema
>> > > > like ours (that would be my recommendation if I were an external
>> > > > consultant).
>> > > >
>> > > > On Fri, Feb 9, 2024 at 4:15 PM Tibor Zimányi <[email protected]> wrote:
>> > > >
>> > > > > Hi,
>> > > > >
>> > > > > this will be a bit off-topic. However, as far as performance is
>> > > > > concerned, I think we should consider that we have string primary
>> > > > > keys (IDs). I would expect database systems to be much better at
>> > > > > indexing numeric keys than strings. I remember from the past, when I
>> > > > > was working with DBs, that using strings as keys or indexes was a
>> > > > > discouraged practice.
>> > > > >
>> > > > > Best regards,
>> > > > > Tibor
>> > > > >
>> > > > > On Thu, Feb 8, 2024 at 22:45, Martin Weiler <[email protected]> wrote:
>> > > > >
>> > > > > > I changed the test to use MongoDB [1] and I don't see a performance
>> > > > > > degradation with this setup [2].
>> > > > > >
>> > > > > > Please keep us posted of your findings. Thanks!
>> > > > > >
>> > > > > > Martin
>> > > > > >
>> > > > > > [1] https://github.com/martinweiler/job-service-refactor-test/tree/mongodb
>> > > > > > [2] https://drive.google.com/file/d/1NfacXaxJlgRMw4OQ5S20cvkzvaUKUVFj/view?usp=sharing
>> > > > > >
>> > > > > > ________________________________________
>> > > > > > From: Francisco Javier Tirado Sarti <[email protected]>
>> > > > > > Sent: Wednesday, February 7, 2024 11:40 AM
>> > > > > > To: [email protected]
>> > > > > > Subject: [EXTERNAL] Re: [DISCUSSION] Performance issues with
>> > > > > > data-index persistence addon
>> > > > > >
>> > > > > > Yes, it could be index degradation because of size, but I believe
>> > > > > > (I might be wrong) the DB is too small (yet) for that.
>> > > > > > But eventually, when the DB is huge enough, Postgres will
>> > > > > > unavoidably behave like the graph that Martin sent.
>> > > > > > Since I believe we are not huge enough (yet), let's rule out another
>> > > > > > issue by analysing the SQL logs (I requested those from Martin
>> > > > > > offline and he is kindly going to collect them).
>> > > > > > Also, I'm curious to know whether Mongo behaves in the same way.
>> > > > > >
>> > > > > > On Wed, Feb 7, 2024 at 7:25 PM Enrique Gonzalez Martinez <
>> > > > > > [email protected]> wrote:
>> > > > > >
>> > > > > > > Hi Francisco,
>> > > > > > > I would highly recommend checking the indexes and how the updates
>> > > > > > > work in data index, to avoid full table scans and locking the
>> > > > > > > whole table. Some DBs are very sensitive to that.
>> > > > > > >
>> > > > > > > On Wed, Feb 7, 2024 at 18:41, Francisco Javier Tirado Sarti
>> > > > > > > <[email protected]> wrote:
>> > > > > > >
>> > > > > > > > Hi Martin,
>> > > > > > > > While I analyze the data, let me ask you whether it is possible
>> > > > > > > > to perform another check (similar in a way to disabling
>> > > > > > > > data-index, as you did). Can you switch to MongoDB persistence
>> > > > > > > > and check whether the same degradation that is there for
>> > > > > > > > Postgres remains?
>> > > > > > > > I do not know if this is feasible, but it would certainly
>> > > > > > > > indicate whether the problem is in the Postgres storage layer,
>> > > > > > > > and I do not have a clear prediction of what we will see when
>> > > > > > > > doing this switch.
>> > > > > > > >
>> > > > > > > > On Wed, Feb 7, 2024 at 6:37 PM Martin Weiler <[email protected]>
>> > > > > > > > wrote:
>> > > > > > > >
>> > > > > > > > > Hi Francisco,
>> > > > > > > > >
>> > > > > > > > > thanks for your work on this important topic!
>> > > > > > > > >
>> > > > > > > > > I would like to share some test results here, which might help
>> > > > > > > > > to improve the codebase even further. I am using the
>> > > > > > > > > JMeter-based test case from Pere and Enrique (thanks guys!) [1],
>> > > > > > > > > which uses a load of 30 threads to
>> > > > > > > > >
>> > > > > > > > > 1) start a new process instance (POST)
>> > > > > > > > > 2) retrieve tasks for a user (GET)
>> > > > > > > > > 3) fetch task details (GET)
>> > > > > > > > > 4) complete a task (POST)
>> > > > > > > > > 5) execute a query on data-audit
>> > > > > > > > >
>> > > > > > > > > With this test setup, I noticed that the performance of the POST
>> > > > > > > > > requests, in particular the one to start a new process instance,
>> > > > > > > > > degrades over time - see graph [2]. If I run the same test
>> > > > > > > > > without data-index, then there is no such performance
>> > > > > > > > > degradation [3]. You can find a thread dump captured a few
>> > > > > > > > > minutes into the first test here [4]; it might help to see some
>> > > > > > > > > of the contention points.
>> > > > > > > > >
>> > > > > > > > > I'd appreciate it if you could take a look and see if there is
>> > > > > > > > > something that can be further improved based on your previous
>> > > > > > > > > work. If you need any additional data, let me know, but
>> > > > > > > > > otherwise it is straightforward to run the JMeter test as well.
>> > > > > > > > >
>> > > > > > > > > Thanks,
>> > > > > > > > > Martin
>> > > > > > > > >
>> > > > > > > > > [1] https://github.com/pefernan/job-service-refactor-test/
>> > > > > > > > > [2] https://drive.google.com/file/d/1Gqn-ixE05kXv2jdssAUlnMuUVcHxIYZ0/view?usp=sharing
>> > > > > > > > > [3] https://drive.google.com/file/d/10gVNyb4JYg_bA18bNhY9dEDbPn3TOxL7/view?usp=sharing
>> > > > > > > > > [4] https://drive.google.com/file/d/1jVrtsO49gCvUlnaC9AUAtkVKTm4PbdUv/view?usp=sharing
>> > > > > > > > >
>> > > > > > > > > ________________________________________
>> > > > > > > > > From: Francisco Javier Tirado Sarti <[email protected]>
>> > > > > > > > > Sent: Wednesday, January 17, 2024 9:13 AM
>> > > > > > > > > To: [email protected]
>> > > > > > > > > Cc: Pere Fernandez Perez
>> > > > > > > > > Subject: [EXTERNAL] Re: [DISCUSSION] Performance issues with
>> > > > > > > > > data-index persistence addon
>> > > > > > > > >
>> > > > > > > > > Hi Alex,
>> > > > > > > > > I did not take timings (which depend on a number of variables
>> > > > > > > > > that change drastically between environments), but I verified
>> > > > > > > > > that the number of updates has been reduced drastically without
>> > > > > > > > > losing functionality, which is objectively a good thing. Before
>> > > > > > > > > the change, for every node executed we had an update for every
>> > > > > > > > > node previously executed, so if a process had 50 nodes to
>> > > > > > > > > execute, we were performing nearly 50*51/2 updates, which gives
>> > > > > > > > > us a total of 1275 updates. Now we have just one for every node
>> > > > > > > > > being executed, implying a total of 50 updates.
>> > > > > > > > >
>> > > > > > > > >
>> > > > > > > > > On Wed, Jan 17, 2024 at 3:18 PM Alex Porcelli <[email protected]>
>> > > > > > > > > wrote:
>> > > > > > > > >
>> > > > > > > > > > Francisco,
>> > > > > > > > > >
>> > > > > > > > > > I noticed that your PR has been merged, but I was expecting (at
>> > > > > > > > > > least that was my understanding from this thread) that some
>> > > > > > > > > > benchmark data would be shared before merging, to assess the
>> > > > > > > > > > cost/benefit of such a sizeable change.
>> > > > > > > > > >
>> > > > > > > > > > Do you have any information to share?
>> > > > > > > > > >
>> > > > > > > > > > On Sat, Dec 23, 2023 at 4:02 AM Francisco Javier Tirado Sarti
>> > > > > > > > > > <[email protected]> wrote:
>> > > > > > > > > > >
>> > > > > > > > > > > Yes, as intended, now we have one select and one
>> > > > > > > > > > > insert/update per node event.
>> > > > > > > > > > > I marked the PR as ready for review and gave @Pere Fernandez
>> > > > > > > > > > > Perez <[email protected]> permission on the branch, so he
>> > > > > > > > > > > can edit it during the next two weeks (I'll be on PTO) if
>> > > > > > > > > > > desired, before merging.
>> > > > > > > > > > >
>> > > > > > > > > > >
>> > > > > > > > > > > On Thu, Dec 21, 2023 at 5:58 PM Alex Porcelli <[email protected]>
>> > > > > > > > > > > wrote:
>> > > > > > > > > > >
>> > > > > > > > > > > > Cool, thank you Francisco!
>> > > > > > > > > > > >
>> > > > > > > > > > > > Did you manage to get some preliminary data about
>> > > > > > > > > > > > improvements?
>> > > > > > > > > > > >
>> > > > > > > > > > > > On Thu, Dec 21, 2023 at 11:52 AM Francisco Javier Tirado
>> > > > > > > > > > > > Sarti <[email protected]> wrote:
>> > > > > > > > > > > > >
>> > > > > > > > > > > > > Yes, after some delay because of the Quarkus 3
>> > > > > > > > > > > > > migration. I'm refining this draft PR:
>> > > > > > > > > > > > > https://github.com/apache/incubator-kie-kogito-apps/pull/1941
>> > > > > > > > > > > > >
>> > > > > > > > > > > > > On Thu, Dec 21, 2023 at 5:48 PM Alex Porcelli
>> > > > > > > > > > > > > <[email protected]> wrote:
>> > > > > > > > > > > > >
>> > > > > > > > > > > > > > Any update or new findings on this topic?
>> > > > > > > > > > > > > >
>> > > > > > > > > > > > > > On Tue, Nov 28, 2023 at 8:38 AM Francisco Javier
>> > > > > > > > > > > > > > Tirado Sarti <[email protected]> wrote:
>> > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > Hi Alex,
>> > > > > > > > > > > > > > > After considering different options to improve
>> > > > > > > > > > > > > > > performance, we feel that it is time to
>> > > > > > > > > > > > > > > "partially" move away from the current Map-style
>> > > > > > > > > > > > > > > interface (
>> > > > > > > > > > > > > > > https://github.com/apache/incubator-kie-kogito-apps/blob/main/persistence-commons/persistence-commons-api/src/main/java/org/kie/kogito/persistence/api/Storage.java
>> > > > > > > > > > > > > > > ), which was shared with Trusty, to one more
>> > > > > > > > > > > > > > > suitable for usage with a relational DB like
>> > > > > > > > > > > > > > > PostgreSQL (but still compatible with big table
>> > > > > > > > > > > > > > > DBs).
>> > > > > > > > > > > > > > > The idea is to replace the generic Storage
>> > > > > > > > > > > > > > > interface with four specific interfaces (which
>> > > > > > > > > > > > > > > will inherit from a common one that keeps the
>> > > > > > > > > > > > > > > query part as it is, with get and query methods)
>> > > > > > > > > > > > > > > that will include the required modification
>> > > > > > > > > > > > > > > operations for the four DataIndex "domains":
>> > > > > > > > > > > > > > > processinstance, usertask, processdefinitions and
>> > > > > > > > > > > > > > > jobs. Those interfaces will define methods like
>> > > > > > > > > > > > > > > addNode, addVariable, updateTask,
>> > > > > > > > > > > > > > > addAttachment... that will allow the persistence
>> > > > > > > > > > > > > > > layer implementation to update just the needed
>> > > > > > > > > > > > > > > info in the DB (for example, for addNode in
>> > > > > > > > > > > > > > > Postgres, just insert a row into the nodes table;
>> > > > > > > > > > > > > > > for addNode in Mongo, basically the same atomic
>> > > > > > > > > > > > > > > upsert operation that is currently done).
>> > > > > > > > > > > > > > > Therefore, we increase performance for Postgres
>> > > > > > > > > > > > > > > and keep the current one for Mongo. The current
>> > > > > > > > > > > > > > > DB schemas won't be touched.
>> > > > > > > > > > > > > > > Since the code change is large, I do not think
>> > > > > > > > > > > > > > > I'll be able to have the PR ready until next week.
>> > > > > > > > > > > > > > > But before starting, please let me know if that
>> > > > > > > > > > > > > > > approach is fine for you.
>> > > > > > > > > > > > > > > Best regards.
>> > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > On Fri, Nov 24, 2023 at 6:55 PM Alex Porcelli
>> > > > > > > > > > > > > > > <[email protected]> wrote:
>> > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > Thank you Francisco for digging deeper into
>> > > > > > > > > > > > > > > > this…
>> > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > Looking forward to seeing the results of your
>> > > > > > > > > > > > > > > > suggested improvements.
>> > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > On Fri, Nov 24, 2023 at 9:40 AM Francisco
>> > > > > > > > > > > > > > > > Javier Tirado Sarti <[email protected]> wrote:
>> > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > I forgot to attach the queries
>> > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > On Fri, Nov 24, 2023 at 3:04 PM Francisco
>> > > > > > > > > > > > > > > > > Javier Tirado Sarti <[email protected]> wrote:
>> > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > >> Hi,
>> > > > > > > > > > > > > > > > >> A brief update on this topic.
>> > > > > > > > > > > > > > > > >> After doing a simple test with the example
>> > > > > > > > > > > > > > > > >> https://github.com/apache/incubator-kie-kogito-examples/tree/stable/serverless-workflow-examples/serverless-workflow-data-index-quarkus
>> > > > > > > > > > > > > > > > >> , the number of updates over the nodes table
>> > > > > > > > > > > > > > > > >> is n*n, so we manage to obtain a perfect
>> > > > > > > > > > > > > > > > >> quadratic performance degradation. The
>> > > > > > > > > > > > > > > > >> problem is worse in the case of Serverless
>> > > > > > > > > > > > > > > > >> Workflow than in BPMN because the number of
>> > > > > > > > > > > > > > > > >> nodes is greater than the number of states.
>> > > > > > > > > > > > > > > > >> In that example n is 16, but for a more
>> > > > > > > > > > > > > > > > >> complex workflow it would certainly be
>> > > > > > > > > > > > > > > > >> larger.
>> > > > > > > > > > > > > > > > >> I think that this is more related to how we
>> > > > > > > > > > > > > > > > >> are handling JPA in the code, in particular
>> > > > > > > > > > > > > > > > >> the mapping from model to entity (basically
>> > > > > > > > > > > > > > > > >> JPA is blind and has to update all nodes for
>> > > > > > > > > > > > > > > > >> every write because it believes each node
>> > > > > > > > > > > > > > > > >> has been updated, although it has not), than
>> > > > > > > > > > > > > > > > >> to an issue in the table definition.
>> > > > > > > > > > > > > > > > >> In fact, when using JPA, separating the
>> > > > > > > > > > > > > > > > >> server model from the JPA entity is not a
>> > > > > > > > > > > > > > > > >> good idea, especially if the entity contains
>> > > > > > > > > > > > > > > > >> collections. I will try to change that
>> > > > > > > > > > > > > > > > >> without breaking anything.
>> > > > > > > > > > > > > > > > >>
>> > > > > > > > > > > > > > > > >> On Wed, Nov 22, 2023 at 12:10 PM Enrique
>> > > > > > > > > > > > > > > > >> Gonzalez Martinez <[email protected]> wrote:
>> > > > > > > > > > > > > > > > >>
>> > > > > > > > > > > > > > > > >>> After the events split, you will now need
>> > > > > > > > > > > > > > > > >>> to create a node instance model, making it
>> > > > > > > > > > > > > > > > >>> independent from the process instance.
>> > > > > > > > > > > > > > > > >>> That should do the trick.
>> > > > > > > > > > > > > > > > >>>
>> > > > > > > > > > > > > > > > >>> Regarding deleting/inserting, it was fixed
>> > > > > > > > > > > > > > > > >>> at some point.
>> > > > > > > > > > > > > > > > >>>
>> > > > > > > > > > > > > > > > >>> On Tue, Nov 21, 2023 at 20:22, Francisco
>> > > > > > > > > > > > > > > > >>> Javier Tirado Sarti (<[email protected]>)
>> > > > > > > > > > > > > > > > >>> wrote:
>> > > > > > > > > > > > > > > > >>> >
>> > > > > > > > > > > > > > > > >>> > Hi Martin,
>> > > > > > > > > > > > > > > > >>> > I have a task to review the performance
>> > > > > > > > > > > > > > > > >>> > of
>> > > > > > > > > > > > > > > > >>> >
>> > > > > > > > > > > > > > > > >>> > ProcessInstanceNodeDataEventMerger
>> > > > > > > > > > > > > > > > >>> >
>> > > > > > > > > > > > > > > > >>> > My idea is to reduce the number of
>> > > > > > > > > > > > > > > > >>> > deletes/inserts when processing events
>> > > > > > > > > > > > > > > > >>> > and try to do it incrementally.
>> > > > > > > > > > > > > > > > >>> > That should improve performance.
>> > > > > > > > > > > > > > > > >>> > PS:
>> > > > > > > > > > > > > > > > >>> > I was planning to send an e-mail tomorrow
>> > > > > > > > > > > > > > > > >>> > announcing that, in case you were already
>> > > > > > > > > > > > > > > > >>> > working on a fix for that. I assume you
>> > > > > > > > > > > > > > > > >>> > are not, and I will be sending a PR soon.
>> > > > > > > > > > > > > > > > >>> >
>> > > > > > > > > > > > > > > > >>> > On Tue, Nov 21, 2023 at 6:09 PM Martin
>> > > > > > > > > > > > > > > > >>> > Weiler <[email protected]> wrote:
>> > > > > > > > > > > > > > > > >>> >
>> > > > > > > > > > > > > > > > >>> > > I looked into the new examples using
>> > > > > > > > > > > > > > > > >>> > > the data-index persistence addon -
>> > > > > > > > > > > > > > > > >>> > > Neus' PR#1813 [1] for serverless and
>> > > > > > > > > > > > > > > > >>> > > Pere's branch [2] for workflow (great
>> > > > > > > > > > > > > > > > >>> > > job both!) - and they work without
>> > > > > > > > > > > > > > > > >>> > > issues using single requests. However,
>> > > > > > > > > > > > > > > > >>> > > under some load (I used 'ab' for
>> > > > > > > > > > > > > > > > >>> > > testing with a light concurrency of 10
>> > > > > > > > > > > > > > > > >>> > > parallel requests) I ran into the
>> > > > > > > > > > > > > > > > >>> > > following problems:
>> > > > > > > > > > > > > > > > >>> > >
>> > > > > > > > > > > > > > > > >>> > > (1) Large number of insert/delete calls
>> > > > > > > > > > > > > > > > >>> > > (e.g. for tables such as nodes,
>> > > > > > > > > > > > > > > > >>> > > definitions, etc.)
>> > > > > > > > > > > > > > > > >>> > >
>> > > > > > > > > > > > > > > > >>> > > (2) Hibernate OptimisticLockExceptions /
>> > > > > > > > > > > > > > > > >>> > > StaleStateExceptions
>> > > > > > > > > > > > > > > > >>> > >
>> > > > > > > > > > > > > > > > >>> > > (3) DB deadlocks
>> > > > > > > > > > > > > > > > >>> > >
>> > > > > > > > > > > > > > > > >>> > > (4) Error responses, slow response times
>> > > > > > > > > > > > > > > > >>> > >
>> > > > > > > > > > > > > > > > >>> > > The reason I am reaching out with this
>> > > > > > > > > > > > > > > > >>> > > topic here is to find out whether we
>> > > > > > > > > > > > > > > > >>> > > are aware of this issue, and whether
>> > > > > > > > > > > > > > > > >>> > > someone is already looking into it or
>> > > > > > > > > > > > > > > > >>> > > is assigned to it.
>> > > > > > > > > > > > > > > > >>> > >
>> > > > > > > > > > > > > > > > >>> > > Thanks,
>> > > > > > > > > > > > > > > > >>> > > Martin
>> > > > > > > > > > > > > > > > >>> > >
>> > > > > > > > > > > > > > > > >>> > > [1] https://github.com/apache/incubator-kie-kogito-examples/pull/1813
>> > > > > > > > > > > > > > > > >>> > > [2] https://github.com/pefernan/kogito-examples/tree/example_data-index_persistence
>> > > > > > > > > > > > > > > > >>> > >
>> > > > > > > > > > > > > > > > >>> > > ---------------------------------------------------------------------
>> > > > > > > > > > > > > > > > >>> > > To unsubscribe, e-mail: [email protected]
>> > > > > > > > > > > > > > > > >>> > > For additional commands, e-mail: [email protected]
