Re: [DISCUSSION] Performance issues with data-index persistence addon

Enrique Gonzalez Martinez Mon, 19 Feb 2024 05:25:43 -0800

Alex:
Right now the data index works the same way it did in v7 with the
emitters. The only difference between the two implementations is that
here the storage is PostgreSQL instead of Elasticsearch. You are right
that it is a snapshot of the last state of the process, but we never
defined how long that data should stay alive. Honestly, I am happy with
the way it works right now. The cleanup mechanism is still TBD because
we need to discuss other things first.

Regarding STP (straight-through processes), the idea is to leave no
trail, because you can get the outcome directly from the call. It was
defined like that in v7, so there is no use for the index or the audit.

On Mon, 19 Feb 2024, 14:13, Francisco Javier Tirado Sarti <
[email protected]> wrote:

> Hi Alex,
> There has been some confusion about the purpose of DataIndex. To be
> honest, I believed it had already been sorted out, but your e-mail makes
> me think that is not the case ;). I'll let Kris clarify that with you. My
> view is that data-index is a way to query recently closed and active
> processes (the key here is the definition of "recently", which in my
> opinion should be configurable).
> But, besides that discussion and being pragmatic, keeping finished
> process instances "for a while" in DataIndex was the only way for users
> to query the result of straight-through processes. That's a function
> that cannot be removed right now.
>
> On Mon, Feb 19, 2024 at 1:33 PM Alex Porcelli <[email protected]> wrote:
>
> > If data index was supposed to provide a snapshot view of the process
> > instance… why do we keep it after the process instance is finished?
> >
> >
> > On Mon, Feb 19, 2024 at 7:12 AM Francisco Javier Tirado Sarti <
> > [email protected]> wrote:
> >
> > > Hi Martin.
> > > After taking a deeper look at this, I realize that the behaviour is the
> > > expected one.
> > > The runtimes DB does not track the completed process instance (that's
> > > what the JDBCProcessInstances warning is telling us), but DataIndex, as
> > > expected, is tracking it in the processes and nodes tables. And yes, it
> > > will grow over time.
> > > What we need is a configurable purge mechanism for DataIndex, so that
> > > it eventually removes older completed process instances.
> > >
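A configurable purge could be as simple as periodically deleting completed process instances whose last update is older than a retention window, removing their child `nodes` rows first. The following is a minimal sketch under stated assumptions. Python's stdlib `sqlite3` stands in for PostgreSQL. The `last_update` column, the value `1` for active, and the helper `purge_completed` are all hypothetical and are not the actual Data Index schema or API. State `2` for completed is taken from the `state: 2` log line quoted in this thread.

```python
import sqlite3
from datetime import datetime, timedelta, timezone

COMPLETED = 2  # assumption: state 2 = completed, as in the "state: 2" log line
ACTIVE = 1     # assumption: state 1 = active


def purge_completed(conn: sqlite3.Connection, retention: timedelta, now: datetime) -> int:
    """Delete completed process instances last updated before now - retention.

    Hypothetical helper; returns the number of process rows removed.
    """
    cutoff = (now - retention).isoformat()
    with conn:  # single transaction: child and parent rows are removed together
        # Remove child rows first so no orphaned nodes are left behind.
        conn.execute(
            "DELETE FROM nodes WHERE process_instance_id IN "
            "(SELECT id FROM processes WHERE state = ? AND last_update < ?)",
            (COMPLETED, cutoff),
        )
        cur = conn.execute(
            "DELETE FROM processes WHERE state = ? AND last_update < ?",
            (COMPLETED, cutoff),
        )
    return cur.rowcount


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE processes (id TEXT PRIMARY KEY, state INTEGER, last_update TEXT);
    CREATE TABLE nodes (
        id TEXT PRIMARY KEY,
        process_instance_id TEXT REFERENCES processes(id)
    );
    CREATE INDEX idx_nodes_process_instance_id ON nodes (process_instance_id);
""")

now = datetime(2024, 2, 19, tzinfo=timezone.utc)
for pid, state, age in [
    ("old-done", COMPLETED, timedelta(days=10)),
    ("new-done", COMPLETED, timedelta(hours=1)),
    ("old-active", ACTIVE, timedelta(days=10)),
]:
    conn.execute("INSERT INTO processes VALUES (?, ?, ?)",
                 (pid, state, (now - age).isoformat()))
    conn.execute("INSERT INTO nodes VALUES (?, ?)", (pid + "-start", pid))
conn.commit()

removed = purge_completed(conn, retention=timedelta(days=7), now=now)
print(removed)  # 1: only "old-done" is both completed and older than 7 days
```

Exposing `retention` as a setting is one way to make "recently" configurable. Finished straight-through instances stay queryable until the window expires.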
> > > On Tue, Feb 13, 2024 at 12:59 PM Francisco Javier Tirado Sarti <
> > > [email protected]> wrote:
> > >
> > > > Hi Martin,
> > > > Good catch! It looks like the skipping performed for process
> > > > instances is not applied to node instances. That's something we
> > > > definitely need to review on the runtimes side.
> > > >
> > > > On Mon, Feb 12, 2024 at 11:59 PM Martin Weiler <[email protected]>
> > > > wrote:
> > > >
> > > >> On a somewhat related note, testing a simple workflow (start ->
> > > >> script node -> end), I see the following messages in the logs:
> > > >> 2024-02-12 22:49:50,493 28758dde544c WARN
> > > >> [org.kie.kogito.persistence.jdbc.JDBCProcessInstances:-1]
> > > >> (executor-thread-3) Skipping create of process instance id:
> > > >> 7083088e-b899-47cb-b85c-5d9ccb0aa166, state: 2
> > > >>
> > > >> So far, so good. And I'd expect to see no trace of this process in
> > > >> the database if I don't have data audit enabled.
> > > >>
> > > >> However, the 'processes' table contains a row with state=2, with
> > > >> related entries in the 'nodes' table. In a load test, I see these
> > > >> tables grow significantly over time. Am I missing something to have
> > > >> these entries cleaned up automatically?
> > > >>
> > > >> ________________________________________
> > > >> From: Martin Weiler <[email protected]>
> > > >> Sent: Monday, February 12, 2024 3:40 PM
> > > >> To: [email protected]
> > > >> Subject: [EXTERNAL] RE: [DISCUSSION] Performance issues with
> > > >> data-index persistence addon
> > > >>
> > > >> Thanks everyone for your input. Based on this discussion, I opened
> > > >> the following PR:
> > > >> https://github.com/apache/incubator-kie-kogito-apps/pull/1985
> > > >>
> > > >> With this change, the performance seems to be stable over time:
> > > >> https://drive.google.com/file/d/1zkullvfrJpRp7TRjxDa41ok6kEIR7Fty/view?usp=sharing
> > > >>
> > > >> Martin
> > > >>
> > > >> ________________________________________
> > > >> From: Gonzalo Muñoz <[email protected]>
> > > >> Sent: Friday, February 9, 2024 9:42 AM
> > > >> To: [email protected]
> > > >> Subject: [EXTERNAL] Re: [DISCUSSION] Performance issues with
> > > >> data-index persistence addon
> > > >>
> > > >> Great work, Francisco!
> > > >> Martin, take a look at this link with some related tips (in case you
> > > >> find it useful):
> > > >> https://www.cybertec-postgresql.com/en/index-your-foreign-key/
> > > >>
> > > >> On Fri, 9 Feb 2024 at 17:20, Francisco Javier Tirado Sarti (<
> > > >> [email protected]>) wrote:
> > > >>
> > > >> > For the time being, we will keep JPA until we exhaust all
> > > >> > possibilities; let's call switching from JPA to JDBC our hidden
> > > >> > plan B ;)
> > > >> > I already told Martin, but so that everyone knows: just after
> > > >> > writing the previous email, I thought "what if Postgres does not
> > > >> > automatically index foreign keys like MySQL does?" and, eureka!
> > > >> > Postgres doc
> > > >> > https://www.postgresql.org/docs/current/ddl-constraints.html
> > > >> > MySQL doc
> > > >> > https://dev.mysql.com/doc/refman/8.0/en/constraint-foreign-key.html
> > > >> > These are the relevant excerpts:
> > > >> >
> > > >> > *PostgreSQL*
> > > >> > *A foreign key must reference columns that either are a primary key
> > > >> > or form a unique constraint, or are columns from a non-partial
> > > >> > unique index. This means that the referenced columns always have an
> > > >> > index to allow efficient lookups on whether a referencing row has a
> > > >> > match. Since a DELETE of a row from the referenced table or an
> > > >> > UPDATE of a referenced column will require a scan of the
> > > >> > referencing table for rows matching the old value, it is often a
> > > >> > good idea to index the referencing columns too. Because this is not
> > > >> > always needed, and there are many choices available on how to
> > > >> > index, the declaration of a foreign key constraint does not
> > > >> > automatically create an index on the referencing columns.*
> > > >> > *MySQL*
> > > >> > *MySQL requires that foreign key columns be indexed; if you create
> > > >> > a table with a foreign key constraint but no index on a given
> > > >> > column, an index is created.*
> > > >> >
> > > >> > So I asked Martin to create an index specifically for the
> > > >> > process_instance_id column on the nodes table.
> > > >> > I think that will fix the problem detected in the thread dump.
> > > >> > The simpler process test to verify that the queries are fine still
> > > >> > stands, though ;)
> > > >> >
> > > >> >
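The PostgreSQL excerpt above says that declaring a foreign key does not create an index on the referencing column. The effect is easy to observe with a query planner. The following minimal sketch uses Python's stdlib `sqlite3` as a stand-in, because SQLite also leaves FK columns unindexed. The table and column names follow the thread; the index name `idx_nodes_process_instance_id` is an illustrative choice. In PostgreSQL the equivalent check would be `EXPLAIN` on the same query, and the fix is the same `CREATE INDEX` statement, though the plan text differs between engines.

```python
import sqlite3


def plan(conn: sqlite3.Connection, sql: str, params: tuple) -> str:
    """Return the planner's description of `sql` (the last column of each row)."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(str(row[-1]) for row in rows)


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE processes (id TEXT PRIMARY KEY, state INTEGER);
    CREATE TABLE nodes (
        id TEXT PRIMARY KEY,
        process_instance_id TEXT REFERENCES processes(id)  -- FK alone creates no index
    );
""")

lookup = "SELECT id FROM nodes WHERE process_instance_id = ?"
before = plan(conn, lookup, ("pi-1",))

# Index the referencing column explicitly, as suggested for the nodes table.
conn.execute("CREATE INDEX idx_nodes_process_instance_id ON nodes (process_instance_id)")
after = plan(conn, lookup, ("pi-1",))

print(before)  # a full SCAN of nodes
print(after)   # a SEARCH of nodes USING INDEX idx_nodes_process_instance_id
```

Without the index, every lookup or cascade by `process_instance_id` reads the whole `nodes` table. That table keeps growing under load, which matches the steadily degrading timings reported in this thread.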
> > > >> > On Fri, Feb 9, 2024 at 5:10 PM Tibor Zimányi <[email protected]>
> > > >> > wrote:
> > > >> >
> > > >> > > I have always preferred pure JDBC over Hibernate myself, just for
> > > >> > > the sake of control over what is happening :) So I would not -1
> > > >> > > that myself.
> > > >> > >
> > > >> > > Tibor
> > > >> > >
> > > >> > > On Fri, 9 Feb 2024, 17:00 Francisco Javier Tirado Sarti <
> > > >> > > [email protected]> wrote:
> > > >> > >
> > > >> > > > Hi,
> > > >> > > > I usually prefer not to talk about work in progress, because
> > > >> > > > preliminary conclusions are pretty volatile, but, well, there are
> > > >> > > > a couple of things that can be concluded from the really valuable
> > > >> > > > information that Martin provided.
> > > >> > > > 1) In order to determine whether the number of statements is
> > > >> > > > larger than expected, I asked Martin to test with a simpler
> > > >> > > > process definition, one with just three nodes: start, script and
> > > >> > > > end. The script node should change just one variable. This way we
> > > >> > > > can check whether the number of queries is the expected one. From
> > > >> > > > the single log (audit was activated then), my conclusion is that
> > > >> > > > the number of inserts/updates over processes and nodes (there are
> > > >> > > > a lot over tasks, which I would prefer to skip for now, baby
> > > >> > > > steps) is the expected one.
> > > >> > > > 2) Analysing the thread dump, we see around 15 threads executing
> > > >> > > > this line:
> > > >> > > > org.kie.kogito.index.jpa.storage.ProcessInstanceEntityStorage.indexNode(ProcessInstanceEntityStorage.java:125),
> > > >> > > > so it's pretty clear which code needs to be optimized ;). I'm
> > > >> > > > evaluating possibilities within JPA/Hibernate, but I'm starting
> > > >> > > > to think that it might be better to switch to JDBC and skip
> > > >> > > > Hibernate. Our lives will be simpler, especially with a
> > > >> > > > relatively simple schema like ours (that would be my
> > > >> > > > recommendation if I were an external consultant).
> > > >> > > >
> > > >> > > > On Fri, Feb 9, 2024 at 4:15 PM Tibor Zimányi <[email protected]>
> > > >> > > > wrote:
> > > >> > > >
> > > >> > > > > Hi,
> > > >> > > > >
> > > >> > > > > This will be a bit off-topic. However, as far as performance
> > > >> > > > > goes, I think we should consider that we have string primary
> > > >> > > > > keys (IDs). I would expect database systems to be much better at
> > > >> > > > > indexing numeric keys than strings. I remember from the past,
> > > >> > > > > when I was working with DBs, that using strings as keys or
> > > >> > > > > indexes was a discouraged practice.
> > > >> > > > >
> > > >> > > > > Best regards,
> > > >> > > > > Tibor
> > > >> > > > >
> > > >> > > > > On Thu, 8 Feb 2024, 22:45 Martin Weiler <[email protected]>
> > > >> > > > > wrote:
> > > >> > > > >
> > > >> > > > > > I changed the test to use MongoDB [1] and I don't see a
> > > >> > > > > > performance degradation with this setup [2].
> > > >> > > > > >
> > > >> > > > > > Please keep us posted on your findings. Thanks!
> > > >> > > > > >
> > > >> > > > > > Martin
> > > >> > > > > >
> > > >> > > > > > [1] https://github.com/martinweiler/job-service-refactor-test/tree/mongodb
> > > >> > > > > > [2] https://drive.google.com/file/d/1NfacXaxJlgRMw4OQ5S20cvkzvaUKUVFj/view?usp=sharing
> > > >> > > > > >
> > > >> > > > > > ________________________________________
> > > >> > > > > > From: Francisco Javier Tirado Sarti <[email protected]>
> > > >> > > > > > Sent: Wednesday, February 7, 2024 11:40 AM
> > > >> > > > > > To: [email protected]
> > > >> > > > > > Subject: [EXTERNAL] Re: [DISCUSSION] Performance issues with
> > > >> > > > > > data-index persistence addon
> > > >> > > > > >
> > > >> > > > > > Yes, it could be index degradation because of size, but I
> > > >> > > > > > believe (I might be wrong) the DB is too small (yet) for that.
> > > >> > > > > > But, eventually, when the DB is huge enough, Postgres will
> > > >> > > > > > unavoidably behave like the graph that Martin sent.
> > > >> > > > > > Since I believe we are not huge enough (yet), let's rule out
> > > >> > > > > > another issue by analysing the SQL logs (I requested those from
> > > >> > > > > > Martin offline and he is going to kindly collect them).
> > > >> > > > > > Also, I'm curious to know whether Mongo behaves the same way.
> > > >> > > > > >
> > > >> > > > > > On Wed, Feb 7, 2024 at 7:25 PM Enrique Gonzalez Martinez <
> > > >> > > > > > [email protected]> wrote:
> > > >> > > > > >
> > > >> > > > > > > Hi Francisco,
> > > >> > > > > > > I would highly recommend checking the indexes and how the
> > > >> > > > > > > updates work in data index, to avoid full table scans and
> > > >> > > > > > > locking the full table. Some DBs are very sensitive to that.
> > > >> > > > > > >
> > > >> > > > > > > On Wed, 7 Feb 2024, 18:41, Francisco Javier Tirado Sarti <
> > > >> > > > > > > [email protected]> wrote:
> > > >> > > > > > >
> > > >> > > > > > > > Hi Martin,
> > > >> > > > > > > > While I analyze the data, let me ask whether it is possible
> > > >> > > > > > > > to perform another check (similar in a way to disabling
> > > >> > > > > > > > data-index like you did): can you switch to MongoDB
> > > >> > > > > > > > persistence and check whether the same degradation seen
> > > >> > > > > > > > with Postgres remains?
> > > >> > > > > > > > I do not know if this is feasible, but it will certainly
> > > >> > > > > > > > indicate whether the problem is in the Postgres storage
> > > >> > > > > > > > layer, and I do not have a clear prediction of what we will
> > > >> > > > > > > > see when doing this switch.
> > > >> > > > > > > >
> > > >> > > > > > > > On Wed, Feb 7, 2024 at 6:37 PM Martin Weiler <[email protected]>
> > > >> > > > > > > > wrote:
> > > >> > > > > > > >
> > > >> > > > > > > > > Hi Francisco,
> > > >> > > > > > > > >
> > > >> > > > > > > > > thanks for your work on this important topic!
> > > >> > > > > > > > >
> > > >> > > > > > > > > I would like to share some test results here, which might
> > > >> > > > > > > > > help to improve the codebase even further. I am using the
> > > >> > > > > > > > > JMeter-based test case from Pere and Enrique (thanks guys!)
> > > >> > > > > > > > > [1], which uses a load of 30 threads to
> > > >> > > > > > > > >
> > > >> > > > > > > > > 1) start a new process instance (POST)
> > > >> > > > > > > > > 2) retrieve tasks for a user (GET)
> > > >> > > > > > > > > 3) fetch task details (GET)
> > > >> > > > > > > > > 4) complete a task (POST)
> > > >> > > > > > > > > 5) execute a query on data-audit
> > > >> > > > > > > > >
> > > >> > > > > > > > > With this test setup, I noticed that the performance of the
> > > >> > > > > > > > > POST requests, in particular the one to start a new process
> > > >> > > > > > > > > instance, degrades over time (see graph [2]). If I run the
> > > >> > > > > > > > > same test without data-index, there is no such performance
> > > >> > > > > > > > > degradation [3]. You can find a thread dump captured a few
> > > >> > > > > > > > > minutes into the first test here [4]; it might help to see
> > > >> > > > > > > > > some of the contention points.
> > > >> > > > > > > > >
> > > >> > > > > > > > > I'd appreciate it if you could take a look and see if there
> > > >> > > > > > > > > is something that can be further improved based on your
> > > >> > > > > > > > > previous work. If you need any additional data, let me
> > > >> > > > > > > > > know; otherwise, it is straightforward to run the JMeter
> > > >> > > > > > > > > test as well.
> > > >> > > > > > > > >
> > > >> > > > > > > > > Thanks,
> > > >> > > > > > > > > Martin
> > > >> > > > > > > > >
> > > >> > > > > > > > > [1] https://github.com/pefernan/job-service-refactor-test/
> > > >> > > > > > > > > [2] https://drive.google.com/file/d/1Gqn-ixE05kXv2jdssAUlnMuUVcHxIYZ0/view?usp=sharing
> > > >> > > > > > > > > [3] https://drive.google.com/file/d/10gVNyb4JYg_bA18bNhY9dEDbPn3TOxL7/view?usp=sharing
> > > >> > > > > > > > > [4] https://drive.google.com/file/d/1jVrtsO49gCvUlnaC9AUAtkVKTm4PbdUv/view?usp=sharing
> > > >> > > > > > > > >
> > > >> > > > > > > > > ________________________________________
> > > >> > > > > > > > > From: Francisco Javier Tirado Sarti <[email protected]>
> > > >> > > > > > > > > Sent: Wednesday, January 17, 2024 9:13 AM
> > > >> > > > > > > > > To: [email protected]
> > > >> > > > > > > > > Cc: Pere Fernandez Perez
> > > >> > > > > > > > > Subject: [EXTERNAL] Re: [DISCUSSION] Performance issues
> > > >> > > > > > > > > with data-index persistence addon
> > > >> > > > > > > > >
> > > >> > > > > > > > > Hi Alex,
> > > >> > > > > > > > > I did not measure times (which depend on a number of
> > > >> > > > > > > > > variables that change drastically between environments),
> > > >> > > > > > > > > but I verified that the number of updates has been reduced
> > > >> > > > > > > > > drastically without losing functionality, which is
> > > >> > > > > > > > > objectively a good thing. Before the change, for every node
> > > >> > > > > > > > > executed we had an update for every node previously
> > > >> > > > > > > > > executed, so if a process had 50 nodes to execute, we were
> > > >> > > > > > > > > performing nearly 50*51/2 updates, which gives a total of
> > > >> > > > > > > > > 1275 updates; now we have just one for every node being
> > > >> > > > > > > > > executed, implying a total of 50 updates.
> > > >> > > > > > > > >
> > > >> > > > > > > > >
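The update counts above are the triangular sum 1 + 2 + ... + n = n(n+1)/2. That sum is why the November measurement describes the growth as n*n, i.e. quadratic. The following small sketch models the two behaviours as stated in the text; it does not measure the real code. n = 16 is the node count mentioned for the serverless workflow example, and n = 50 is the example used here.

```python
def updates_before(n: int) -> int:
    """Old behaviour: executing node k re-writes all k nodes seen so far."""
    return sum(range(1, n + 1))  # 1 + 2 + ... + n = n * (n + 1) / 2


def updates_after(n: int) -> int:
    """New behaviour: exactly one write per executed node."""
    return n


for n in (16, 50):
    print(n, updates_before(n), updates_after(n))
# 16 136 16
# 50 1275 50
```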
> > > >> > > > > > > > > On Wed, Jan 17, 2024 at 3:18 PM Alex Porcelli <[email protected]>
> > > >> > > > > > > > > wrote:
> > > >> > > > > > > > >
> > > >> > > > > > > > > > Francisco,
> > > >> > > > > > > > > >
> > > >> > > > > > > > > > I noticed that your PR has been merged, but I was
> > > >> > > > > > > > > > expecting (at least that was my understanding from this
> > > >> > > > > > > > > > thread) that some benchmark data would be shared before
> > > >> > > > > > > > > > merging, to assess the cost/benefit of such a decent-sized
> > > >> > > > > > > > > > change.
> > > >> > > > > > > > > >
> > > >> > > > > > > > > > Do you have any information to share?
> > > >> > > > > > > > > >
> > > >> > > > > > > > > > On Sat, Dec 23, 2023 at 4:02 AM Francisco Javier Tirado Sarti
> > > >> > > > > > > > > > <[email protected]> wrote:
> > > >> > > > > > > > > > >
> > > >> > > > > > > > > > > Yes, as intended, now we have one select and one
> > > >> > > > > > > > > > > insert/update per node event.
> > > >> > > > > > > > > > > I moved the PR to ready for review and gave @Pere
> > > >> > > > > > > > > > > Fernandez Perez <[email protected]> permission on the
> > > >> > > > > > > > > > > branch so he can edit it in the next two weeks (I'll be
> > > >> > > > > > > > > > > on PTO), if desired, before merging.
> > > >> > > > > > > > > > >
> > > >> > > > > > > > > > >
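"One select and one insert/update per node event" can be pictured as the following minimal sketch. Python's stdlib `sqlite3` stands in for PostgreSQL. The `on_node_event` handler, the `exit_time` column, and the enter/exit event pairing are hypothetical, not the actual Data Index code. The trace callback records every statement SQLite runs, so the read/write counts can be checked directly.

```python
import sqlite3


def on_node_event(conn, node_id, process_instance_id, name, exit_time=None):
    """Apply one node event with one SELECT plus one INSERT or UPDATE.

    Hypothetical handler; only the row of the node in the event is touched.
    """
    exists = conn.execute("SELECT 1 FROM nodes WHERE id = ?", (node_id,)).fetchone()
    if exists is None:
        conn.execute(
            "INSERT INTO nodes (id, process_instance_id, name, exit_time) VALUES (?, ?, ?, ?)",
            (node_id, process_instance_id, name, exit_time),
        )
    else:
        conn.execute("UPDATE nodes SET exit_time = ? WHERE id = ?", (exit_time, node_id))


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE nodes (id TEXT PRIMARY KEY, process_instance_id TEXT, name TEXT, exit_time TEXT)"
)

issued = []  # first keyword of every statement SQLite runs from here on
conn.set_trace_callback(lambda sql: issued.append(sql.split()[0].upper()))

for i, name in enumerate(["start", "script", "end"]):
    node_id = f"pi-1-n{i}"
    on_node_event(conn, node_id, "pi-1", name)                     # node entered
    on_node_event(conn, node_id, "pi-1", name, exit_time=f"t{i}")  # node exited

reads = issued.count("SELECT")
writes = issued.count("INSERT") + issued.count("UPDATE")
print(reads, writes)  # 6 6: one read and one write per event, 6 events
```

The cost per event stays constant no matter how many nodes the instance already has. This is the difference from rewriting every previously executed node on each event.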
> > > >> > > > > > > > > > > On Thu, Dec 21, 2023 at 5:58 PM Alex Porcelli <[email protected]>
> > > >> > > > > > > > > > > wrote:
> > > >> > > > > > > > > > >
> > > >> > > > > > > > > > > > Cool, thank you Francisco!
> > > >> > > > > > > > > > > >
> > > >> > > > > > > > > > > > Did you manage to get some preliminary data about improvements?
> > > >> > > > > > > > > > > >
> > > >> > > > > > > > > > > > On Thu, Dec 21, 2023 at 11:52 AM Francisco Javier Tirado Sarti
> > > >> > > > > > > > > > > > <[email protected]> wrote:
> > > >> > > > > > > > > > > > >
> > > >> > > > > > > > > > > > > Yes, after some delay because of the Quarkus 3 migration.
> > > >> > > > > > > > > > > > > I'm refining this draft PR:
> > > >> > > > > > > > > > > > > https://github.com/apache/incubator-kie-kogito-apps/pull/1941
> > > >> > > > > > > > > > > > >
> > > >> > > > > > > > > > > > > On Thu, Dec 21, 2023 at 5:48 PM Alex Porcelli <[email protected]>
> > > >> > > > > > > > > > > > > wrote:
> > > >> > > > > > > > > > > > >
> > > >> > > > > > > > > > > > > > Any update or new findings on this topic?
> > > >> > > > > > > > > > > > > >
> > > >> > > > > > > > > > > > > > On Tue, Nov 28, 2023 at 8:38 AM Francisco Javier Tirado Sarti
> > > >> > > > > > > > > > > > > > <[email protected]> wrote:
> > > >> > > > > > > > > > > > > > >
> > > >> > > > > > > > > > > > > > > Hi Alex,
> > > >> > > > > > > > > > > > > > > After considering different options to improve
> > > >> > > > > > > > > > > > > > > performance, we feel that it is time to "partially" move
> > > >> > > > > > > > > > > > > > > away from the current Map-style interface
> > > >> > > > > > > > > > > > > > > (https://github.com/apache/incubator-kie-kogito-apps/blob/main/persistence-commons/persistence-commons-api/src/main/java/org/kie/kogito/persistence/api/Storage.java),
> > > >> > > > > > > > > > > > > > > which was shared with Trusty, to one more suitable for
> > > >> > > > > > > > > > > > > > > usage with a relational DB like PostgreSQL (but still
> > > >> > > > > > > > > > > > > > > compatible with big-table DBs).
> > > >> > > > > > > > > > > > > > > The idea is to replace the generic Storage interface with
> > > >> > > > > > > > > > > > > > > four specific interfaces (which will inherit from a
> > > >> > > > > > > > > > > > > > > common one that keeps the query part as it is, with get
> > > >> > > > > > > > > > > > > > > and query methods) that will include the required
> > > >> > > > > > > > > > > > > > > modification operations for the four DataIndex
> > > >> > > > > > > > > > > > > > > "domains": processinstance, usertask, processdefinitions
> > > >> > > > > > > > > > > > > > > and jobs. Those interfaces will define methods like
> > > >> > > > > > > > > > > > > > > addNode, addVariable, updateTask, addAttachment... that
> > > >> > > > > > > > > > > > > > > will allow the persistence layer implementation to update
> > > >> > > > > > > > > > > > > > > just the needed info in the DB (for example, for addNode
> > > >> > > > > > > > > > > > > > > in Postgres, just insert a row into the nodes table; for
> > > >> > > > > > > > > > > > > > > addNode in Mongo, basically the same atomic upsert
> > > >> > > > > > > > > > > > > > > operation that is currently done). Therefore, we increase
> > > >> > > > > > > > > > > > > > > performance for Postgres and keep the current one for
> > > >> > > > > > > > > > > > > > > Mongo. The current DB schemas won't be touched.
> > > >> > > > > > > > > > > > > > > Since the code change is large, I do not think I'll be
> > > >> > > > > > > > > > > > > > > able to have the PR ready until next week.
> > > >> > > > > > > > > > > > > > > But before starting, please let me know if that approach
> > > >> > > > > > > > > > > > > > > is fine for you.
> > > >> > > > > > > > > > > > > > > Best regards.
> > > >> > > > > > > > > > > > > > >
> > > >> > > > > > > > > > > > > > >
> > > >> > > > > > > > > > > > > > >
> > > >> > > > > > > > > > > > > > >
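The proposed split into a common read interface plus domain-specific write operations can be sketched roughly as follows. This is a sketch, not the Kogito code. The real code base is Java; Python is used here only to keep all examples in one language. Every name below (`QueryableStorage`, `ProcessInstanceStorage`, `add_node`, `add_variable`, `SqlProcessInstanceStorage`) is hypothetical, modelled loosely on the method names in the message. `sqlite3` stands in for PostgreSQL.

```python
import sqlite3
from abc import ABC, abstractmethod
from typing import Any, Optional


class QueryableStorage(ABC):
    """Common read side, kept as is: get and query (hypothetical names)."""

    @abstractmethod
    def get(self, key: str) -> Optional[dict]: ...

    @abstractmethod
    def query(self, **filters: Any) -> list: ...


class ProcessInstanceStorage(QueryableStorage):
    """Domain-specific write side: fine-grained operations instead of put(key, whole_object)."""

    @abstractmethod
    def add_node(self, process_instance_id: str, node_id: str, name: str) -> None: ...

    @abstractmethod
    def add_variable(self, process_instance_id: str, name: str, value: str) -> None: ...


class SqlProcessInstanceStorage(ProcessInstanceStorage):
    """Relational implementation: each write touches only the rows it needs."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn
        self.statements = 0  # write statements issued, for illustration only
        conn.executescript("""
            CREATE TABLE IF NOT EXISTS nodes (
                id TEXT PRIMARY KEY, process_instance_id TEXT, name TEXT);
            CREATE TABLE IF NOT EXISTS variables (
                process_instance_id TEXT, name TEXT, value TEXT,
                PRIMARY KEY (process_instance_id, name));
        """)

    def add_node(self, process_instance_id: str, node_id: str, name: str) -> None:
        # One INSERT into nodes; previously stored nodes are not rewritten.
        self.conn.execute("INSERT INTO nodes VALUES (?, ?, ?)",
                          (node_id, process_instance_id, name))
        self.statements += 1

    def add_variable(self, process_instance_id: str, name: str, value: str) -> None:
        # One row per variable; a new value replaces the old one.
        self.conn.execute("INSERT OR REPLACE INTO variables VALUES (?, ?, ?)",
                          (process_instance_id, name, value))
        self.statements += 1

    def get(self, key: str) -> Optional[dict]:
        names = [row[0] for row in self.conn.execute(
            "SELECT name FROM nodes WHERE process_instance_id = ? ORDER BY rowid", (key,))]
        return {"id": key, "nodes": names} if names else None

    def query(self, **filters: Any) -> list:
        # Illustrative only: supports a single node_name filter.
        rows = self.conn.execute(
            "SELECT DISTINCT process_instance_id FROM nodes WHERE name = ?",
            (filters["node_name"],))
        return [row[0] for row in rows]


storage = SqlProcessInstanceStorage(sqlite3.connect(":memory:"))
for i, name in enumerate(["start", "script", "end"]):
    storage.add_node("pi-1", f"pi-1-n{i}", name)
storage.add_variable("pi-1", "x", "42")
print(storage.get("pi-1"))  # {'id': 'pi-1', 'nodes': ['start', 'script', 'end']}
print(storage.statements)   # 4
```

The design choice the sketch illustrates: because the write side names the change (`add_node`), a relational backend can issue one narrow statement. A document store could implement the same method as an atomic upsert.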
> > > >> > > > > > > > > > > > > > > On Fri, Nov 24, 2023 at 6:55 PM Alex Porcelli <[email protected]>
> > > >> > > > > > > > > > > > > > > wrote:
> > > >> > > > > > > > > > > > > > >
> > > >> > > > > > > > > > > > > > > > Thank you, Francisco, for getting deeper into this…
> > > >> > > > > > > > > > > > > > > >
> > > >> > > > > > > > > > > > > > > > Looking forward to seeing the results of your suggested
> > > >> > > > > > > > > > > > > > > > improvements.
> > > >> > > > > > > > > > > > > > > >
> > > >> > > > > > > > > > > > > > > >
> > > >> > > > > > > > > > > > > > > > On Fri, Nov 24, 2023 at 9:40 AM Francisco Javier Tirado Sarti <
> > > >> > > > > > > > > > > > > > > > [email protected]> wrote:
> > > >> > > > > > > > > > > > > > > >
> > > >> > > > > > > > > > > > > > > > > I forgot to attach the queries
> > > >> > > > > > > > > > > > > > > > >
> > > >> > > > > > > > > > > > > > > > > On Fri, Nov 24, 2023 at 3:04 PM Francisco Javier Tirado Sarti <
> > > >> > > > > > > > > > > > > > > > > [email protected]> wrote:
> > > >> > > > > > > > > > > > > > > > >
> > > >> > > > > > > > > > > > > > > > >> Hi,
> > > >> > > > > > > > > > > > > > > > >> A brief update on this topic.
> > > >> > > > > > > > > > > > > > > > >> After doing a simple test with the example
> > > >> > > > > > > > > > > > > > > > >> https://github.com/apache/incubator-kie-kogito-examples/tree/stable/serverless-workflow-examples/serverless-workflow-data-index-quarkus,
> > > >> > > > > > > > > > > > > > > > >> the number of updates over the Nodes table is n*n, so we
> > > >> > > > > > > > > > > > > > > > >> manage to obtain a perfect quadratic performance
> > > >> > > > > > > > > > > > > > > > >> degradation. The problem is worse in the case of
> > > >> > > > > > > > > > > > > > > > >> Serverless Workflow than in BPMN because the number of
> > > >> > > > > > > > > > > > > > > > >> nodes is greater than the number of states. In that
> > > >> > > > > > > > > > > > > > > > >> example N is 16, but for a more complex workflow it would
> > > >> > > > > > > > > > > > > > > > >> certainly be larger.
> > > >> > > > > > > > > > > > > > > > >> I think that this is more related to how we are handling
> > > >> > > > > > > > > > > > > > > > >> JPA in the code, in particular the mapping from model to
> > > >> > > > > > > > > > > > > > > > >> entity (basically JPA is blind and has to update all
> > > >> > > > > > > > > > > > > > > > >> nodes for every write because it believes the node has
> > > >> > > > > > > > > > > > > > > > >> been updated, although it has not), than to an issue in
> > > >> > > > > > > > > > > > > > > > >> the table definition.
> > > >> > > > > > > > > > > > > > > > >> In fact, when using JPA, separating the server model from
> > > >> > > > > > > > > > > > > > > > >> the JPA entity is not a good idea, especially if the
> > > >> > > > > > > > > > > > > > > > >> entity contains collections. I will try to change that
> > > >> > > > > > > > > > > > > > > > >> without breaking anything.
> > > >> > > > > > > > > > > > > > > > >>
> > > >> > > > > > > > > > > > > > > > >> On Wed, Nov 22, 2023 at 12:10 PM Enrique Gonzalez Martinez <
> > > >> > > > > > > > > > > > > > > > >> [email protected]> wrote:
> > > >> > > > > > > > > > > > > > > > >>
> > > >> > > > > > > > > > > > > > > > >>> After the events split, you will now need to create a
> > > >> > > > > > > > > > > > > > > > >>> node instance model and make it independent from the
> > > >> > > > > > > > > > > > > > > > >>> process instance.
> > > >> > > > > > > > > > > > > > > > >>> That should do the trick.
> > > >> > > > > > > > > > > > > > > > >>>
> > > >> > > > > > > > > > > > > > > > >>> Regarding deleting/inserting, it was fixed at some point.
> > > >> > > > > > > > > > > > > > > > >>>
> > > >> > > > > > > > > > > > > > > > >>> On Tue, Nov 21, 2023 at 20:22, Francisco Javier Tirado Sarti
> > > >> > > > > > > > > > > > > > > > >>> (<[email protected]>) wrote:
> > > >> > > > > > > > > > > > > > > > >>> >
> > > >> > > > > > > > > > > > > > > > >>> > Hi Martin,
> > > >> > > > > > > > > > > > > > > > >>> > I have a task to review the performance of
> > > >> > > > > > > > > > > > > > > > >>> > ProcessInstanceNodeDataEventMerger.
> > > >> > > > > > > > > > > > > > > > >>> > My idea is to reduce the number of deletes/inserts when processing
> > > >> > > > > > > > > > > > > > > > >>> > events and make the processing incremental.
> > > >> > > > > > > > > > > > > > > > >>> > That should improve performance.
> > > >> > > > > > > > > > > > > > > > >>> > PS:
> > > >> > > > > > > > > > > > > > > > >>> > I was planning to send an e-mail tomorrow announcing that, in case
> > > >> > > > > > > > > > > > > > > > >>> > you were already working on a fix for it. I assume you are not, and
> > > >> > > > > > > > > > > > > > > > >>> > I will be sending a PR soon.
> > > >> > > > > > > > > > > > > > > > >>> >
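The incremental event processing described above could look roughly like the following sketch. It is not the actual implementation of `ProcessInstanceNodeDataEventMerger`; the `NodeEvent` record, the in-memory map standing in for the node table, and the choice to key rows by node instance id (rather than rewriting them through the process instance) are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of applying node events incrementally instead of
// deleting and re-inserting all nodes of the process instance.
public class IncrementalNodeMerger {

    // The single write an event needs (hypothetical).
    enum Op { INSERT, UPDATE, NONE }

    // Hypothetical event shape: owning process, node instance, new state.
    record NodeEvent(String processInstanceId, String nodeInstanceId, String state) {}

    // Stand-in for the node table, keyed by node instance id (assumption).
    private final Map<String, NodeEvent> table = new HashMap<>();

    // Decide the write an event needs, then apply it to the stand-in table.
    Op apply(NodeEvent event) {
        NodeEvent existing = table.get(event.nodeInstanceId());
        if (existing == null) {
            table.put(event.nodeInstanceId(), event);
            return Op.INSERT;
        }
        if (existing.equals(event)) {
            return Op.NONE; // redelivered or unchanged event: no write at all
        }
        table.put(event.nodeInstanceId(), event);
        return Op.UPDATE;
    }

    public static void main(String[] args) {
        IncrementalNodeMerger merger = new IncrementalNodeMerger();
        System.out.println(merger.apply(new NodeEvent("p1", "n1", "ACTIVE")));    // INSERT
        System.out.println(merger.apply(new NodeEvent("p1", "n1", "ACTIVE")));    // NONE
        System.out.println(merger.apply(new NodeEvent("p1", "n1", "COMPLETED"))); // UPDATE
    }
}
```

Keying the stand-in table by node instance id means each node event touches at most one row, which is in the spirit of Enrique's suggestion to keep the node instance model independent from the process instance.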
> > > >> > > > > > > > > > > > > > > > >>> > On Tue, Nov 21, 2023 at 6:09 PM Martin Weiler <[email protected]>
> > > >> > > > > > > > > > > > > > > > >>> > wrote:
> > > >> > > > > > > > > > > > > > > > >>> >
> > > >> > > > > > > > > > > > > > > > >>> > > I looked into the new examples using the data-index persistence addon
> > > >> > > > > > > > > > > > > > > > >>> > > (Neus' PR #1813 [1] for serverless and Pere's branch [2] for workflow;
> > > >> > > > > > > > > > > > > > > > >>> > > great job, both!), and they work without issues using single requests.
> > > >> > > > > > > > > > > > > > > > >>> > > However, under some load (I used 'ab' for testing with a light
> > > >> > > > > > > > > > > > > > > > >>> > > concurrency of 10 parallel requests) I ran into the following problems:
> > > >> > > > > > > > > > > > > > > > >>> > >
> > > >> > > > > > > > > > > > > > > > >>> > > (1) Large number of insert/delete calls (e.g. for tables such as
> > > >> > > > > > > > > > > > > > > > >>> > > nodes, definitions, etc.)
> > > >> > > > > > > > > > > > > > > > >>> > >
> > > >> > > > > > > > > > > > > > > > >>> > > (2) Hibernate OptimisticLockExceptions / StaleStateExceptions
> > > >> > > > > > > > > > > > > > > > >>> > >
> > > >> > > > > > > > > > > > > > > > >>> > > (3) DB deadlocks
> > > >> > > > > > > > > > > > > > > > >>> > >
> > > >> > > > > > > > > > > > > > > > >>> > > (4) Error responses, slow response times
> > > >> > > > > > > > > > > > > > > > >>> > >
> > > >> > > > > > > > > > > > > > > > >>> > > The reason I am reaching out with this topic here is to find out
> > > >> > > > > > > > > > > > > > > > >>> > > whether we are aware of this issue, and whether someone is already
> > > >> > > > > > > > > > > > > > > > >>> > > looking into it or has been assigned to it.
> > > >> > > > > > > > > > > > > > > > >>> > >
> > > >> > > > > > > > > > > > > > > > >>> > > Thanks,
> > > >> > > > > > > > > > > > > > > > >>> > > Martin
> > > >> > > > > > > > > > > > > > > > >>> > >
> > > >> > > > > > > > > > > > > > > > >>> > > [1] https://github.com/apache/incubator-kie-kogito-examples/pull/1813
> > > >> > > > > > > > > > > > > > > > >>> > > [2] https://github.com/pefernan/kogito-examples/tree/example_data-index_persistence
> > > >> > > > > > > > > > > > > > > > >>> > >
> > > >> > > > > > > > > > > > > > > > >>> > > ---------------------------------------------------------------------
> > > >> > > > > > > > > > > > > > > > >>> > > To unsubscribe, e-mail: [email protected]
> > > >> > > > > > > > > > > > > > > > >>> > > For additional commands, e-mail: [email protected]