Hi,

this will be a bit off-topic. However as far as performance, I think we
should think about that we have string primary keys (IDs). I would expect
the database systems are much better with indexing numeric keys than
strings. I remember from the past, when I was working with DBs, that using
strings as keys or indexes was a discouraged practice.

Best regards,
Tibor

Dňa št 8. 2. 2024, 22:45 Martin Weiler <mwei...@ibm.com.invalid> napísal(a):

> I changed the test to use MongoDB [1] and I don't see a performance
> degradation with this setup [2].
>
> Please keep us posted of your findings. Thanks!
>
> Martin
>
> [1] https://github.com/martinweiler/job-service-refactor-test/tree/mongodb
> [2]
> https://drive.google.com/file/d/1NfacXaxJlgRMw4OQ5S20cvkzvaUKUVFj/view?usp=sharing
>
> ________________________________________
> From: Francisco Javier Tirado Sarti <ftira...@redhat.com>
> Sent: Wednesday, February 7, 2024 11:40 AM
> To: dev@kie.apache.org
> Subject: [EXTERNAL] Re: [DISCUSSION] Performance issues with data-index
> persistence addon
>
> yes, it can be index degradation because of size, but I believe (I might be
> wrong) the db is too small (yet) for that.
> But, eventually, Postgres, when the DB is huge enough, unavoidably will
> behave like the graphic that Martin sent.
> Since I believe we are not huge enough (yet), lets rule out another issue
> by analysing the sql logs (I requested those to Martin offline and he is
> going to kindly collect them).
> Also Im curious to know if Mongo behave in the same way.
>
> On Wed, Feb 7, 2024 at 7:25 PM Enrique Gonzalez Martinez <
> egonza...@apache.org> wrote:
>
> > Hi Francisco,
> > I would highly recommend to check indexes and how the updates work in
> data
> > index to avoid full scan table and lock the full table. Some db are very
> > sensitive to that.
> >
> > El mié, 7 feb 2024, 18:41, Francisco Javier Tirado Sarti <
> > ftira...@redhat.com> escribió:
> >
> > > Hi Martin,
> > > While I analyze the data, let me ask you if it is possible to perform
> > > another check (similar in a way to disabling data-index like you do)
> Can
> > > you switch to MongoDB persistence and check if the same degradation
> that
> > is
> > > there for postgres remains?
> > > I do not know if this is feasible but will certainly indicate the
> problem
> > > is on the postgres storage layer and I do not have a clear prediction
> of
> > > what we will see when doing this switch.
> > >
> > > On Wed, Feb 7, 2024 at 6:37 PM Martin Weiler <mwei...@ibm.com.invalid>
> > > wrote:
> > >
> > > > Hi Francisco,
> > > >
> > > > thanks for your work on this important topic!
> > > >
> > > > I would like to share some test results here, which might help to
> > improve
> > > > the codebase even further. I am using the jmeter based test case from
> > > Pere
> > > > and Enrique (thanks guys!) [1] which uses a load of 30 threads to
> > > >
> > > > 1) start a new process instance (POST)
> > > > 2) retrieve tasks for a user (GET)
> > > > 3) fetches task details (GET)
> > > > 4) complete a task (POST)
> > > > 5) execute a query on data-audit
> > > >
> > > > With this test setup, I noticed that the performance for the POST
> > > > requests, in particular the one to start a new process instance,
> > degrades
> > > > over time - see graph [2]. If I run the same test without data-index,
> > > then
> > > > there is no such performance degradation [3]. You can find a thread
> > dump
> > > > captured a few minutes into the first test here [4] that might help
> to
> > > see
> > > > some of the contention points.
> > > >
> > > > I'd appreciate if you could take a look and see if there is something
> > > that
> > > > can be further improved based on your previous work. If you need any
> > > > additional data, let me know, but otherwise it is straightforward to
> > run
> > > > the jmeter test as well.
> > > >
> > > > Thanks,
> > > > Martin
> > > >
> > > > [1] https://github.com/pefernan/job-service-refactor-test/
> > > > [2]
> > > >
> > >
> >
> https://drive.google.com/file/d/1Gqn-ixE05kXv2jdssAUlnMuUVcHxIYZ0/view?usp=sharing
> > > > [3]
> > > >
> > >
> >
> https://drive.google.com/file/d/10gVNyb4JYg_bA18bNhY9dEDbPn3TOxL7/view?usp=sharing
> > > > [4]
> > > >
> > >
> >
> https://drive.google.com/file/d/1jVrtsO49gCvUlnaC9AUAtkVKTm4PbdUv/view?usp=sharing
> > > >
> > > > ________________________________________
> > > > From: Francisco Javier Tirado Sarti <ftira...@redhat.com>
> > > > Sent: Wednesday, January 17, 2024 9:13 AM
> > > > To: dev@kie.apache.org
> > > > Cc: Pere Fernandez Perez
> > > > Subject: [EXTERNAL] Re: [DISCUSSION] Performance issues with
> data-index
> > > > persistence addon
> > > >
> > > > Hi Alex,
> > > > I did not take times (which depends on a number of variables that
> > > > drastically change between environments), but verify that the number
> of
> > > > updates has been reduced drastically without losing functionality,
> > which
> > > is
> > > > objectively a good thing. If before the change, for every node
> > executed,
> > > we
> > > > have an update for every node previously executed, so if a process
> have
> > > 50
> > > > nodes to execute, we were performing nearly 50*51/2 updates, which
> > gives
> > > us
> > > > a total of  1275 updates, now we have just one for every node being
> > > > executed, implying a total of 50 updates.
> > > >
> > > >
> > > > On Wed, Jan 17, 2024 at 3:18 PM Alex Porcelli <a...@porcelli.me>
> > wrote:
> > > >
> > > > > Francisco,
> > > > >
> > > > > I noticed that your PR has been merged, but I was expecting (at
> least
> > > > > was my understanding from this thread) that before merging some
> > > > > benchmark data would be shared in advance - to assess the
> > cost/benefit
> > > > > of such a decent size change.
> > > > >
> > > > > Do you have any information to share?
> > > > >
> > > > > On Sat, Dec 23, 2023 at 4:02 AM Francisco Javier Tirado Sarti
> > > > > <ftira...@redhat.com> wrote:
> > > > > >
> > > > > > Yes, as intended, now we have one select and one insert/update
> per
> > > node
> > > > > > event.
> > > > > > I moved the PR as ready for review and give @Pere Fernandez Perez
> > > > > > <pefer...@redhat.com> permission to the branch so he can edit it
> > in
> > > > the
> > > > > > next two weeks (Ill be on PTO)  if desired, before merging.
> > > > > >
> > > > > >
> > > > > > On Thu, Dec 21, 2023 at 5:58 PM Alex Porcelli <a...@porcelli.me>
> > > > wrote:
> > > > > >
> > > > > > > Cool, thank you Francisco!
> > > > > > >
> > > > > > > Did you manage to get some preliminary data about improvements?
> > > > > > >
> > > > > > > On Thu, Dec 21, 2023 at 11:52 AM Francisco Javier Tirado Sarti
> > > > > > > <ftira...@redhat.com> wrote:
> > > > > > > >
> > > > > > > > Yes, after some delay because of quarkus 3 migration. Im
> > refining
> > > > > this
> > > > > > > > draft PR
> > > > > > > >
> https://github.com/apache/incubator-kie-kogito-apps/pull/1941
> > > > > > > >
> > > > > > > > On Thu, Dec 21, 2023 at 5:48 PM Alex Porcelli <
> > a...@porcelli.me>
> > > > > wrote:
> > > > > > > >
> > > > > > > > > Any update or new findings on this topic?
> > > > > > > > >
> > > > > > > > > On Tue, Nov 28, 2023 at 8:38 AM Francisco Javier Tirado
> Sarti
> > > > > > > > > <ftira...@redhat.com> wrote:
> > > > > > > > > >
> > > > > > > > > > Hi Alex,
> > > > > > > > > > After considering different options to improve
> performance,
> > > we
> > > > > feel
> > > > > > > that
> > > > > > > > > it
> > > > > > > > > > is time to "partially" move away from the current Map
> style
> > > > > > > interface (
> > > > > > > > > >
> > > > > > > > >
> > > > > > >
> > > > >
> > > >
> > >
> >
> https://github.com/apache/incubator-kie-kogito-apps/blob/main/persistence-commons/persistence-commons-api/src/main/java/org/kie/kogito/persistence/api/Storage.java
> > > > > > > > > )
> > > > > > > > > > which was shared with Trusty, to one more suitable for
> > usage
> > > > > with a
> > > > > > > > > > relational DB like postgresql (but still compatible with
> > big
> > > > > table
> > > > > > > dbs).
> > > > > > > > > > The idea will be to replace generic Storage interface by
> > four
> > > > > > > specific
> > > > > > > > > > interfaces (which will inherit from a common one that
> keeps
> > > the
> > > > > query
> > > > > > > > > part
> > > > > > > > > > at is it. with get and query methods), that will include
> > the
> > > > > required
> > > > > > > > > > modification operations for the four DataIndex "domains":
> > > > > > > > > processinstance,
> > > > > > > > > > usertask, processdefinitions and jobs. Those interfaces
> > will
> > > > > define
> > > > > > > > > methods
> > > > > > > > > > like addNode, addVariable, updateTask, addAttachment.....
> > > that
> > > > > will
> > > > > > > allow
> > > > > > > > > > the persistent layer implementation  to just update the
> > > needed
> > > > > info
> > > > > > > in
> > > > > > > > > the
> > > > > > > > > > DB  (for example, for addNode in Postgres, just insert a
> > row
> > > > into
> > > > > > > nodes
> > > > > > > > > > table, for addNode in Mongo, basically the same atomic
> > upsert
> > > > > > > operation
> > > > > > > > > > that is currently done). Therefore, we increase
> performance
> > > for
> > > > > > > Postgres
> > > > > > > > > > and keep the current one for Mongo. The current DB
> schemas
> > > > won't
> > > > > be
> > > > > > > > > > touched.
> > > > > > > > > > Since the code change is large, I do not think I'll be
> able
> > > to
> > > > > have
> > > > > > > the
> > > > > > > > > PR
> > > > > > > > > > ready till next week.
> > > > > > > > > > But before starting, please let me know if that approach
> is
> > > > fine
> > > > > for
> > > > > > > you.
> > > > > > > > > > Best regards.
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > On Fri, Nov 24, 2023 at 6:55 PM Alex Porcelli <
> > > > a...@porcelli.me>
> > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > > Thank you Francisco to getting deeper on this…
> > > > > > > > > > >
> > > > > > > > > > > Looking forward to see the results of your suggested
> > > > > improvements.
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > On Fri, Nov 24, 2023 at 9:40 AM Francisco Javier Tirado
> > > > Sarti <
> > > > > > > > > > > ftira...@redhat.com> wrote:
> > > > > > > > > > >
> > > > > > > > > > > > I forgot to attach the queries
> > > > > > > > > > > >
> > > > > > > > > > > > On Fri, Nov 24, 2023 at 3:04 PM Francisco Javier
> Tirado
> > > > > Sarti <
> > > > > > > > > > > > ftira...@redhat.com> wrote:
> > > > > > > > > > > >
> > > > > > > > > > > >> Hi,
> > > > > > > > > > > >> A brief update on this topic.
> > > > > > > > > > > >> After doing a simple test with example
> > > > > > > > > > > >>
> > > > > > > > > > >
> > > > > > > > >
> > > > > > >
> > > > >
> > > >
> > >
> >
> https://github.com/apache/incubator-kie-kogito-examples/tree/stable/serverless-workflow-examples/serverless-workflow-data-index-quarkus
> > > > > > > > > > > ,
> > > > > > > > > > > >> the number of updates over Nodes table is n*n, so we
> > > > manage
> > > > > to
> > > > > > > > > obtain a
> > > > > > > > > > > >> perfect quadratic performance degradation. The
> problem
> > > is
> > > > > worse
> > > > > > > in
> > > > > > > > > the
> > > > > > > > > > > case
> > > > > > > > > > > >> of Serverless Workflow than in BPMN because we the
> > > number
> > > > of
> > > > > > > nodes
> > > > > > > > > is
> > > > > > > > > > > >> greater than the number of states. In that example N
> > is
> > > > 16,
> > > > > but
> > > > > > > for
> > > > > > > > > a
> > > > > > > > > > > more
> > > > > > > > > > > >> complex workflow it would be certainly large.
> > > > > > > > > > > >> I think that this is more related to how we are
> > handling
> > > > > JPA in
> > > > > > > the
> > > > > > > > > > > code,
> > > > > > > > > > > >> in particular the mapping from model to entity
> > > (basically
> > > > > JPA is
> > > > > > > > > blind
> > > > > > > > > > > and
> > > > > > > > > > > >> has to update all nodes for every write because it
> > > > believes
> > > > > the
> > > > > > > > > node has
> > > > > > > > > > > >> been updated, although it is not) than an issue in
> the
> > > > table
> > > > > > > > > definition.
> > > > > > > > > > > >> In fact, when using JPA, separating the server model
> > > from
> > > > > the
> > > > > > > JPA
> > > > > > > > > > > entity is
> > > > > > > > > > > >> not a good idea, especially if the entity contains
> > > > > collections.
> > > > > > > I
> > > > > > > > > will
> > > > > > > > > > > try
> > > > > > > > > > > >> to change that without breaking anything.
> > > > > > > > > > > >>
> > > > > > > > > > > >> On Wed, Nov 22, 2023 at 12:10 PM Enrique Gonzalez
> > > > Martinez <
> > > > > > > > > > > >> egonza...@apache.org> wrote:
> > > > > > > > > > > >>
> > > > > > > > > > > >>> After the events split you now will need to create
> a
> > > node
> > > > > > > instance
> > > > > > > > > > > >>> model instance of making independent from the
> process
> > > > > instance.
> > > > > > > > > > > >>> That should do the trick.
> > > > > > > > > > > >>>
> > > > > > > > > > > >>> Regarding deleting/inserting it was fixed at some
> > > point.
> > > > > > > > > > > >>>
> > > > > > > > > > > >>> El mar, 21 nov 2023 a las 20:22, Francisco Javier
> > > Tirado
> > > > > Sarti
> > > > > > > > > > > >>> (<ftira...@redhat.com>) escribió:
> > > > > > > > > > > >>> >
> > > > > > > > > > > >>> > Hi Martin,
> > > > > > > > > > > >>> > I have a task to review performance of
> > > > > > > > > > > >>> >
> > > > > > > > > > > >>> > ProcessInstanceNodeDataEventMerger
> > > > > > > > > > > >>> > My idea is to reduce the number of delete inserts
> > > when
> > > > > > > processing
> > > > > > > > > > > >>> events
> > > > > > > > > > > >>> > and try to do it incremental.
> > > > > > > > > > > >>> > That should improve performance.
> > > > > > > > > > > >>> > PS:
> > > > > > > > > > > >>> > I was planning to send an e-mail tomorrow
> > announcing
> > > > > that in
> > > > > > > > > case you
> > > > > > > > > > > >>> were
> > > > > > > > > > > >>> > already working on a fix for that. I assume you
> are
> > > not
> > > > > and I
> > > > > > > > > would
> > > > > > > > > > > be
> > > > > > > > > > > >>> > sending a PR soon.
> > > > > > > > > > > >>> >
> > > > > > > > > > > >>> > On Tue, Nov 21, 2023 at 6:09 PM Martin Weiler
> > > > > > > > > > > <mwei...@ibm.com.invalid
> > > > > > > > > > > >>> >
> > > > > > > > > > > >>> > wrote:
> > > > > > > > > > > >>> >
> > > > > > > > > > > >>> > > I looked into the new examples using data-index
> > > > > persistence
> > > > > > > > > addon -
> > > > > > > > > > > >>> Neus'
> > > > > > > > > > > >>> > > PR#1813 [1] for serverless and Pere's branch
> [2]
> > > for
> > > > > > > workflow
> > > > > > > > > > > (great
> > > > > > > > > > > >>> job
> > > > > > > > > > > >>> > > both!) - and they work without issues using
> > single
> > > > > > > requests.
> > > > > > > > > > > >>> However, under
> > > > > > > > > > > >>> > > some load (I used 'ab' for testing with a light
> > > > > > > concurrency of
> > > > > > > > > 10
> > > > > > > > > > > >>> parallel
> > > > > > > > > > > >>> > > requests) I ran into the following problems:
> > > > > > > > > > > >>> > >
> > > > > > > > > > > >>> > > (1) Large number of insert/delete calls (eg.
> for
> > > > tables
> > > > > > > such as
> > > > > > > > > > > >>> nodes,
> > > > > > > > > > > >>> > > definitions, etc)
> > > > > > > > > > > >>> > >
> > > > > > > > > > > >>> > > (2) Hibernate OptimisticLockExceptions /
> > > > > > > StaleStateExceptions
> > > > > > > > > > > >>> > >
> > > > > > > > > > > >>> > > (3) DB deadlocks
> > > > > > > > > > > >>> > >
> > > > > > > > > > > >>> > > (4) Error responses, slow response times
> > > > > > > > > > > >>> > >
> > > > > > > > > > > >>> > > The reason I am reaching out with this topic
> here
> > > is
> > > > to
> > > > > > > find
> > > > > > > > > out if
> > > > > > > > > > > >>> we are
> > > > > > > > > > > >>> > > aware of this issue, and if someone is already
> > > > looking
> > > > > > > into or
> > > > > > > > > > > being
> > > > > > > > > > > >>> > > assigned to it?
> > > > > > > > > > > >>> > >
> > > > > > > > > > > >>> > > Thanks,
> > > > > > > > > > > >>> > > Martin
> > > > > > > > > > > >>> > >
> > > > > > > > > > > >>> > > [1]
> > > > > > > > > > > >>>
> > > > > > >
> > https://github.com/apache/incubator-kie-kogito-examples/pull/1813
> > > > > > > > > > > >>> > > [2]
> > > > > > > > > > > >>> > >
> > > > > > > > > > > >>>
> > > > > > > > > > >
> > > > > > > > >
> > > > > > >
> > > > >
> > > >
> > >
> >
> https://github.com/pefernan/kogito-examples/tree/example_data-index_persistence
> > > > > > > > > > > >>> > >
> > > > > > > > > > > >>> > >
> > > > > > > > > > >
> > > > > > >
> > > ---------------------------------------------------------------------
> > > > > > > > > > > >>> > > To unsubscribe, e-mail:
> > > > dev-unsubscr...@kie.apache.org
> > > > > > > > > > > >>> > > For additional commands, e-mail:
> > > > > dev-h...@kie.apache.org
> > > > > > > > > > > >>> > >
> > > > > > > > > > > >>> > >
> > > > > > > > > > > >>>
> > > > > > > > > > > >>>
> > > > > > > > >
> > > > >
> ---------------------------------------------------------------------
> > > > > > > > > > > >>> To unsubscribe, e-mail:
> > dev-unsubscr...@kie.apache.org
> > > > > > > > > > > >>> For additional commands, e-mail:
> > > dev-h...@kie.apache.org
> > > > > > > > > > > >>>
> > > > > > > > > > > >>>
> > > > > > > > > > > >
> > > > > > >
> > > ---------------------------------------------------------------------
> > > > > > > > > > > > To unsubscribe, e-mail:
> dev-unsubscr...@kie.apache.org
> > > > > > > > > > > > For additional commands, e-mail:
> > dev-h...@kie.apache.org
> > > > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > >
> ---------------------------------------------------------------------
> > > > > > > > > To unsubscribe, e-mail: dev-unsubscr...@kie.apache.org
> > > > > > > > > For additional commands, e-mail: dev-h...@kie.apache.org
> > > > > > > > >
> > > > > > > > >
> > > > > > >
> > > > > > >
> > > ---------------------------------------------------------------------
> > > > > > > To unsubscribe, e-mail: dev-unsubscr...@kie.apache.org
> > > > > > > For additional commands, e-mail: dev-h...@kie.apache.org
> > > > > > >
> > > > > > >
> > > > >
> > > > >
> ---------------------------------------------------------------------
> > > > > To unsubscribe, e-mail: dev-unsubscr...@kie.apache.org
> > > > > For additional commands, e-mail: dev-h...@kie.apache.org
> > > > >
> > > > >
> > > >
> > > > ---------------------------------------------------------------------
> > > > To unsubscribe, e-mail: dev-unsubscr...@kie.apache.org
> > > > For additional commands, e-mail: dev-h...@kie.apache.org
> > > >
> > > >
> > >
> >
>

Reply via email to