I have always preferred pure JDBC over Hibernate myself, just for the sake of controlling what is happening :) So I would not -1 that myself.

Tibor

On Fri, Feb 9, 2024 at 17:00, Francisco Javier Tirado Sarti <ftira...@redhat.com> wrote:

> Hi,
> Usually I do not want to talk about work in progress because preliminary
> conclusions are pretty volatile, but there are a couple of things
> that can be concluded from the really valuable information that Martin
> provided.
> 1) In order to determine whether the number of statements is larger
> than expected, I asked Martin to test with a simpler process definition,
> one with just three nodes: start, script and end. The script node should
> change just one variable. This way we can analyze whether the number of
> queries is the expected one. From the single log (audit was activated then),
> my conclusion is that the number of inserts/updates over processes and nodes
> (there are a lot over tasks, which I prefer to skip for now, baby steps) is
> the expected one.
> 2) Analysing the thread dump, we see around 15 threads executing this line:
> at org.kie.kogito.index.jpa.storage.ProcessInstanceEntityStorage.indexNode(ProcessInstanceEntityStorage.java:125),
> so it's pretty clear which code has to be optimized ;). I'm evaluating
> possibilities within JPA/Hibernate, but I'm starting to think that it might
> be better to switch to JDBC and skip Hibernate. Our lives will be simpler,
> especially with a relatively simple schema like ours (that would be my
> recommendation if I were an external consultant).
>
> On Fri, Feb 9, 2024 at 4:15 PM Tibor Zimányi <tzima...@apache.org> wrote:
>
> > Hi,
> >
> > this will be a bit off-topic. However, as far as performance goes, I think
> > we should consider that we have string primary keys (IDs). I would expect
> > database systems to be much better at indexing numeric keys than
> > strings. I remember from the past, when I was working with DBs, that using
> > strings as keys or indexes was a discouraged practice.
> >
> > Best regards,
> > Tibor
> >
> > On Thu, Feb 8, 2024 at 22:45, Martin Weiler <mwei...@ibm.com.invalid> wrote:
> >
> > > I changed the test to use MongoDB [1] and I don't see a performance
> > > degradation with this setup [2].
> > >
> > > Please keep us posted of your findings. Thanks!
> > >
> > > Martin
> > >
> > > [1] https://github.com/martinweiler/job-service-refactor-test/tree/mongodb
> > > [2] https://drive.google.com/file/d/1NfacXaxJlgRMw4OQ5S20cvkzvaUKUVFj/view?usp=sharing
> > >
> > > ________________________________________
> > > From: Francisco Javier Tirado Sarti <ftira...@redhat.com>
> > > Sent: Wednesday, February 7, 2024 11:40 AM
> > > To: dev@kie.apache.org
> > > Subject: [EXTERNAL] Re: [DISCUSSION] Performance issues with data-index persistence addon
> > >
> > > Yes, it can be index degradation because of size, but I believe (I might
> > > be wrong) the DB is too small (yet) for that.
> > > But eventually, when the DB is huge enough, Postgres will unavoidably
> > > behave like the graph that Martin sent.
> > > Since I believe we are not huge enough (yet), let's rule out another issue
> > > by analysing the SQL logs (I requested those from Martin offline and he is
> > > kindly going to collect them).
> > > Also, I'm curious to know whether Mongo behaves in the same way.
> > >
> > > On Wed, Feb 7, 2024 at 7:25 PM Enrique Gonzalez Martinez <egonza...@apache.org> wrote:
> > >
> > > > Hi Francisco,
> > > > I would highly recommend checking the indexes and how the updates work in
> > > > data-index, to avoid full table scans and locking the full table. Some DBs
> > > > are very sensitive to that.
> > > >
> > > > On Wed, Feb 7, 2024 at 18:41, Francisco Javier Tirado Sarti <ftira...@redhat.com> wrote:
> > > >
> > > > > Hi Martin,
> > > > > While I analyze the data, let me ask whether it is possible to perform
> > > > > another check (similar in a way to disabling data-index, as you did). Can
> > > > > you switch to MongoDB persistence and check whether the same degradation
> > > > > that is there for Postgres remains?
> > > > > I do not know if this is feasible, but it will certainly indicate whether
> > > > > the problem is in the Postgres storage layer, and I do not have a clear
> > > > > prediction of what we will see when doing this switch.
> > > > >
> > > > > On Wed, Feb 7, 2024 at 6:37 PM Martin Weiler <mwei...@ibm.com.invalid> wrote:
> > > > >
> > > > > > Hi Francisco,
> > > > > >
> > > > > > thanks for your work on this important topic!
> > > > > >
> > > > > > I would like to share some test results here, which might help to improve
> > > > > > the codebase even further. I am using the JMeter-based test case from Pere
> > > > > > and Enrique (thanks guys!) [1], which uses a load of 30 threads to
> > > > > >
> > > > > > 1) start a new process instance (POST)
> > > > > > 2) retrieve tasks for a user (GET)
> > > > > > 3) fetch task details (GET)
> > > > > > 4) complete a task (POST)
> > > > > > 5) execute a query on data-audit
> > > > > >
> > > > > > With this test setup, I noticed that the performance of the POST
> > > > > > requests, in particular the one to start a new process instance, degrades
> > > > > > over time - see graph [2]. If I run the same test without data-index, then
> > > > > > there is no such performance degradation [3]. You can find a thread dump
> > > > > > captured a few minutes into the first test here [4] that might help to see
> > > > > > some of the contention points.
> > > > > >
> > > > > > I'd appreciate it if you could take a look and see if there is something
> > > > > > that can be further improved based on your previous work. If you need any
> > > > > > additional data, let me know, but otherwise it is straightforward to run
> > > > > > the JMeter test as well.
> > > > > >
> > > > > > Thanks,
> > > > > > Martin
> > > > > >
> > > > > > [1] https://github.com/pefernan/job-service-refactor-test/
> > > > > > [2] https://drive.google.com/file/d/1Gqn-ixE05kXv2jdssAUlnMuUVcHxIYZ0/view?usp=sharing
> > > > > > [3] https://drive.google.com/file/d/10gVNyb4JYg_bA18bNhY9dEDbPn3TOxL7/view?usp=sharing
> > > > > > [4] https://drive.google.com/file/d/1jVrtsO49gCvUlnaC9AUAtkVKTm4PbdUv/view?usp=sharing
> > > > > >
> > > > > > ________________________________________
> > > > > > From: Francisco Javier Tirado Sarti <ftira...@redhat.com>
> > > > > > Sent: Wednesday, January 17, 2024 9:13 AM
> > > > > > To: dev@kie.apache.org
> > > > > > Cc: Pere Fernandez Perez
> > > > > > Subject: [EXTERNAL] Re: [DISCUSSION] Performance issues with data-index persistence addon
> > > > > >
> > > > > > Hi Alex,
> > > > > > I did not take timings (which depend on a number of variables that
> > > > > > change drastically between environments), but I verified that the number of
> > > > > > updates has been reduced drastically without losing functionality, which is
> > > > > > objectively a good thing. Before the change, for every node executed we
> > > > > > had an update for every node previously executed, so if a process has 50
> > > > > > nodes to execute, we were performing nearly 50*51/2 updates, which gives
> > > > > > a total of 1275 updates. Now we have just one for every node being
> > > > > > executed, implying a total of 50 updates.
> > > > > >
> > > > > > On Wed, Jan 17, 2024 at 3:18 PM Alex Porcelli <a...@porcelli.me> wrote:
> > > > > >
> > > > > > > Francisco,
> > > > > > >
> > > > > > > I noticed that your PR has been merged, but I was expecting (at least
> > > > > > > that was my understanding from this thread) that some benchmark data
> > > > > > > would be shared before merging, to assess the cost/benefit
> > > > > > > of such a decent-sized change.
> > > > > > >
> > > > > > > Do you have any information to share?
> > > > > > >
> > > > > > > On Sat, Dec 23, 2023 at 4:02 AM Francisco Javier Tirado Sarti <ftira...@redhat.com> wrote:
> > > > > > >
> > > > > > > > Yes, as intended, now we have one select and one insert/update per node
> > > > > > > > event.
> > > > > > > > I marked the PR as ready for review and gave @Pere Fernandez Perez
> > > > > > > > <pefer...@redhat.com> permission on the branch so he can edit it in the
> > > > > > > > next two weeks (I'll be on PTO) if desired, before merging.
> > > > > > > >
> > > > > > > > On Thu, Dec 21, 2023 at 5:58 PM Alex Porcelli <a...@porcelli.me> wrote:
> > > > > > > >
> > > > > > > > > Cool, thank you Francisco!
> > > > > > > > >
> > > > > > > > > Did you manage to get some preliminary data about improvements?
> > > > > > > > >
> > > > > > > > > On Thu, Dec 21, 2023 at 11:52 AM Francisco Javier Tirado Sarti <ftira...@redhat.com> wrote:
> > > > > > > > >
> > > > > > > > > > Yes, after some delay because of the Quarkus 3 migration. I'm refining this
> > > > > > > > > > draft PR: https://github.com/apache/incubator-kie-kogito-apps/pull/1941
> > > > > > > > > >
> > > > > > > > > > On Thu, Dec 21, 2023 at 5:48 PM Alex Porcelli <a...@porcelli.me> wrote:
> > > > > > > > > >
> > > > > > > > > > > Any update or new findings on this topic?
> > > > > > > > > > >
> > > > > > > > > > > On Tue, Nov 28, 2023 at 8:38 AM Francisco Javier Tirado Sarti <ftira...@redhat.com> wrote:
> > > > > > > > > > >
> > > > > > > > > > > > Hi Alex,
> > > > > > > > > > > > After considering different options to improve performance, we feel that it
> > > > > > > > > > > > is time to "partially" move away from the current Map-style interface
> > > > > > > > > > > > (https://github.com/apache/incubator-kie-kogito-apps/blob/main/persistence-commons/persistence-commons-api/src/main/java/org/kie/kogito/persistence/api/Storage.java),
> > > > > > > > > > > > which was shared with Trusty, to one more suitable for use with a
> > > > > > > > > > > > relational DB like PostgreSQL (but still compatible with big table DBs).
> > > > > > > > > > > > The idea is to replace the generic Storage interface with four specific
> > > > > > > > > > > > interfaces (which will inherit from a common one that keeps the query part
> > > > > > > > > > > > as it is, with get and query methods) that will include the required
> > > > > > > > > > > > modification operations for the four DataIndex "domains": processinstance,
> > > > > > > > > > > > usertask, processdefinitions and jobs. Those interfaces will define methods
> > > > > > > > > > > > like addNode, addVariable, updateTask, addAttachment... that will allow
> > > > > > > > > > > > the persistence layer implementation to update just the needed info in the
> > > > > > > > > > > > DB (for example, for addNode in Postgres, just insert a row into the nodes
> > > > > > > > > > > > table; for addNode in Mongo, basically the same atomic upsert operation
> > > > > > > > > > > > that is currently done). Therefore, we increase performance for Postgres
> > > > > > > > > > > > and keep the current one for Mongo. The current DB schemas won't be
> > > > > > > > > > > > touched.
> > > > > > > > > > > > Since the code change is large, I do not think I'll be able to have the PR
> > > > > > > > > > > > ready until next week.
> > > > > > > > > > > > But before starting, please let me know if that approach is fine for you.
> > > > > > > > > > > > Best regards.
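
The proposal in the 28 November message (a shared read-only interface plus domain-specific interfaces with operations such as addNode and addVariable) can be sketched roughly as follows. This is only an illustration under stated assumptions: every type and method signature here (StorageFetcher, ProcessInstanceStorageSketch, NodeInfo, ProcessInstanceView, the in-memory implementation and its write counter) is hypothetical and not taken from the Kogito codebase or from the PR. The in-memory class stands in for a relational backend where each call would map to a single row write.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical names throughout; nothing here is the actual Kogito API.
final class StorageSketchDemo {
    public static void main(String[] args) {
        InMemoryProcessInstanceStorage storage = new InMemoryProcessInstanceStorage();
        storage.addNode("pi-1", new NodeInfo("n1", "start"));
        storage.addNode("pi-1", new NodeInfo("n2", "script"));
        storage.addVariable("pi-1", "approved", Boolean.TRUE);
        System.out.println("row writes: " + storage.rowWrites());
        System.out.println("nodes read back: "
                + storage.get("pi-1").map(v -> v.nodes.size()).orElse(0));
    }
}

/** Illustrative node record; a real model would carry many more fields. */
final class NodeInfo {
    final String id;
    final String name;

    NodeInfo(String id, String name) {
        this.id = id;
        this.name = name;
    }
}

/** Illustrative read model assembled from storage. */
final class ProcessInstanceView {
    final String id;
    final List<NodeInfo> nodes;
    final Map<String, Object> variables;

    ProcessInstanceView(String id, List<NodeInfo> nodes, Map<String, Object> variables) {
        this.id = id;
        this.nodes = nodes;
        this.variables = variables;
    }
}

/** Common read-only part shared by all domains (the "get and query" side). */
interface StorageFetcher<K, V> {
    Optional<V> get(K key);
}

/** Domain-specific write operations for process instances. */
interface ProcessInstanceStorageSketch extends StorageFetcher<String, ProcessInstanceView> {
    void addNode(String processInstanceId, NodeInfo node);

    void addVariable(String processInstanceId, String name, Object value);
}

/** In-memory stand-in that counts writes: one per call, independent of history. */
final class InMemoryProcessInstanceStorage implements ProcessInstanceStorageSketch {
    private final Map<String, List<NodeInfo>> nodes = new HashMap<>();
    private final Map<String, Map<String, Object>> variables = new HashMap<>();
    private int rowWrites = 0;

    @Override
    public void addNode(String processInstanceId, NodeInfo node) {
        // A relational backend would insert one row into a nodes table here.
        nodes.computeIfAbsent(processInstanceId, k -> new ArrayList<>()).add(node);
        rowWrites++;
    }

    @Override
    public void addVariable(String processInstanceId, String name, Object value) {
        variables.computeIfAbsent(processInstanceId, k -> new HashMap<>()).put(name, value);
        rowWrites++;
    }

    @Override
    public Optional<ProcessInstanceView> get(String id) {
        if (!nodes.containsKey(id) && !variables.containsKey(id)) {
            return Optional.empty();
        }
        List<NodeInfo> n = nodes.getOrDefault(id, Collections.<NodeInfo>emptyList());
        Map<String, Object> v = variables.getOrDefault(id, Collections.<String, Object>emptyMap());
        return Optional.of(new ProcessInstanceView(id, new ArrayList<>(n), new HashMap<>(v)));
    }

    int rowWrites() {
        return rowWrites;
    }
}
```

The point of the split is that a write method receives only the delta (one node, one variable), so the backend never has to diff or rewrite the whole process instance to persist it.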
> > > > > > > > > > > >
> > > > > > > > > > > > On Fri, Nov 24, 2023 at 6:55 PM Alex Porcelli <a...@porcelli.me> wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > Thank you Francisco for digging deeper into this…
> > > > > > > > > > > > >
> > > > > > > > > > > > > Looking forward to seeing the results of your suggested improvements.
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Fri, Nov 24, 2023 at 9:40 AM Francisco Javier Tirado Sarti <ftira...@redhat.com> wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > > I forgot to attach the queries
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > On Fri, Nov 24, 2023 at 3:04 PM Francisco Javier Tirado Sarti <ftira...@redhat.com> wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Hi,
> > > > > > > > > > > > > > > A brief update on this topic.
> > > > > > > > > > > > > > > After doing a simple test with the example
> > > > > > > > > > > > > > > https://github.com/apache/incubator-kie-kogito-examples/tree/stable/serverless-workflow-examples/serverless-workflow-data-index-quarkus,
> > > > > > > > > > > > > > > the number of updates over the Nodes table is n*n, so we manage to obtain a
> > > > > > > > > > > > > > > perfect quadratic performance degradation. The problem is worse in the case
> > > > > > > > > > > > > > > of Serverless Workflow than in BPMN because the number of nodes is
> > > > > > > > > > > > > > > greater than the number of states. In that example N is 16, but for a more
> > > > > > > > > > > > > > > complex workflow it would certainly be larger.
> > > > > > > > > > > > > > > I think that this is more related to how we are handling JPA in the code,
> > > > > > > > > > > > > > > in particular the mapping from model to entity (basically JPA is blind and
> > > > > > > > > > > > > > > has to update all nodes for every write because it believes the node has
> > > > > > > > > > > > > > > been updated, although it has not), than an issue in the table definition.
> > > > > > > > > > > > > > > In fact, when using JPA, separating the server model from the JPA entity is
> > > > > > > > > > > > > > > not a good idea, especially if the entity contains collections. I will try
> > > > > > > > > > > > > > > to change that without breaking anything.
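
The quadratic growth described in the 24 November message can be reproduced with a small counting sketch. It uses no database and no JPA. It assumes a simplified model in which each node event rewrites every node recorded so far, which yields n(n+1)/2 writes (the message rounds this to n*n; both grow quadratically), and compares that with writing only the node the event refers to. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Counts row writes for two hypothetical persistence strategies; no database involved. */
final class NodeUpdateCount {

    /** Assumed "blind" behaviour: each node event rewrites all nodes recorded so far. */
    static long rewriteAllPerEvent(int nodeCount) {
        List<Integer> recorded = new ArrayList<>();
        long writes = 0;
        for (int node = 1; node <= nodeCount; node++) {
            recorded.add(node);
            writes += recorded.size(); // every recorded node is written again
        }
        return writes;
    }

    /** Incremental behaviour: each node event writes only its own node. */
    static long writeOnlyChanged(int nodeCount) {
        long writes = 0;
        for (int node = 1; node <= nodeCount; node++) {
            writes += 1;
        }
        return writes;
    }

    public static void main(String[] args) {
        for (int n : new int[] {16, 50}) {
            System.out.println(n + " nodes: rewrite-all=" + rewriteAllPerEvent(n)
                    + ", incremental=" + writeOnlyChanged(n));
        }
    }
}
```

Under this model the 16-node example produces 136 writes instead of 16, and a 50-node process produces 1275 instead of 50, which matches the 50*51/2 estimate given in the 17 January message of this thread.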
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > On Wed, Nov 22, 2023 at 12:10 PM Enrique Gonzalez Martinez <egonza...@apache.org> wrote:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > After the events split you now will need to create a node instance
> > > > > > > > > > > > > > > > model instance of making independent from the process instance.
> > > > > > > > > > > > > > > > That should do the trick.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Regarding deleting/inserting, it was fixed at some point.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > On Tue, Nov 21, 2023 at 20:22, Francisco Javier Tirado Sarti (<ftira...@redhat.com>) wrote:
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Hi Martin,
> > > > > > > > > > > > > > > > > I have a task to review the performance of ProcessInstanceNodeDataEventMerger.
> > > > > > > > > > > > > > > > > My idea is to reduce the number of delete/inserts when processing events
> > > > > > > > > > > > > > > > > and try to do it incrementally.
> > > > > > > > > > > > > > > > > That should improve performance.
> > > > > > > > > > > > > > > > > PS:
> > > > > > > > > > > > > > > > > I was planning to send an e-mail tomorrow announcing that, in case you were
> > > > > > > > > > > > > > > > > already working on a fix for that. I assume you are not, and I will be
> > > > > > > > > > > > > > > > > sending a PR soon.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > On Tue, Nov 21, 2023 at 6:09 PM Martin Weiler <mwei...@ibm.com.invalid> wrote:
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > I looked into the new examples using the data-index persistence addon -
> > > > > > > > > > > > > > > > > > Neus' PR#1813 [1] for serverless and Pere's branch [2] for workflow (great job
> > > > > > > > > > > > > > > > > > both!) - and they work without issues using single requests.
> > > > > > > > > > > > > > > > > > However, under some load (I used 'ab' for testing with a light concurrency
> > > > > > > > > > > > > > > > > > of 10 parallel requests) I ran into the following problems:
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > (1) Large number of insert/delete calls (e.g. for tables such as nodes,
> > > > > > > > > > > > > > > > > > definitions, etc.)
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > (2) Hibernate OptimisticLockExceptions / StaleStateExceptions
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > (3) DB deadlocks
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > (4) Error responses, slow response times
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > The reason I am reaching out with this topic here is to find out if we are
> > > > > > > > > > > > > > > > > > aware of this issue, and if someone is already looking into it or has been
> > > > > > > > > > > > > > > > > > assigned to it?
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > > > > > > Martin
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > [1] https://github.com/apache/incubator-kie-kogito-examples/pull/1813
> > > > > > > > > > > > > > > > > > [2] https://github.com/pefernan/kogito-examples/tree/example_data-index_persistence
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > ---------------------------------------------------------------------
> > > > > > > > > > > > > > > > > > To unsubscribe, e-mail: dev-unsubscr...@kie.apache.org
> > > > > > > > > > > > > > > > > > For additional commands, e-mail: dev-h...@kie.apache.org