Hi Prakhar, Glad to know that this implementation helps you. Thanks for sharing details :) -- Thanks & Regards Pawan Verma Technical Consultant *HotWax Systems* *Enterprise open source experts* http://www.hotwaxsystems.com
On Mon, Jul 20, 2020 at 1:42 PM Prakhar Kumar < prakhar.ku...@hotwaxsystems.com> wrote: > Hello Pawan, > > We were getting a hard time dealing with large datasets in our client > project. We were streaming data from MySQL using the FetchSize and > EntityListIterator, which helped us up to some point, but ultimately > struggled with the further increase in data. This is where the batch > processing implementation came to rescue. We incorporated it into our > project and were able to process the data with ease. This implementation > seems to be quite scalable and faster in performance as compared to > streaming. Batch processing was the need of the hour and there we have it > in OFBiz. Thanks, Pawan for your valuable contribution. > > On Wed, Jul 1, 2020 at 4:06 PM Pawan Verma <pawan.ve...@hotwaxsystems.com> > wrote: > > > Hi Chandan, Jacques > > > > Thanks, for your feedback. > > > > Yes, To solve the problem of heavy entity operations which consumes all > the > > system memory, we have implemented EntityBatchIterator. Originally > designed > > for the heavy entity operations. > > -- > > Thanks & Regards > > Pawan Verma > > Technical Consultant > > *HotWax Systems* > > *Enterprise open source experts* > > http://www.hotwaxsystems.com > > > > > > On Sun, Jun 28, 2020 at 10:16 PM Jacques Le Roux < > > jacques.le.r...@les7arts.com> wrote: > > > > > Hi, > > > > > > I have not looked into any details but Chandan's advice sounds like a > > wise > > > one to me > > > > > > Jacques > > > > > > Le 27/06/2020 à 13:43, Chandan Khandelwal a écrit : > > > > Hello Pawan, > > > > > > > > Approach looks good, my only suggestion is to use batch processing > only > > > > when we are dealing with large data set, as this method takes a > longer > > > time > > > > compared to the normal method specially on a distributed environment, > > > which > > > > may negatively impact the performance. > > > > > > > > Kind Regards, > > > > Chandan Khandelwal > > > > Senior Manager, Enterprise Software Development > > > > > > > > *HotWax Systems* > > > > *Enterprise open source experts* > > > > cell: +91-98934-81076 > > > > office: 0731-409-3684 > > > > http://www.hotwaxsystems.com > > > > > > > > > > > > On Fri, Jun 5, 2020 at 4:07 PM Pawan Verma < > > > pawan.ve...@hotwaxsystems.com> > > > > wrote: > > > > > > > >> Thanks, Pritam and Scott for the discussion. > > > >> > > > >> I've created Jira OFBIZ-11789 for this improvement and also created > a > > PR > > > >> with the proposed changes. > > > >> > > > >> I request everyone to review the PR and suggest your thought on > this. > > > >> Thanks! > > > >> -- > > > >> Thanks & Regards > > > >> Pawan Verma > > > >> Technical Consultant > > > >> *HotWax Systems* > > > >> *Enterprise open source experts* > > > >> http://www.hotwaxsystems.com > > > >> > > > >> > > > >> On Mon, Jun 1, 2020 at 12:36 PM Pritam Kute < > > > pritam.k...@hotwaxsystems.com > > > >> wrote: > > > >> > > > >>> Thanks Scott for your detailed explanation. > > > >>> > > > >>> The solution looks good to me too. My confusion was with why we are > > > going > > > >>> to implement new method if we can achieve that using the current > > > >>> EntityQuery methods. > > > >>> > > > >>> +1 for adding queryBatchIterator() to EntityQuery. > > > >>> > > > >>> Kind Regards, > > > >>> -- > > > >>> Pritam Kute > > > >>> > > > >>> > > > >>> On Thu, May 28, 2020 at 6:32 AM Scott Gray < > > > scott.g...@hotwaxsystems.com > > > >>> > > > >>> wrote: > > > >>> > > > >>>> Hi Pritam, > > > >>>> > > > >>>> I'm not sure about PostgreSQL or Derby but I know with MySQL that > > > >> using a > > > >>>> cursor doesn't really work. You have to set the result set to > > > >>>> TYPE_FORWARD_ONLY and CONCUR_READ_ONLY and also set the fetch size > > to > > > >>>> INTEGER.MIN_VALUE. Only then will the driver stream the results > and > > > >> even > > > >>>> then, you may not execute any other SQL commands on the connection > > > >> until > > > >>>> you have fully read or closed the resultset. > > > >>>> > > > >>>> So if an EntityListIterator doesn't really conserve memory, then > you > > > >> need > > > >>>> to take a paging query approach such as this: > > > >>>> EntityQuery query = > > > >>> EntityQuery.use(delegator).from("SomeTable").limit(100) > > > >>>> List results = null > > > >>>> while (!(results = query.queryList()).isEmpty()) { > > > >>>> for (value : results) { > > > >>>> // do something with each value > > > >>>> } > > > >>>> query.offset(query.getOffset() + query.getLimit()) > > > >>>> } > > > >>>> > > > >>>> Or with the proposed EntityBatchIterator: > > > >>>> Iterator query = > > > >>>> > > > >>>> > > > >> > > > > > > EntityQuery.use(delegator).from("SomeTable").limit(100).queryBatchIterator() > > > >>>> while (iterator.hasNext()) { > > > >>>> result = iterator.next() > > > >>>> // do something with each value > > > >>>> } > > > >>>> > > > >>>> I guess an alternative approach would be to implement something > > > similar > > > >>>> within the EntityListIterator and perhaps a flag to turn it off or > > on > > > >>>> depending on which database is being used and how well it supports > > > >>>> iterating over results without loading the entire resultset into > > > >> memory. > > > >>>> Regards > > > >>>> Scott > > > >>>> > > > >>>> > > > >>>> > > > >>>> On Sat, 23 May 2020 at 20:59, Pritam Kute < > > > >> pritam.k...@hotwaxsystems.com > > > >>>> wrote: > > > >>>> > > > >>>>> Hello Pawan, > > > >>>>> > > > >>>>> I just had a look into the EntityQuery.queryIterator() method and > > > >> looks > > > >>>>> like we can achieve that by using fetchSize(), fowardOnly(), > > > >>>>> cursorScrollInsensitive(), cursorScrollSensitive() and offset() > > > >> methods > > > >>>> in > > > >>>>> EntityQuery class. Let me know if I am missing anything. > > > >>>>> > > > >>>>> It will be good if you can post a pseudo code or something here > so > > > >> that > > > >>>> we > > > >>>>> could get an understanding of the exact design which you have in > > your > > > >>>> mind. > > > >>>>> Kind Regards, > > > >>>>> -- > > > >>>>> Pritam Kute > > > >>>>> > > > >>>>> > > > >>>>> On Thu, May 21, 2020 at 7:41 PM Pawan Verma < > > > >>>> pawan.ve...@hotwaxsystems.com > > > >>>>> wrote: > > > >>>>> > > > >>>>>> Hello Devs, > > > >>>>>> > > > >>>>>> While working on the large database we have figured out that > very > > > >>> large > > > >>>>>> queries consume all memory and crash ofbiz(because > queryIterator() > > > >>>>> doesn't > > > >>>>>> really work, it's no different from queryList()) > > > >>>>>> > > > >>>>>> The EntityListIterator attempts to use a cursor to iterate over > > > >> large > > > >>>>>> result sets but in reality most databases do not give us a > cursor > > > >>>> unless > > > >>>>> we > > > >>>>>> ask for it in a very specific way, and instead you get back the > > > >> full > > > >>>>> result > > > >>>>>> set and potentially consume a large amount of memory. For > > example, > > > >>> the > > > >>>>>> MySql details are here (ResultSet section): > > > >>>>>> > > > >>>>>> > > > >> > > > > > > https://dev.mysql.com/doc/connector-j/5.1/en/connector-j-reference-implementation-notes.html > > > >>>>>> To work around this we've built the EntityBatchIterator which > > > >> instead > > > >>>>>> basically just pages through results like you would do on a > > webpage > > > >>>> that > > > >>>>>> uses pagination. The iterator takes an EntityQuery and when > > next() > > > >>> is > > > >>>>>> called it grabs the first 500 records and returns the first > > record, > > > >>>> call > > > >>>>>> next again and you get the second record, once you've called > next > > > >> 500 > > > >>>>> times > > > >>>>>> it runs the query again with an offset and grabs the next 500 > and > > > >> so > > > >>> on > > > >>>>>> until there are no more records. > > > >>>>>> > > > >>>>>> The main downsides to this approach are: > > > >>>>>> 1. Same as when using limit/offset you want to be sure your > > results > > > >>>> will > > > >>>>>> come back in a consistent order to avoid accidentally skipping > > some > > > >>>> rows > > > >>>>>> and seeing other rows twice. > > > >>>>>> 2. Because the results are a combination of many queries rather > > > >> than > > > >>> a > > > >>>>>> single query, some of the data may change while you are paging > > > >>> through > > > >>>>> it. > > > >>>>>> i.e. if you were to sort by lastUpdatedTxStamp you may see some > > > >> rows > > > >>>>> twice > > > >>>>>> as they are updated by other transactions (this might be a good > > > >> thing > > > >>>> in > > > >>>>>> some cases). > > > >>>>>> > > > >>>>>> So, the main proposal is to introduce a new EntityBatchIterator > > for > > > >>>> large > > > >>>>>> queries. If we agree on the proposal I'll create a Jira and > > > >> provide > > > >>> a > > > >>>>>> patch for the community review. Thanks! > > > >>>>>> > > > >>>>>> -- > > > >>>>>> Thanks & Regards > > > >>>>>> Pawan Verma > > > >>>>>> Technical Consultant > > > >>>>>> *HotWax Systems* > > > >>>>>> *Enterprise open source experts* > > > >>>>>> http://www.hotwaxsystems.com > > > >>>>>> > > > > > > > > -- > Kind Regards, > Prakhar Kumar > Sr. Enterprise Software Engineer > > *HotWax Systems* > *Enterprise open source experts* > cell: +91-89628-81820 > office: 0731-409-3684 > http://www.hotwaxsystems.com >