Thanks, Pritam and Scott, for the discussion. I've created Jira issue OFBIZ-11789 for this improvement and opened a PR with the proposed changes.
I request everyone to review the PR and share your thoughts on it. Thanks!

--
Thanks & Regards
Pawan Verma
Technical Consultant
*HotWax Systems*
*Enterprise open source experts*
http://www.hotwaxsystems.com

On Mon, Jun 1, 2020 at 12:36 PM Pritam Kute <pritam.k...@hotwaxsystems.com>
wrote:

> Thanks Scott for your detailed explanation.
>
> The solution looks good to me too. My confusion was about why we would
> implement a new method if we can achieve the same thing using the current
> EntityQuery methods.
>
> +1 for adding queryBatchIterator() to EntityQuery.
>
> Kind Regards,
> --
> Pritam Kute
>
>
> On Thu, May 28, 2020 at 6:32 AM Scott Gray <scott.g...@hotwaxsystems.com>
> wrote:
>
> > Hi Pritam,
> >
> > I'm not sure about PostgreSQL or Derby, but I know that with MySQL using
> > a cursor doesn't really work. You have to set the result set to
> > TYPE_FORWARD_ONLY and CONCUR_READ_ONLY and also set the fetch size to
> > Integer.MIN_VALUE. Only then will the driver stream the results, and
> > even then you may not execute any other SQL commands on the connection
> > until you have fully read or closed the result set.
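[Editor's note: the streaming requirements Scott describes above can be sketched in raw JDBC. This is a minimal illustration under those assumptions, not OFBiz code; the class and method names here are hypothetical.]

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

// Hypothetical helper, for illustration only: configures a statement so
// that MySQL Connector/J streams rows instead of buffering the whole
// result set in memory.
public class StreamingStatementUtil {

    public static PreparedStatement streamingStatement(Connection conn, String sql)
            throws SQLException {
        // The driver streams only with a forward-only, read-only result set...
        PreparedStatement stmt = conn.prepareStatement(
                sql, ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY);
        // ...combined with a fetch size of Integer.MIN_VALUE, a
        // MySQL-specific signal meaning "stream row by row".
        stmt.setFetchSize(Integer.MIN_VALUE);
        return stmt;
    }
}
```

Note the caveat from the thread: while the resulting ResultSet is open, no other SQL may be issued on the same connection until it has been fully read or closed.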
> >
> > So if an EntityListIterator doesn't really conserve memory, then you
> > need to take a paging query approach such as this:
> >
> > EntityQuery query = EntityQuery.use(delegator).from("SomeTable").limit(100);
> > List<GenericValue> results = null;
> > while (!(results = query.queryList()).isEmpty()) {
> >     for (GenericValue value : results) {
> >         // do something with each value
> >     }
> >     query.offset(query.getOffset() + query.getLimit());
> > }
> >
> > Or with the proposed EntityBatchIterator:
> >
> > Iterator<GenericValue> iterator =
> >     EntityQuery.use(delegator).from("SomeTable").limit(100).queryBatchIterator();
> > while (iterator.hasNext()) {
> >     GenericValue result = iterator.next();
> >     // do something with each value
> > }
> >
> > I guess an alternative approach would be to implement something similar
> > within the EntityListIterator, perhaps with a flag to turn it on or off
> > depending on which database is being used and how well it supports
> > iterating over results without loading the entire result set into
> > memory.
> >
> > Regards
> > Scott
> >
> >
> > On Sat, 23 May 2020 at 20:59, Pritam Kute <pritam.k...@hotwaxsystems.com>
> > wrote:
> >
> > > Hello Pawan,
> > >
> > > I just had a look at the EntityQuery.queryIterator() method and it
> > > looks like we can achieve that by using the fetchSize(), forwardOnly(),
> > > cursorScrollInsensitive(), cursorScrollSensitive() and offset() methods
> > > in the EntityQuery class. Let me know if I am missing anything.
> > >
> > > It would be good if you could post some pseudo code here so that we
> > > could get an understanding of the exact design you have in mind.
> > >
> > > Kind Regards,
> > > --
> > > Pritam Kute
> > >
> > >
> > > On Thu, May 21, 2020 at 7:41 PM Pawan Verma
> > > <pawan.ve...@hotwaxsystems.com> wrote:
> > >
> > > > Hello Devs,
> > > >
> > > > While working on a large database we have figured out that very
> > > > large queries consume all memory and crash OFBiz (because
> > > > queryIterator() doesn't really work; it's no different from
> > > > queryList()).
> > > >
> > > > The EntityListIterator attempts to use a cursor to iterate over
> > > > large result sets, but in reality most databases do not give us a
> > > > cursor unless we ask for it in a very specific way; instead you get
> > > > back the full result set and potentially consume a large amount of
> > > > memory. For example, the MySQL details are here (ResultSet section):
> > > >
> > > > https://dev.mysql.com/doc/connector-j/5.1/en/connector-j-reference-implementation-notes.html
> > > >
> > > > To work around this we've built the EntityBatchIterator, which
> > > > instead basically just pages through results like you would on a
> > > > webpage that uses pagination. The iterator takes an EntityQuery, and
> > > > when next() is called it grabs the first 500 records and returns the
> > > > first record; call next() again and you get the second record. Once
> > > > you've called next() 500 times, it runs the query again with an
> > > > offset and grabs the next 500, and so on until there are no more
> > > > records.
> > > >
> > > > The main downsides to this approach are:
> > > > 1. As when using limit/offset, you want to be sure your results will
> > > > come back in a consistent order, to avoid accidentally skipping some
> > > > rows and seeing other rows twice.
> > > > 2. Because the results are a combination of many queries rather
> > > > than a single query, some of the data may change while you are
> > > > paging through it, i.e.
> > > > if you were to sort by lastUpdatedTxStamp you may see some rows
> > > > twice as they are updated by other transactions (this might be a
> > > > good thing in some cases).
> > > >
> > > > So, the main proposal is to introduce a new EntityBatchIterator for
> > > > large queries. If we agree on the proposal I'll create a Jira and
> > > > provide a patch for the community review. Thanks!
> > > >
> > > > --
> > > > Thanks & Regards
> > > > Pawan Verma
> > > > Technical Consultant
> > > > *HotWax Systems*
> > > > *Enterprise open source experts*
> > > > http://www.hotwaxsystems.com
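[Editor's note: to make the paging behaviour described in the thread concrete, here is a minimal self-contained sketch of a batch iterator. It is not the actual EntityBatchIterator from the PR; a fetch function stands in for the EntityQuery's limit/offset query, and the class name and batch size are illustrative.]

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.IntFunction;

// Sketch of the described paging approach: fetch a page of rows, hand them
// out one at a time, then rerun the "query" at the next offset. A real
// implementation would wrap an EntityQuery and call queryList().
public class BatchIterator<T> implements Iterator<T> {
    private final IntFunction<List<T>> fetchPage; // offset -> up to batchSize rows
    private final int batchSize;
    private int offset = 0;
    private int pos = 0;
    private List<T> page = new ArrayList<>();
    private boolean done = false;

    public BatchIterator(IntFunction<List<T>> fetchPage, int batchSize) {
        this.fetchPage = fetchPage;
        this.batchSize = batchSize;
    }

    @Override
    public boolean hasNext() {
        if (pos < page.size()) {
            return true;       // still rows left in the current page
        }
        if (done) {
            return false;
        }
        // Current page exhausted: run the query again at the next offset.
        page = fetchPage.apply(offset);
        offset += batchSize;
        pos = 0;
        if (page.isEmpty()) {
            done = true;       // no more rows; stop issuing queries
        }
        return !done;
    }

    @Override
    public T next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        return page.get(pos++);
    }
}
```

With a stable sort order on the underlying query, iterating this way visits every row exactly once while holding at most one page in memory, which is the trade-off discussed above.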