Hello Pawan, I just had a look into the EntityQuery.queryIterator() method and looks like we can achieve that by using fetchSize(), fowardOnly(), cursorScrollInsensitive(), cursorScrollSensitive() and offset() methods in EntityQuery class. Let me know if I am missing anything.
It will be good if you can post a pseudo code or something here so that we could get an understanding of the exact design which you have in your mind. Kind Regards, -- Pritam Kute On Thu, May 21, 2020 at 7:41 PM Pawan Verma <pawan.ve...@hotwaxsystems.com> wrote: > Hello Devs, > > While working on the large database we have figured out that very large > queries consume all memory and crash ofbiz(because queryIterator() doesn't > really work, it's no different from queryList()) > > The EntityListIterator attempts to use a cursor to iterate over large > result sets but in reality most databases do not give us a cursor unless we > ask for it in a very specific way, and instead you get back the full result > set and potentially consume a large amount of memory. For example, the > MySql details are here (ResultSet section): > > https://dev.mysql.com/doc/connector-j/5.1/en/connector-j-reference-implementation-notes.html > > To work around this we've built the EntityBatchIterator which instead > basically just pages through results like you would do on a webpage that > uses pagination. The iterator takes an EntityQuery and when next() is > called it grabs the first 500 records and returns the first record, call > next again and you get the second record, once you've called next 500 times > it runs the query again with an offset and grabs the next 500 and so on > until there are no more records. > > The main downsides to this approach are: > 1. Same as when using limit/offset you want to be sure your results will > come back in a consistent order to avoid accidentally skipping some rows > and seeing other rows twice. > 2. Because the results are a combination of many queries rather than a > single query, some of the data may change while you are paging through it. > i.e. if you were to sort by lastUpdatedTxStamp you may see some rows twice > as they are updated by other transactions (this might be a good thing in > some cases). > > So, the main proposal is to introduce a new EntityBatchIterator for large > queries. If we agree on the proposal I'll create a Jira and provide a > patch for the community review. Thanks! > > -- > Thanks & Regards > Pawan Verma > Technical Consultant > *HotWax Systems* > *Enterprise open source experts* > http://www.hotwaxsystems.com >