EntityBatchIterator for large data set queries

Pawan Verma Thu, 21 May 2020 07:12:10 -0700

Hello Devs,

While working on the large database we have figured out that very large
queries consume all memory and crash ofbiz(because queryIterator() doesn't
really work, it's no different from queryList())

The EntityListIterator attempts to use a cursor to iterate over large
result sets but in reality most databases do not give us a cursor unless we
ask for it in a very specific way, and instead you get back the full result
set and potentially consume a large amount of memory. For example, the
MySql details are here (ResultSet section):
https://dev.mysql.com/doc/connector-j/5.1/en/connector-j-reference-implementation-notes.html

To work around this we've built the EntityBatchIterator which instead
basically just pages through results like you would do on a webpage that
uses pagination. The iterator takes an EntityQuery and when next() is
called it grabs the first 500 records and returns the first record, call
next again and you get the second record, once you've called next 500 times
it runs the query again with an offset and grabs the next 500 and so on
until there are no more records.

The main downsides to this approach are:
1. Same as when using limit/offset you want to be sure your results will
come back in a consistent order to avoid accidentally skipping some rows
and seeing other rows twice.
2. Because the results are a combination of many queries rather than a
single query, some of the data may change while you are paging through it.
i.e. if you were to sort by lastUpdatedTxStamp you may see some rows twice
as they are updated by other transactions (this might be a good thing in
some cases).

So, the main proposal is to introduce a new EntityBatchIterator for large
queries. If we agree on the proposal I'll create a Jira and provide a
patch for the community review. Thanks!

--
Thanks & Regards
Pawan Verma
Technical Consultant
*HotWax Systems*
*Enterprise open source experts*
http://www.hotwaxsystems.com

EntityBatchIterator for large data set queries

Reply via email to