Pawan Verma created OFBIZ-11789:
-----------------------------------

             Summary: Implement EntityBatchIterator for large data set queries
                 Key: OFBIZ-11789
                 URL: https://issues.apache.org/jira/browse/OFBIZ-11789
             Project: OFBiz
          Issue Type: Improvement
          Components: framework/entity
    Affects Versions: Trunk
            Reporter: Pawan Verma
            Assignee: Pawan Verma


Dev Mail Thread: 
[https://markmail.org/search/EntityBatchIterator+for+large+data+set+queries]

While working on the large database we have figured out that very large queries 
consume all memory and crash ofbiz(because queryIterator() doesn't really work, 
it's no different from queryList())

The EntityListIterator attempts to use a cursor to iterate over large result 
sets but in reality most databases do not give us a cursor unless we ask for it 
in a very specific way, and instead you get back the full result set and 
potentially consume a large amount of memory.  For example, the MySql details 
are here (ResultSet section): 
[https://dev.mysql.com/doc/connector-j/5.1/en/connector-j-reference-implementation-notes.html]

To work around this we've built the EntityBatchIterator which instead basically 
just pages through results like you would do on a webpage that uses pagination. 
 The iterator takes an EntityQuery and when next() is called it grabs the first 
500 records and returns the first record, call next again and you get the 
second record, once you've called next 500 times it runs the query again with 
an offset and grabs the next 500 and so on until there are no more records.

The main downsides to this approach are:
1. Same as when using limit/offset you want to be sure your results will come 
back in a consistent order to avoid accidentally skipping some rows and seeing 
other rows twice.
2. Because the results are a combination of many queries rather than a single 
query, some of the data may change while you are paging through it. i.e. if you 
were to sort by lastUpdatedTxStamp you may see some rows twice as they are 
updated by other transactions (this might be a good thing in some cases).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to