[
https://issues.apache.org/jira/browse/OFBIZ-11789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17126641#comment-17126641
]
Pawan Verma commented on OFBIZ-11789:
-------------------------------------
Implemented EntityBatchIterator, which pages through large result sets (500 records
at a time by default), and added support for it to EntityQuery via
'queryBatchIterator' (OFBIZ-11789)
Modified GenericDao#makeOffsetString to use the OFFSET/FETCH style by default
Modified the entity engine's MySQL and Postgres data sources to use the LIMIT/OFFSET
style; the rest work with the default OFFSET/FETCH style
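For reference, a minimal sketch of the two offset styles mentioned above (illustrative
only, not the actual GenericDao#makeOffsetString code; the table name and query are
placeholders):

public final class OffsetStyleSketch {
    // Standard SQL OFFSET/FETCH suffix, used as the default style.
    static String offsetFetch(int offset, int limit) {
        return " OFFSET " + offset + " ROWS FETCH NEXT " + limit + " ROWS ONLY";
    }

    // LIMIT/OFFSET suffix, configured for the MySQL and Postgres data sources.
    static String limitOffset(int offset, int limit) {
        return " LIMIT " + limit + " OFFSET " + offset;
    }

    public static void main(String[] args) {
        // e.g. the query for the second batch of 500 rows in each style
        System.out.println("SELECT * FROM ORDER_HEADER" + offsetFetch(500, 500));
        System.out.println("SELECT * FROM ORDER_HEADER" + limitOffset(500, 500));
    }
}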
PR Link: [https://github.com/apache/ofbiz-framework/pull/191]
Thanks: [~lektran] for the help.
This can be tested using the EntityQuery#queryBatchIterator() method on any entity
that holds a large amount of data. Check the logs to verify that it is working.
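A minimal usage sketch follows; the entity name and order-by field are placeholders,
and the assumption that the returned iterator can be used as a java.util.Iterator over
GenericValue records is based on the description of next() further below:

import java.util.Iterator;
import org.apache.ofbiz.entity.Delegator;
import org.apache.ofbiz.entity.GenericEntityException;
import org.apache.ofbiz.entity.GenericValue;
import org.apache.ofbiz.entity.util.EntityQuery;

public class BatchIteratorSmokeTest {
    // Walks an entity with many rows; every 500th call to next() should trigger
    // the next offset query, which is visible in the entity engine logs.
    public static long countOrders(Delegator delegator) throws GenericEntityException {
        Iterator<GenericValue> it = EntityQuery.use(delegator)
                .from("OrderHeader")   // placeholder: any entity with large data
                .orderBy("orderId")    // stable ordering matters when paging by offset
                .queryBatchIterator();
        long count = 0;
        while (it.hasNext()) {
            it.next();
            count++;
        }
        return count;
    }
}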
> Implement EntityBatchIterator for large data set queries
> --------------------------------------------------------
>
> Key: OFBIZ-11789
> URL: https://issues.apache.org/jira/browse/OFBIZ-11789
> Project: OFBiz
> Issue Type: Improvement
> Components: framework/entity
> Affects Versions: Trunk
> Reporter: Pawan Verma
> Assignee: Pawan Verma
> Priority: Minor
>
> Dev Mail Thread:
> [https://markmail.org/search/EntityBatchIterator+for+large+data+set+queries]
> While working with a large database we found that very large queries consume all
> available memory and crash OFBiz (queryIterator() doesn't really help here; it's no
> different from queryList())
> The EntityListIterator attempts to use a cursor to iterate over large result sets,
> but in reality most databases do not give us a cursor unless we ask for it in a very
> specific way; instead, the full result set is returned and can consume a large
> amount of memory. For example, the MySQL details are here (ResultSet section):
> [https://dev.mysql.com/doc/connector-j/5.1/en/connector-j-reference-implementation-notes.html]
> To work around this we've built the EntityBatchIterator, which instead simply pages
> through results much like a paginated web page does. The iterator takes an
> EntityQuery; when next() is called it fetches the first 500 records and returns the
> first one, the next call returns the second record, and once next() has been called
> 500 times it runs the query again with an offset to fetch the next 500, and so on
> until there are no more records (a simplified sketch of this paging approach follows
> the quoted description below).
> The main downsides to this approach are:
> 1. As with any limit/offset approach, you need to be sure your results come back in
> a consistent order, otherwise you may accidentally skip some rows and see other rows
> twice.
> 2. Because the results are a combination of many queries rather than a single
> query, some of the data may change while you are paging through it, e.g. if you sort
> by lastUpdatedTxStamp you may see some rows twice as they are updated by other
> transactions (which might be a good thing in some cases).
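Not the actual OFBiz code, but a self-contained sketch of the paging approach
described above: an iterator that re-runs a page query with an increasing offset each
time the current page is exhausted, stopping when a short or empty page comes back.

import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.BiFunction;

// Simplified illustration only, independent of the real EntityBatchIterator.
public class PagingIterator<T> implements Iterator<T> {
    private final BiFunction<Integer, Integer, List<T>> pageQuery; // (offset, limit) -> rows
    private final int pageSize;
    private int offset = 0;
    private List<T> page = null;
    private int index = 0;

    public PagingIterator(BiFunction<Integer, Integer, List<T>> pageQuery, int pageSize) {
        this.pageQuery = pageQuery;
        this.pageSize = pageSize;
    }

    @Override
    public boolean hasNext() {
        // Fetch the next page on first use, or once a full page has been consumed.
        if (page == null || (index >= page.size() && page.size() == pageSize)) {
            page = pageQuery.apply(offset, pageSize);
            offset += pageSize;
            index = 0;
        }
        // A short or empty page means the result set is exhausted once we reach its end.
        return index < page.size();
    }

    @Override
    public T next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        return page.get(index++);
    }
}

In the real iterator the page query corresponds to the wrapped EntityQuery with limit
and offset applied, which is why a stable ordering (downside 1 above) matters.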