[
https://issues.apache.org/jira/browse/OFBIZ-11789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17126641#comment-17126641
]
Pawan Verma commented on OFBIZ-11789:
-------------------------------------
Implemented EntityBatchIterator, which pages through large result sets (500 records
at a time by default), and added support for it to EntityQuery via
'queryBatchIterator' (OFBIZ-11789)
Modified GenericDao#makeOffsetString to use the OFFSET/FETCH style by default
Modified the entity engine's MySQL and Postgres data sources to use the LIMIT/OFFSET
style; the rest work with the default OFFSET/FETCH style
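For reference, a minimal sketch of the two offset styles mentioned above (illustrative
only, not the actual GenericDao#makeOffsetString code; the table name and query are
placeholders):

public final class OffsetStyleSketch {
    // Standard SQL OFFSET/FETCH suffix, used as the default style.
    static String offsetFetch(int offset, int limit) {
        return " OFFSET " + offset + " ROWS FETCH NEXT " + limit + " ROWS ONLY";
    }

    // LIMIT/OFFSET suffix, configured for the MySQL and Postgres data sources.
    static String limitOffset(int offset, int limit) {
        return " LIMIT " + limit + " OFFSET " + offset;
    }

    public static void main(String[] args) {
        // e.g. the query for the second batch of 500 rows in each style
        System.out.println("SELECT * FROM ORDER_HEADER" + offsetFetch(500, 500));
        System.out.println("SELECT * FROM ORDER_HEADER" + limitOffset(500, 500));
    }
}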
PR Link: [https://github.com/apache/ofbiz-framework/pull/191]
Thanks: [~lektran] for the help.
This can be tested using the EntityQuery#queryBatchIterator() method on any entity
that holds a large amount of data. Check the logs to verify that it is working.
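A minimal usage sketch follows; the entity name and order-by field are placeholders,
and the assumption that the returned iterator can be used as a java.util.Iterator over
GenericValue records is based on the description of next() further below:

import java.util.Iterator;
import org.apache.ofbiz.entity.Delegator;
import org.apache.ofbiz.entity.GenericEntityException;
import org.apache.ofbiz.entity.GenericValue;
import org.apache.ofbiz.entity.util.EntityQuery;

public class BatchIteratorSmokeTest {
    // Walks an entity with many rows; every 500th call to next() should trigger
    // the next offset query, which is visible in the entity engine logs.
    public static long countOrders(Delegator delegator) throws GenericEntityException {
        Iterator<GenericValue> it = EntityQuery.use(delegator)
                .from("OrderHeader")   // placeholder: any entity with large data
                .orderBy("orderId")    // stable ordering matters when paging by offset
                .queryBatchIterator();
        long count = 0;
        while (it.hasNext()) {
            it.next();
            count++;
        }
        return count;
    }
}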
> Implement EntityBatchIterator for large data set queries
> --------------------------------------------------------
>
> Key: OFBIZ-11789
> URL: https://issues.apache.org/jira/browse/OFBIZ-11789
> Project: OFBiz
> Issue Type: Improvement
> Components: framework/entity
> Affects Versions: Trunk
> Reporter: Pawan Verma
> Assignee: Pawan Verma
> Priority: Minor
>
> Dev Mail Thread:
> [https://markmail.org/search/EntityBatchIterator+for+large+data+set+queries]
> While working with a large database we found that very large queries consume all
> available memory and crash OFBiz (queryIterator() doesn't really help here; it's no
> different from queryList())
> The EntityListIterator attempts to use a cursor to iterate over large result sets,
> but in reality most databases do not give us a cursor unless we ask for it in a very
> specific way; instead, the full result set is returned and can consume a large
> amount of memory. For example, the MySQL details are here (ResultSet section):
> [https://dev.mysql.com/doc/connector-j/5.1/en/connector-j-reference-implementation-notes.html]
> To work around this we've built the EntityBatchIterator, which instead simply pages
> through results much like a paginated web page does. The iterator takes an
> EntityQuery; when next() is called it fetches the first 500 records and returns the
> first one, the next call returns the second record, and once next() has been called
> 500 times it runs the query again with an offset to fetch the next 500, and so on
> until there are no more records (a simplified sketch of this paging approach follows
> the quoted description below).
> The main downsides to this approach are:
> 1. As with any limit/offset approach, you need to be sure your results come back in
> a consistent order, otherwise you may accidentally skip some rows and see other rows
> twice.
> 2. Because the results are a combination of many queries rather than a single
> query, some of the data may change while you are paging through it, e.g. if you sort
> by lastUpdatedTxStamp you may see some rows twice as they are updated by other
> transactions (which might be a good thing in some cases).
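Not the actual OFBiz code, but a self-contained sketch of the paging approach
described above: an iterator that re-runs a page query with an increasing offset each
time the current page is exhausted, stopping when a short or empty page comes back.

import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.BiFunction;

// Simplified illustration only, independent of the real EntityBatchIterator.
public class PagingIterator<T> implements Iterator<T> {
    private final BiFunction<Integer, Integer, List<T>> pageQuery; // (offset, limit) -> rows
    private final int pageSize;
    private int offset = 0;
    private List<T> page = null;
    private int index = 0;

    public PagingIterator(BiFunction<Integer, Integer, List<T>> pageQuery, int pageSize) {
        this.pageQuery = pageQuery;
        this.pageSize = pageSize;
    }

    @Override
    public boolean hasNext() {
        // Fetch the next page on first use, or once a full page has been consumed.
        if (page == null || (index >= page.size() && page.size() == pageSize)) {
            page = pageQuery.apply(offset, pageSize);
            offset += pageSize;
            index = 0;
        }
        // A short or empty page means the result set is exhausted once we reach its end.
        return index < page.size();
    }

    @Override
    public T next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        return page.get(index++);
    }
}

In the real iterator the page query corresponds to the wrapped EntityQuery with limit
and offset applied, which is why a stable ordering (downside 1 above) matters.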