Re: EntityBatchIterator for large data set queries

Pawan Verma Fri, 05 Jun 2020 03:38:27 -0700

Thanks, Pritam and Scott for the discussion.

I've created Jira OFBIZ-11789 for this improvement and also created a PR
with the proposed changes.


I request everyone to review the PR and suggest your thought on this.
Thanks!
-- 
Thanks & Regards
Pawan Verma
Technical Consultant
*HotWax Systems*
*Enterprise open source experts*
http://www.hotwaxsystems.com


On Mon, Jun 1, 2020 at 12:36 PM Pritam Kute <pritam.k...@hotwaxsystems.com>
wrote:

> Thanks Scott for your detailed explanation.
>
> The solution looks good to me too. My confusion was with why we are going
> to implement new method if we can achieve that using the current
> EntityQuery methods.
>
> +1 for adding queryBatchIterator() to EntityQuery.
>
> Kind Regards,
> --
> Pritam Kute
>
>
> On Thu, May 28, 2020 at 6:32 AM Scott Gray <scott.g...@hotwaxsystems.com>
> wrote:
>
> > Hi Pritam,
> >
> > I'm not sure about PostgreSQL or Derby but I know with MySQL that using a
> > cursor doesn't really work.  You have to set the result set to
> > TYPE_FORWARD_ONLY and CONCUR_READ_ONLY and also set the fetch size to
> > INTEGER.MIN_VALUE.  Only then will the driver stream the results and even
> > then, you may not execute any other SQL commands on the connection until
> > you have fully read or closed the resultset.
> >
> > So if an EntityListIterator doesn't really conserve memory, then you need
> > to take a paging query approach such as this:
> > EntityQuery query =
> EntityQuery.use(delegator).from("SomeTable").limit(100)
> > List results = null
> > while (!(results = query.queryList()).isEmpty()) {
> >  for (value : results) {
> >   // do something with each value
> >  }
> >  query.offset(query.getOffset() + query.getLimit())
> > }
> >
> > Or with the proposed EntityBatchIterator:
> > Iterator query =
> >
> >
> EntityQuery.use(delegator).from("SomeTable").limit(100).queryBatchIterator()
> > while (iterator.hasNext()) {
> >  result = iterator.next()
> >  // do something with each value
> > }
> >
> > I guess an alternative approach would be to implement something similar
> > within the EntityListIterator and perhaps a flag to turn it off or on
> > depending on which database is being used and how well it supports
> > iterating over results without loading the entire resultset into memory.
> >
> > Regards
> > Scott
> >
> >
> >
> > On Sat, 23 May 2020 at 20:59, Pritam Kute <pritam.k...@hotwaxsystems.com
> >
> > wrote:
> >
> > > Hello Pawan,
> > >
> > > I just had a look into the EntityQuery.queryIterator() method and looks
> > > like we can achieve that by using fetchSize(), fowardOnly(),
> > > cursorScrollInsensitive(), cursorScrollSensitive() and offset() methods
> > in
> > > EntityQuery class. Let me know if I am missing anything.
> > >
> > > It will be good if you can post a pseudo code or something here so that
> > we
> > > could get an understanding of the exact design which you have in your
> > mind.
> > >
> > > Kind Regards,
> > > --
> > > Pritam Kute
> > >
> > >
> > > On Thu, May 21, 2020 at 7:41 PM Pawan Verma <
> > pawan.ve...@hotwaxsystems.com
> > > >
> > > wrote:
> > >
> > > > Hello Devs,
> > > >
> > > > While working on the large database we have figured out that very
> large
> > > > queries consume all memory and crash ofbiz(because queryIterator()
> > > doesn't
> > > > really work, it's no different from queryList())
> > > >
> > > > The EntityListIterator attempts to use a cursor to iterate over large
> > > > result sets but in reality most databases do not give us a cursor
> > unless
> > > we
> > > > ask for it in a very specific way, and instead you get back the full
> > > result
> > > > set and potentially consume a large amount of memory.  For example,
> the
> > > > MySql details are here (ResultSet section):
> > > >
> > > >
> > >
> >
> https://dev.mysql.com/doc/connector-j/5.1/en/connector-j-reference-implementation-notes.html
> > > >
> > > > To work around this we've built the EntityBatchIterator which instead
> > > > basically just pages through results like you would do on a webpage
> > that
> > > > uses pagination.  The iterator takes an EntityQuery and when next()
> is
> > > > called it grabs the first 500 records and returns the first record,
> > call
> > > > next again and you get the second record, once you've called next 500
> > > times
> > > > it runs the query again with an offset and grabs the next 500 and so
> on
> > > > until there are no more records.
> > > >
> > > > The main downsides to this approach are:
> > > > 1. Same as when using limit/offset you want to be sure your results
> > will
> > > > come back in a consistent order to avoid accidentally skipping some
> > rows
> > > > and seeing other rows twice.
> > > > 2. Because the results are a combination of many queries rather than
> a
> > > > single query, some of the data may change while you are paging
> through
> > > it.
> > > > i.e. if you were to sort by lastUpdatedTxStamp you may see some rows
> > > twice
> > > > as they are updated by other transactions (this might be a good thing
> > in
> > > > some cases).
> > > >
> > > > So, the main proposal is to introduce a new EntityBatchIterator for
> > large
> > > > queries.  If we agree on the proposal I'll create a Jira and provide
> a
> > > > patch for the community review. Thanks!
> > > >
> > > > --
> > > > Thanks & Regards
> > > > Pawan Verma
> > > > Technical Consultant
> > > > *HotWax Systems*
> > > > *Enterprise open source experts*
> > > > http://www.hotwaxsystems.com
> > > >
> > >
> >
>

Re: EntityBatchIterator for large data set queries

Reply via email to