Thanks Scott for your detailed explanation.

The solution looks good to me too. My confusion was with why we are going
to implement new method if we can achieve that using the current
EntityQuery methods.

+1 for adding queryBatchIterator() to EntityQuery.

Kind Regards,
--
Pritam Kute


On Thu, May 28, 2020 at 6:32 AM Scott Gray <scott.g...@hotwaxsystems.com>
wrote:

> Hi Pritam,
>
> I'm not sure about PostgreSQL or Derby but I know with MySQL that using a
> cursor doesn't really work.  You have to set the result set to
> TYPE_FORWARD_ONLY and CONCUR_READ_ONLY and also set the fetch size to
> INTEGER.MIN_VALUE.  Only then will the driver stream the results and even
> then, you may not execute any other SQL commands on the connection until
> you have fully read or closed the resultset.
>
> So if an EntityListIterator doesn't really conserve memory, then you need
> to take a paging query approach such as this:
> EntityQuery query = EntityQuery.use(delegator).from("SomeTable").limit(100)
> List results = null
> while (!(results = query.queryList()).isEmpty()) {
>  for (value : results) {
>   // do something with each value
>  }
>  query.offset(query.getOffset() + query.getLimit())
> }
>
> Or with the proposed EntityBatchIterator:
> Iterator query =
>
> EntityQuery.use(delegator).from("SomeTable").limit(100).queryBatchIterator()
> while (iterator.hasNext()) {
>  result = iterator.next()
>  // do something with each value
> }
>
> I guess an alternative approach would be to implement something similar
> within the EntityListIterator and perhaps a flag to turn it off or on
> depending on which database is being used and how well it supports
> iterating over results without loading the entire resultset into memory.
>
> Regards
> Scott
>
>
>
> On Sat, 23 May 2020 at 20:59, Pritam Kute <pritam.k...@hotwaxsystems.com>
> wrote:
>
> > Hello Pawan,
> >
> > I just had a look into the EntityQuery.queryIterator() method and looks
> > like we can achieve that by using fetchSize(), fowardOnly(),
> > cursorScrollInsensitive(), cursorScrollSensitive() and offset() methods
> in
> > EntityQuery class. Let me know if I am missing anything.
> >
> > It will be good if you can post a pseudo code or something here so that
> we
> > could get an understanding of the exact design which you have in your
> mind.
> >
> > Kind Regards,
> > --
> > Pritam Kute
> >
> >
> > On Thu, May 21, 2020 at 7:41 PM Pawan Verma <
> pawan.ve...@hotwaxsystems.com
> > >
> > wrote:
> >
> > > Hello Devs,
> > >
> > > While working on the large database we have figured out that very large
> > > queries consume all memory and crash ofbiz(because queryIterator()
> > doesn't
> > > really work, it's no different from queryList())
> > >
> > > The EntityListIterator attempts to use a cursor to iterate over large
> > > result sets but in reality most databases do not give us a cursor
> unless
> > we
> > > ask for it in a very specific way, and instead you get back the full
> > result
> > > set and potentially consume a large amount of memory.  For example, the
> > > MySql details are here (ResultSet section):
> > >
> > >
> >
> https://dev.mysql.com/doc/connector-j/5.1/en/connector-j-reference-implementation-notes.html
> > >
> > > To work around this we've built the EntityBatchIterator which instead
> > > basically just pages through results like you would do on a webpage
> that
> > > uses pagination.  The iterator takes an EntityQuery and when next() is
> > > called it grabs the first 500 records and returns the first record,
> call
> > > next again and you get the second record, once you've called next 500
> > times
> > > it runs the query again with an offset and grabs the next 500 and so on
> > > until there are no more records.
> > >
> > > The main downsides to this approach are:
> > > 1. Same as when using limit/offset you want to be sure your results
> will
> > > come back in a consistent order to avoid accidentally skipping some
> rows
> > > and seeing other rows twice.
> > > 2. Because the results are a combination of many queries rather than a
> > > single query, some of the data may change while you are paging through
> > it.
> > > i.e. if you were to sort by lastUpdatedTxStamp you may see some rows
> > twice
> > > as they are updated by other transactions (this might be a good thing
> in
> > > some cases).
> > >
> > > So, the main proposal is to introduce a new EntityBatchIterator for
> large
> > > queries.  If we agree on the proposal I'll create a Jira and provide a
> > > patch for the community review. Thanks!
> > >
> > > --
> > > Thanks & Regards
> > > Pawan Verma
> > > Technical Consultant
> > > *HotWax Systems*
> > > *Enterprise open source experts*
> > > http://www.hotwaxsystems.com
> > >
> >
>

Reply via email to