I'm not putting the result set into memory. In fact I never get to the point after I say executeQuery...
It must be that H2 is trying to do the aggregations (grouping and count) in memory. I can understand that this will be the best approach for datasets with lots of duplicates, but in my case I'm expecting just a few percent duplicates. And in that case it runs will surely run out of memory if it's doing all the aggregations in memory. Anyone else has a suggestion? I can read from H2's website that it implements an on-disk sorting mechanism so might be that I have to just do an ORDER BY and do the unique-counting myself by checking if the current record is equal to the previous. But this is not my preferred solution :-( Kasper On Wed, Feb 2, 2011 at 4:44 PM, Ryan How <[email protected]> wrote: > Is the out of memory error just because you are reading a massive result > set? > > Then wouldn't it need server side cursors to be able to read it without > putting the whole result set into memory? > > Maybe you can try and break it into lots of smaller queries using limit and > offset?. You might be able to speed that up then using a temporary table to > store your duplicate count, then querying the temporary table with the limit > and offset to do it in chunks. > > I'm sure someone smarter that me will give you a better answer :). > > Cheers, Ryan > > > > > On 2/02/2011 11:37 PM, Ryan How wrote: > > Sorry, I misinterpreted what you were trying to do. > > You are right, it wouldn't work :) > > > > On 2/02/2011 10:56 PM, Kasper Sørensen wrote: > > Can you actually do that? I need all the values also, to see which are the > duplicates, so maybe something like: > > SELECT COUNT DISTINCT a,b,c,d,e,f FROM my_table > > but to my knowledge that is not a valid query? > > /Kasper > > > On Wed, Feb 2, 2011 at 3:43 PM, Ryan How <[email protected]> wrote: > >> Hi, >> >> What about >> >> SELECT COUNT DISTINCT FROM my_table >> >> I know it doesn't solve the actual issue, but it may be a workaround for >> now? >> >> Cheers, Ryan >> >> >> >> On 2/02/2011 7:09 PM, Kasper Sørensen wrote: >> >>> SELECT a,b,c,d,e,f,g,COUNT(*) FROM my_table GROUP BY a,b,c,d,e,f,g >>> >>> >> -- >> You received this message because you are subscribed to the Google Groups >> "H2 Database" group. >> To post to this group, send email to [email protected]. >> To unsubscribe from this group, send email to >> [email protected]<h2-database%[email protected]> >> . >> For more options, visit this group at >> http://groups.google.com/group/h2-database?hl=en. >> >> > -- > You received this message because you are subscribed to the Google Groups > "H2 Database" group. > To post to this group, send email to [email protected]. > To unsubscribe from this group, send email to > [email protected]. > For more options, visit this group at > http://groups.google.com/group/h2-database?hl=en. > > -- > You received this message because you are subscribed to the Google Groups > "H2 Database" group. > To post to this group, send email to [email protected]. > To unsubscribe from this group, send email to > [email protected]. > For more options, visit this group at > http://groups.google.com/group/h2-database?hl=en. > > -- > You received this message because you are subscribed to the Google Groups > "H2 Database" group. > To post to this group, send email to [email protected]. > To unsubscribe from this group, send email to > [email protected]<h2-database%[email protected]> > . > For more options, visit this group at > http://groups.google.com/group/h2-database?hl=en. > -- You received this message because you are subscribed to the Google Groups "H2 Database" group. To post to this group, send email to [email protected]. To unsubscribe from this group, send email to [email protected]. For more options, visit this group at http://groups.google.com/group/h2-database?hl=en.
