I'm not putting the result set into memory. In fact I never get to the point
after I say executeQuery...

It must be that H2 is trying to do the aggregations (grouping and count) in
memory. I can understand that this will be the best approach for datasets
with lots of duplicates, but in my case I'm expecting just a few percent
duplicates. And in that case it runs will surely run out of memory if it's
doing all the aggregations in memory.

Anyone else has a suggestion? I can read from H2's website that it
implements an on-disk sorting mechanism so might be that I have to just do
an ORDER BY and do the unique-counting myself by checking if the current
record is equal to the previous. But this is not my preferred solution :-(

Kasper

On Wed, Feb 2, 2011 at 4:44 PM, Ryan How <[email protected]> wrote:

>  Is the out of memory error just because you are reading a massive result
> set?
>
> Then wouldn't it need server side cursors to be able to read it without
> putting the whole result set into memory?
>
> Maybe you can try and break it into lots of smaller queries using limit and
> offset?. You might be able to speed that up then using a temporary table to
> store your duplicate count, then querying the temporary table with the limit
> and offset to do it in chunks.
>
> I'm sure someone smarter that me will give you a better answer :).
>
> Cheers, Ryan
>
>
>
>
> On 2/02/2011 11:37 PM, Ryan How wrote:
>
> Sorry, I misinterpreted what you were trying to do.
>
> You are right, it wouldn't work :)
>
>
>
> On 2/02/2011 10:56 PM, Kasper Sørensen wrote:
>
> Can you actually do that? I need all the values also, to see which are the
> duplicates, so maybe something like:
>
> SELECT COUNT DISTINCT a,b,c,d,e,f FROM my_table
>
> but to my knowledge that is not a valid query?
>
> /Kasper
>
>
> On Wed, Feb 2, 2011 at 3:43 PM, Ryan How <[email protected]> wrote:
>
>> Hi,
>>
>> What about
>>
>> SELECT COUNT DISTINCT FROM my_table
>>
>> I know it doesn't solve the actual issue, but it may be a workaround for
>> now?
>>
>> Cheers, Ryan
>>
>>
>>
>> On 2/02/2011 7:09 PM, Kasper Sørensen wrote:
>>
>>> SELECT a,b,c,d,e,f,g,COUNT(*) FROM my_table GROUP BY a,b,c,d,e,f,g
>>>
>>>
>>   --
>> You received this message because you are subscribed to the Google Groups
>> "H2 Database" group.
>> To post to this group, send email to [email protected].
>> To unsubscribe from this group, send email to
>> [email protected]<h2-database%[email protected]>
>> .
>> For more options, visit this group at
>> http://groups.google.com/group/h2-database?hl=en.
>>
>>
> --
> You received this message because you are subscribed to the Google Groups
> "H2 Database" group.
> To post to this group, send email to [email protected].
> To unsubscribe from this group, send email to
> [email protected].
> For more options, visit this group at
> http://groups.google.com/group/h2-database?hl=en.
>
> --
> You received this message because you are subscribed to the Google Groups
> "H2 Database" group.
> To post to this group, send email to [email protected].
> To unsubscribe from this group, send email to
> [email protected].
> For more options, visit this group at
> http://groups.google.com/group/h2-database?hl=en.
>
>  --
> You received this message because you are subscribed to the Google Groups
> "H2 Database" group.
> To post to this group, send email to [email protected].
> To unsubscribe from this group, send email to
> [email protected]<h2-database%[email protected]>
> .
> For more options, visit this group at
> http://groups.google.com/group/h2-database?hl=en.
>

-- 
You received this message because you are subscribed to the Google Groups "H2 
Database" group.
To post to this group, send email to [email protected].
To unsubscribe from this group, send email to 
[email protected].
For more options, visit this group at 
http://groups.google.com/group/h2-database?hl=en.

Reply via email to