[Forwarding to the dev list. Stéphane, please forward your replies here as well or subscribe to dev if you feel like it]

Stéphane, thanks for providing this information. It is encouraging.

The mismatch in the results between you and Andrey is exactly why I wanted to do real profiling, as we need to see how the memory allocation changes in the process. OutOfMemory may or may not happen depending on many factors. What we care about here is the dynamics of ResultSet reading.

I see that Andrey has already committed this code to trunk, so now we have something to play with. If you don't mind using a trunk build (doing it yourself or grabbing one from Hudson), you can give us the feedback on the implementation.

Andrus


Begin forwarded message:
From: Stéphane Claret <[email protected]>
Date: May 25, 2009 11:58:52 AM GMT+03:00
To: Andrus Adamchik <[email protected]>
Subject: Re: Big selects on PostGres : Configuring Statement.setFetchSize() of a selectquery

Hello,

To answer, setting fetchSize with postgres does make a difference.
If you consider the following code copy-pasted from my test case:

PreparedStatement stmt = sqlMapper.getCurrentConnection().prepareStatement("SELECT itemID ,sellingprice ,margin ,popularityrating,name,att1,att2,att3,att4,att5,longtailpos,timest FROM itemdetails WHERE profileID= 10 ORDER BY longtailpos");

        stmt.setFetchSize(1000);
        stmt.execute();
        ResultSet set = stmt.executeQuery();

        int i = 0;


        while( set.next())
        {
            i++;
            System.out.println(set.getString(1));
        }


It can iterates through approx 1'500'000 rows without any memory issues (java process stays at approx 20mo while the loop executes). I can also see that it takes no more than a few seconds before the first row is printed in the output console, meaning that results are still being fetched from the DB at the moment the resultset is accessed.

Now if I comment the line "stmt.setFetchSize(1000);" in the above code, execution stops at "ResultSet set = stmt.executeQuery();" for approx 15 seconds and then an OutOfMemoryException is thrown from JDBC code. Also in cayenne code, that would be before the resultset is even accessed (making ResultIerator useless).

A similar problem is exposed here, with a sample stacktrace :
http://www.nabble.com/java.sql.Statement-generates-java.lang.OutOfMemoryError-in-big-tabe-td21568763.html

And the reason is detailed here :
http://jdbc.postgresql.org/documentation/83/query.html#query-with-cursor

Perhaps other drivers that automatically use server side cursors aren't affected or just ignore this setting but I can say for sure that this setting is necessary with postGres.

Andrey, you say you keep getting OutOfMemory exceptions no matter which value you set. Do you get these exceptions while iterating through the results or during the call to executeQuery? If you get it while looping through the resultset, it could mean that somehow the rows objects aren't garbage collected, it can happen with cayenne version 2.0 due to the dataContext holding strong references to the dataObjects. I didn't want to write a fix for v3 at first, but I realized that in v2, I had to manually unregister the dataObjects from the context in order to keep the memory usage in safe areas.

If I can be of any help, please ask.
Best regards...

Stéphane


2009/5/25 Andrus Adamchik <[email protected]>
Somehow I am not surprised. I may also try that on Oracle when the code becomes available on trunk.

(BTW not sure that Stephane is subscribed to the dev list, so cc'ying this message.)

Andrus

On May 24, 2009, at 12:52 PM, Andrey Razumovsky wrote:
I should say my tests on Postgres and mysql didn't show any results. If I use small heap size, i get OutOfMemory no matter which fetch size was set. Fetch speed and memory usage are the same. Seems drivers just ignore this
parameter. Stephane, did your workaround help you?

Drivers of hsql, postres, mysql didn't throw any exceptions.. Maybe other
DBMS drivers work well with this param. I'm going to commit today or
tomorrow. Nothing will break anything if we add this ability.

Andrey

2009/5/21 Andrus Adamchik <[email protected]>

Cool :-)

On May 21, 2009, at 3:59 PM, Andrey Razumovsky wrote:

Now when I know of this JDBC feature, I'd prefer to have it in Cayenne
sooner that later. We would also want to have it for other queries than
just
only SelectQuery.


Absolutely. This has to be a part of the QueryMetadata on the backend. On
the frontend any query that can potentially select data should have a
corresponding setter.

Could someone help me testing it against different types
of database if I commit?


I can test it on almost all DB's that we support. Of course we should have Cayenne unit tests that will provide regression (i.e. driver XYZ doesn't
throw UnsupportedOperationException when we call a corresponding JDBC
method).

In addition to that I'd like to see if there's really memory/speed savings when using that (i.e. is it really worth it). For that I suggest writing a JDBC test outside of Cayenne, that can be run in profiler against different
DB's.

Andrus




Reply via email to