I'm using the latest Postgres (or MySQL) driver. To make a test, I put 1M records in a DB table, then do "select * from that_table". No matter which fetch size I set, at the moment of PreparedStatement.execute() roughly 70-100 MB is allocated (I don't remember the exact amount right now). So if the heap size is small, I get OutOfMemory; otherwise everything is fine and the rows can be iterated quickly. Looking at the code on the Postgres site, the only difference is that I didn't bother turning off autocommit mode. I'm not sure Cayenne does either... Maybe that's the reason I didn't get positive results.
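For reference, here is a minimal sketch of the standalone test I'm describing, assuming (per the Postgres cursor documentation) that autocommit has to be off and the statement forward-only for fetchSize to take effect. The connection URL, credentials and table name are placeholders:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class FetchSizeTest {

    public static void main(String[] args) throws Exception {
        Connection con = DriverManager.getConnection(
                "jdbc:postgresql://localhost/testdb", "test", "secret");
        try {
            // The Postgres driver only streams rows through a cursor when
            // autocommit is off; otherwise it buffers the whole ResultSet.
            con.setAutoCommit(false);

            PreparedStatement stmt = con.prepareStatement(
                    "select * from that_table",
                    ResultSet.TYPE_FORWARD_ONLY,
                    ResultSet.CONCUR_READ_ONLY);
            stmt.setFetchSize(1000);

            ResultSet rs = stmt.executeQuery();
            int count = 0;
            while (rs.next()) {
                count++;
            }
            System.out.println("read " + count + " rows");

            rs.close();
            stmt.close();
        }
        finally {
            con.close();
        }
    }
}

Running this with and without the setAutoCommit(false) line (same small heap) should show whether autocommit is the variable that explains our different results.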
Andrey

2009/5/25 Andrus Adamchik <[email protected]>

> [Forwarding to the dev list. Stéphane, please forward your replies here as
> well, or subscribe to dev if you feel like it.]
>
> Stéphane, thanks for providing this information. It is encouraging.
>
> The mismatch in the results between you and Andrey is exactly why I wanted
> to do real profiling, as we need to see how the memory allocation changes
> in the process. OutOfMemory may or may not happen depending on many
> factors. What we care about here is the dynamics of ResultSet reading.
>
> I see that Andrey has already committed this code to trunk, so now we have
> something to play with. If you don't mind using a trunk build (doing it
> yourself or grabbing one from Hudson), you can give us feedback on the
> implementation.
>
> Andrus
>
>
> Begin forwarded message:
>
>> From: Stéphane Claret <[email protected]>
>> Date: May 25, 2009 11:58:52 AM GMT+03:00
>> To: Andrus Adamchik <[email protected]>
>> Subject: Re: Big selects on PostGres : Configuring
>> Statement.setFetchSize() of a SelectQuery
>>
>> Hello,
>>
>> To answer: setting fetchSize with Postgres does make a difference.
>> Consider the following code, copy-pasted from my test case:
>>
>> PreparedStatement stmt = sqlMapper.getCurrentConnection().prepareStatement(
>>         "SELECT itemID,sellingprice,margin,popularityrating,name,att1,"
>>         + "att2,att3,att4,att5,longtailpos,timest"
>>         + " FROM itemdetails WHERE profileID = 10 ORDER BY longtailpos");
>>
>> stmt.setFetchSize(1000);
>> ResultSet set = stmt.executeQuery();
>>
>> int i = 0;
>> while (set.next()) {
>>     i++;
>>     System.out.println(set.getString(1));
>> }
>>
>> It can iterate through approx 1'500'000 rows without any memory issues
>> (the Java process stays at approx 20 MB while the loop executes). I can
>> also see that it takes no more than a few seconds before the first row is
>> printed in the output console, meaning that results are still being
>> fetched from the DB at the moment the result set is accessed.
>>
>> Now if I comment out the line "stmt.setFetchSize(1000);" in the above
>> code, execution stops at "ResultSet set = stmt.executeQuery();" for
>> approx 15 seconds and then an OutOfMemoryError is thrown from JDBC code.
>> In Cayenne code, that would happen before the result set is even accessed
>> (making ResultIterator useless).
>>
>> A similar problem is described here, with a sample stack trace:
>>
>> http://www.nabble.com/java.sql.Statement-generates-java.lang.OutOfMemoryError-in-big-tabe-td21568763.html
>>
>> And the reason is detailed here:
>> http://jdbc.postgresql.org/documentation/83/query.html#query-with-cursor
>>
>> Perhaps other drivers that automatically use server-side cursors aren't
>> affected or just ignore this setting, but I can say for sure that this
>> setting is necessary with Postgres.
>>
>> Andrey, you say you keep getting OutOfMemory exceptions no matter which
>> value you set. Do you get these exceptions while iterating through the
>> results or during the call to executeQuery?
>> If you get them while looping through the result set, it could mean that
>> somehow the row objects aren't garbage collected. That can happen with
>> Cayenne 2.0 because the DataContext holds strong references to the
>> DataObjects. I didn't want to write a fix for v3 at first, but I realized
>> that in v2 I had to manually unregister the DataObjects from the context
>> in order to keep memory usage in safe areas.
>>
>> If I can be of any help, please ask.
>> Best regards...
>>
>> Stéphane
>>
>>
>> 2009/5/25 Andrus Adamchik <[email protected]>
>>
>> Somehow I am not surprised. I may also try that on Oracle when the code
>> becomes available on trunk.
>>
>> (BTW, not sure that Stephane is subscribed to the dev list, so cc'ing this
>> message.)
>>
>> Andrus
>>
>> On May 24, 2009, at 12:52 PM, Andrey Razumovsky wrote:
>>
>> I should say my tests on Postgres and MySQL didn't show any difference. If
>> I use a small heap size, I get OutOfMemory no matter which fetch size was
>> set. Fetch speed and memory usage are the same. It seems the drivers just
>> ignore this parameter. Stephane, did your workaround help you?
>>
>> The hsql, Postgres and MySQL drivers didn't throw any exceptions. Maybe
>> other DBMS drivers work well with this param. I'm going to commit today or
>> tomorrow. Adding this ability won't break anything.
>>
>> Andrey
>>
>> 2009/5/21 Andrus Adamchik <[email protected]>
>>
>> Cool :-)
>>
>> On May 21, 2009, at 3:59 PM, Andrey Razumovsky wrote:
>>
>> Now that I know of this JDBC feature, I'd prefer to have it in Cayenne
>> sooner rather than later. We would also want to have it for other queries,
>> not just SelectQuery.
>>
>>
>> Absolutely. This has to be a part of the QueryMetadata on the backend. On
>> the frontend, any query that can potentially select data should have a
>> corresponding setter.
>>
>> Could someone help me test it against different types of database if I
>> commit?
>>
>>
>> I can test it on almost all DBs that we support. Of course we should have
>> Cayenne unit tests that provide regression coverage (i.e. driver XYZ
>> doesn't throw UnsupportedOperationException when we call the corresponding
>> JDBC method).
>>
>> In addition to that, I'd like to see if there are really memory/speed
>> savings when using it (i.e. is it really worth it). For that I suggest
>> writing a JDBC test outside of Cayenne that can be run in a profiler
>> against different DBs.
>>
>> Andrus
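P.S. Once the trunk changes settle, I imagine using this from the Cayenne side would look roughly like the sketch below. The setter name (setStatementFetchSize) is my guess at the API, and ItemDetail is a made-up mapped class, so treat this as an illustration rather than the committed code:

import java.util.List;
import org.apache.cayenne.access.DataContext;
import org.apache.cayenne.query.SelectQuery;

public class FetchSizeQueryExample {

    @SuppressWarnings("unchecked")
    static List selectWithFetchSize(DataContext context) {
        SelectQuery query = new SelectQuery(ItemDetail.class);

        // hint for the JDBC driver: fetch rows from the server in batches of 1000
        query.setStatementFetchSize(1000);

        return context.performQuery(query);
    }
}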

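P.P.S. Regarding Stéphane's note about having to unregister DataObjects from a 2.0 DataContext: for anyone hitting this, here is roughly the kind of chunked processing I understand that workaround to be. ItemDetail and its itemID attribute (used as a monotonic cursor) are hypothetical, and I haven't tested this; it's only meant to illustrate the idea:

import java.util.List;
import org.apache.cayenne.access.DataContext;
import org.apache.cayenne.exp.Expression;
import org.apache.cayenne.exp.ExpressionFactory;
import org.apache.cayenne.query.SelectQuery;

public class ChunkedProcessing {

    static final int CHUNK = 1000;

    @SuppressWarnings("unchecked")
    static void processAll(DataContext context) {
        long lastId = 0;

        while (true) {
            // fetch the next chunk of objects ordered by a monotonic key
            Expression qualifier = ExpressionFactory.greaterExp("itemID", lastId);
            SelectQuery query = new SelectQuery(ItemDetail.class, qualifier);
            query.addOrdering("itemID", true);
            query.setFetchLimit(CHUNK);

            List chunk = context.performQuery(query);
            if (chunk.isEmpty()) {
                break;
            }

            for (Object o : chunk) {
                ItemDetail item = (ItemDetail) o;
                // ... process the object ...
                lastId = item.getItemID();
            }

            // let the context forget the processed objects so they can be
            // garbage collected before the next chunk is fetched
            context.unregisterObjects(chunk);
        }
    }
}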