I haven't traced it through all the way. *org.h2.result.LocalResultImpl#addRowsToDisk* is a good starting point.
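For anyone who wants to actually step through that path: lowering MAX_MEMORY_ROWS should (if I remember that setting right) force even a modest result set onto disk, so a breakpoint in *addRowsToDisk* should get hit. A rough, untested sketch; the database name, table and the 100-row threshold are all made up:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class SpillToDiskDemo {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection("jdbc:h2:./spilldemo");
             Statement st = conn.createStatement()) {

            st.execute("DROP TABLE IF EXISTS t");
            st.execute("CREATE TABLE t(id INT PRIMARY KEY, payload VARCHAR(100))");
            st.execute("INSERT INTO t SELECT x, 'row-' || x FROM SYSTEM_RANGE(1, 100000)");

            // keep only 100 rows of any result set in memory; the rest should
            // be buffered to disk, which is where addRowsToDisk comes in
            st.execute("SET MAX_MEMORY_ROWS 100");

            // ORDER BY on a non-indexed column so the full result has to be built
            try (ResultSet rs = st.executeQuery("SELECT * FROM t ORDER BY payload")) {
                long count = 0;
                while (rs.next()) {
                    count++;
                }
                System.out.println("rows read: " + count);
            }
        }
    }
}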
By the way, I was using the word serialize 'loosely'; it's not using Java serialization to serialize the result set. From a quick code review, it's placing the result sets into *org.h2.mvstore.db.MVPlainTempResult* / *org.h2.mvstore.db.MVSortedTempResult*, so it's using these MVStore classes to temporarily persist the ResultSet to disk.

Based on the information here http://www.h2database.com/html/mvstore.html it may be possible to start returning rows from the MVStore while the full content is still being written (rough sketch of the MVStore API at the end of this message). That's probably easier than replacing this whole implementation with a new mechanism like CSV or ASCII, as it already contains the ResultSet metadata like types, sizes, etc., which would all need a brand-new custom encoding for CSV/ASCII.

On Tuesday, November 10, 2020 at 3:03:23 AM UTC-5 [email protected] wrote:
> Noel and Val, good morning.
>
> Would you like to give us a hint, where exactly this
> "temporally serialization of a resultset" happens in the code?
> We could try to modify it so it writes flat ASCII files or CSV files
> instead of binary blobs (on demand, per hint), which would enable us to
> read line by line and access the first rows while the full content is
> still being written.
>
> Best regards
> Andreas
>
> On Tue, 2020-11-10 at 08:52 +0200, Noel Grandin wrote:
> > On Tuesday, November 10, 2020 at 12:03:30 AM UTC-5 val wrote:
> > > I'm guessing this is how the h2 client/driver (not the db engine
> > > itself) behaves to handle a ResultSet. It will fully serialize the
> > > contents of the ResultSet to disk first and then serve that.
> > > Perhaps this is necessary to
> >
> > No, it's just the easiest way to handle multiple result sets performantly,
> > because then the engine doesn't have to remember any state about the
> > result set.
> >
> > On 2020/11/10 7:31 am, val wrote:
> > > Adding *LAZY_QUERY_EXECUTION=1* to the connection url is closer to what
> > > I was looking for.
> >
> > Yeah, that is currently our best answer for extremely large result sets.
> > Note that it only works for relatively simple queries, and it will likely
> > be quite bad performance-wise if you have more than one connection
> > attempting to hit the same table.
> >
> > The alternative is to grab chunks of the resultset at a time, using
> > OFFSET...LIMIT....
> >
> > > I wonder where the bottleneck is. What would it take to bring this 20
> > > minutes total iteration time down to the 6 minutes it takes if you
> > > prebuild the ResultSet first.
> >
> > You'd have to run a profiler, either our simple one (see our performance
> > page), or one of the other ones, to see where the bottleneck is.
> >
> > TBH, H2 is mostly optimised for small quick queries, not really large ones.
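P.S. To make the MVStore idea above a bit more concrete, here is roughly what writing rows into a temporary MVStore map and reading them back looks like, using only the public MVStore API from the page linked above. This is not what *MVPlainTempResult* / *MVSortedTempResult* actually do internally; the file name, map name and Object[] row layout are made up, and it's kept single-threaded just to show the API. The MVStore page talks about concurrent read/write access, which is what makes me think returning rows before the writer finishes might be feasible.

import org.h2.mvstore.MVMap;
import org.h2.mvstore.MVStore;

public class TempResultSketch {
    public static void main(String[] args) {
        // temporary store backing the "spilled" result set (file name made up)
        MVStore store = new MVStore.Builder()
                .fileName("/tmp/temp-result.mv")
                .open();
        try {
            // rows keyed by their position in the result set
            MVMap<Long, Object[]> rows = store.openMap("tempResult");

            // writer side: append rows as they are produced by the query
            for (long i = 0; i < 1000; i++) {
                rows.put(i, new Object[] { i, "value-" + i });
            }

            // reader side: rows that are already in the map can be handed to
            // the caller; the open question is whether H2's temp-result
            // classes could expose them before the writer is finished
            long available = rows.sizeAsLong();
            for (long i = 0; i < available; i++) {
                Object[] row = rows.get(i);
                System.out.println(row[0] + " -> " + row[1]);
            }
        } finally {
            store.close();
        }
    }
}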
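P.P.S. For anyone landing on this thread later, here are the two workarounds Noel mentions above as plain JDBC. LAZY_QUERY_EXECUTION goes into the connection URL, and the chunked variant just re-runs the query with LIMIT/OFFSET. Database, table and column names are made up, and the chunked loop assumes a stable ORDER BY:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class LargeResultWorkarounds {

    // Option 1: lazy execution -- rows are produced while you iterate,
    // but only for relatively simple queries (see Noel's caveats above)
    static void lazy() throws Exception {
        String url = "jdbc:h2:./demo;LAZY_QUERY_EXECUTION=1";
        try (Connection conn = DriverManager.getConnection(url);
             PreparedStatement ps = conn.prepareStatement("SELECT id, payload FROM t");
             ResultSet rs = ps.executeQuery()) {
            while (rs.next()) {
                process(rs.getLong(1), rs.getString(2));
            }
        }
    }

    // Option 2: read the result in fixed-size chunks with LIMIT/OFFSET
    static void chunked() throws Exception {
        int chunk = 10000;
        try (Connection conn = DriverManager.getConnection("jdbc:h2:./demo");
             PreparedStatement ps = conn.prepareStatement(
                     "SELECT id, payload FROM t ORDER BY id LIMIT ? OFFSET ?")) {
            for (long offset = 0; ; offset += chunk) {
                ps.setInt(1, chunk);
                ps.setLong(2, offset);
                int read = 0;
                try (ResultSet rs = ps.executeQuery()) {
                    while (rs.next()) {
                        process(rs.getLong(1), rs.getString(2));
                        read++;
                    }
                }
                if (read < chunk) {
                    break; // last (possibly partial) chunk
                }
            }
        }
    }

    static void process(long id, String payload) {
        // placeholder for whatever the application does with each row
    }
}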
