Thanks for the reply. It appears that when the CSVREAD keyword is processed by org.h2.expression.Function, a new org.h2.tools.Csv object is created. Csv implements the SimpleRowSource interface, and the Csv object is then read to generate an org.h2.tools.SimpleResultSet object. So it seems (to my newbie understanding) that if I extend the Csv class to read from S3, H2 will stream the CSV file during query processing instead of trying to read the entire result set into memory up front. Is that correct?
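
To make the question concrete, here is a minimal sketch of the row-at-a-time reading I have in mind. It is only an illustration under stated assumptions: RowSource is a local stand-in that mirrors the three methods of org.h2.tools.SimpleRowSource (readRow, close, reset) so the code compiles without H2 on the classpath, and unlike the real interface it does not declare SQLException. StreamingCsvRowSource and the Supplier that opens the data are hypothetical names; a StringReader stands in for the S3 object stream, and the comma splitting ignores quoting, which the real Csv class handles.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.function.Supplier;

// Local stand-in mirroring the methods of org.h2.tools.SimpleRowSource,
// so this sketch compiles without H2. Not the real interface.
interface RowSource {
    Object[] readRow(); // next row, or null at end of data
    void close();
    void reset();
}

// Hypothetical class: pulls exactly one CSV line per readRow() call,
// so only the current row is held in memory.
class StreamingCsvRowSource implements RowSource {
    private final Supplier<Reader> opener; // assumption: would open the S3 object stream
    private BufferedReader in;

    StreamingCsvRowSource(Supplier<Reader> opener) {
        this.opener = opener;
    }

    @Override
    public Object[] readRow() {
        try {
            if (in == null) {
                in = new BufferedReader(opener.get()); // open lazily on first read
            }
            String line = in.readLine();
            // Naive split: no quoting or escaping support.
            return line == null ? null : line.split(",", -1);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    @Override
    public void close() {
        if (in != null) {
            try {
                in.close();
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
            in = null;
        }
    }

    @Override
    public void reset() {
        close(); // the next readRow() reopens the source from the start
    }
}

public class StreamingCsvDemo {
    public static void main(String[] args) {
        // A StringReader stands in for the remote CSV file.
        RowSource rows = new StreamingCsvRowSource(() -> new StringReader("1,10\n2,20\n"));
        long sum = 0;
        Object[] row;
        while ((row = rows.readRow()) != null) {
            sum += Long.parseLong((String) row[1]); // aggregate without buffering rows
        }
        rows.close();
        System.out.println(sum);
    }
}
```

Whether H2 actually consumes such a source one row at a time during a join, rather than materializing it first, is exactly the part I am unsure about.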
Dallan

On Mar 20, 7:58 am, Noel Grandin <[email protected]> wrote:
> SimpleRowSource is already a "streaming interface"
> The trick would be to teach the higher-layer machinery to make use of it
> where possible.
>
> On 2012-03-20 00:27, Dallan wrote:
> > I'd like to extend the H2 source to read csv files from S3.
> > Specifically, I'd like to implement a streaming interface (as
> > SimpleRowSource?) so that a query could process the csv file one row
> > at a time without needing to read the entire file into memory.
> >
> > Ideally, I'd like to have something like
> >
> > select small.X, sum(big.Y)
> > from CSVREAD('s3:/bucket/path/filename.csv') as big, small
> > where big.foreign-key = small.key
> > group by small.X
> >
> > where csv file would be streamed in so it wouldn't have to fit
> > entirely in memory, joined with the small table already resident in
> > memory, and the results aggregated and returned.
> >
> > Is this feasible? If so, could someone recommend where I should make
> > the changes?
> >
> > Thanks,
> >
> > Dallan
