Re: Reading csv files from S3?

Dallan Tue, 20 Mar 2012 08:44:17 -0700

Thanks for the reply. It appears that when the CSVREAD function is
processed by org.h2.expression.Function, a new org.h2.tools.Csv
object is created. Csv implements the SimpleRowSource interface, and
the Csv object is used to generate an org.h2.tools.SimpleResultSet
object.  So it seems (to my newbie understanding) that if I extend the
Csv class to read from S3, H2 will stream the CSV file during
query processing instead of reading the entire result set into
memory up front.  Is that correct?
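The idea above rests on a pull model: the consumer asks for one row at a time, and the source reads only enough input to produce it. The plain-Java sketch below shows that shape. It assumes the S3 object is available as an `InputStream` (how it is opened is not shown) and that the first line is a header. `RowSource` and `CsvStreamRowSource` are hypothetical names, not H2 classes. `RowSource` only loosely mimics the `readRow()`/`close()` pair of H2's `SimpleRowSource`, and the parser ignores CSV quoting. Whether H2 then consumes such a source lazily is a separate question about the query engine.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class StreamingCsvSource {

    /** Hypothetical pull interface, loosely modeled on the readRow()/close() shape of H2's SimpleRowSource. */
    interface RowSource extends AutoCloseable {
        /** Returns the next row, or null once the input is exhausted. */
        Object[] readRow() throws IOException;

        @Override
        void close() throws IOException;
    }

    /** Reads one CSV line per readRow() call; only the current line (plus the reader's fixed-size buffer) is in memory. */
    static final class CsvStreamRowSource implements RowSource {
        private final BufferedReader reader;
        private final String[] columns;

        CsvStreamRowSource(InputStream in) throws IOException {
            this.reader = new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8));
            // Assumption: the first line is a header holding the column names.
            String header = reader.readLine();
            this.columns = header == null ? new String[0] : header.split(",", -1);
        }

        String[] columnNames() {
            return columns.clone();
        }

        @Override
        public Object[] readRow() throws IOException {
            String line = reader.readLine();
            if (line == null) {
                return null; // end of stream
            }
            // Naive split: no quoted fields, escaped commas, or embedded newlines.
            return line.split(",", -1);
        }

        @Override
        public void close() throws IOException {
            reader.close();
        }
    }

    public static void main(String[] args) throws IOException {
        // With S3, `in` would be the object's content stream from an S3 client (not shown);
        // an in-memory stream stands in for it here.
        InputStream in = new ByteArrayInputStream(
                "key,y\n1,10\n2,20\n".getBytes(StandardCharsets.UTF_8));
        try (CsvStreamRowSource src = new CsvStreamRowSource(in)) {
            System.out.println(Arrays.toString(src.columnNames())); // [key, y]
            Object[] row;
            while ((row = src.readRow()) != null) {
                System.out.println(Arrays.toString(row));
            }
        }
    }
}
```

Memory use stays flat no matter how large the file is, but only if whatever calls `readRow()` processes each row and discards it rather than copying every row into a list first.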


Dallan

On Mar 20, 7:58 am, Noel Grandin <[email protected]> wrote:
> SimpleRowSource is already a "streaming interface".
> The trick would be to teach the higher-layer machinery to make use of it
> where possible.
>
> On 2012-03-20 00:27, Dallan wrote:
>
> > I'd like to extend the H2 source to read CSV files from S3.
> > Specifically, I'd like to implement a streaming interface (such as
> > SimpleRowSource?) so that a query could process the CSV file one row
> > at a time without needing to read the entire file into memory.
>
> > Ideally, I'd like to have something like
>
> > select small.X, sum(big.Y)
> > from CSVREAD('s3://bucket/path/filename.csv') as big, small
> > where big.foreign_key = small.key
> > group by small.X
>
> > where the CSV file would be streamed in, so it wouldn't have to fit
> > entirely in memory, joined with the small table already resident in
> > memory, and the results aggregated and returned.
>
> > Is this feasible?  If so, could someone recommend where I should make
> > the changes?
>
> > Thanks,
>
> > Dallan
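The query in the original message can be evaluated in bounded memory if the small table is held in a hash map and the big file is read row by row, keeping one running sum per group. The sketch below hand-codes that strategy in plain Java. It is not how H2's planner executes such a query. The two-column `foreignKey,y` line layout and the `sumByX` name are assumptions made for illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

public class StreamedJoinSum {

    /**
     * Streams "foreignKey,y" lines from `big`, joins each one against the in-memory
     * `small` table (key -> X), and sums y per X. Only one big row is held at a time.
     */
    static Map<String, Long> sumByX(BufferedReader big, Map<String, String> small) throws IOException {
        Map<String, Long> sums = new TreeMap<>(); // sorted only for deterministic output
        String line;
        while ((line = big.readLine()) != null) {
            String[] fields = line.split(",", -1);
            String x = small.get(fields[0]); // hash lookup plays the role of the join predicate
            if (x == null) {
                continue; // inner join: drop big rows with no matching small row
            }
            sums.merge(x, Long.parseLong(fields[1]), Long::sum); // running SUM per group
        }
        return sums;
    }

    public static void main(String[] args) throws IOException {
        Map<String, String> small = new HashMap<>();
        small.put("1", "red");
        small.put("2", "blue");
        BufferedReader big = new BufferedReader(new StringReader("1,10\n2,20\n1,5\n3,99\n"));
        System.out.println(sumByX(big, small)); // {blue=20, red=15}
    }
}
```

Memory grows with the size of the small table and the number of distinct X values, not with the size of the big file. The row with key 3 has no match and is dropped, which follows the inner-join semantics of the `where` clause.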

-- 
You received this message because you are subscribed to the Google Groups "H2 
Database" group.
To post to this group, send email to [email protected].
To unsubscribe from this group, send email to 
[email protected].
For more options, visit this group at 
http://groups.google.com/group/h2-database?hl=en.
