Good evening. While I don't work with Scala/Spark, only Groovy, I have run into
similar challenges before and would like to share my experience:
1) I am not sure your construct is thread safe.
2) As I understand it, you open many connections and objects in parallel.
3) I would advise using one single PREPARED statement instead and submitting the
parameters as a batch (see the sketch below).
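For example, something along these lines (a minimal Scala sketch; it reuses the
placeholder names from your pseudocode — blah, col1..col4, key, BATCH_SIZE,
row.val1..val4 — and the setString/setInt calls are only my assumption based on
which values you quoted in your VALUES clause, so adjust them to your schema):

import java.sql.DriverManager

Class.forName("org.h2.Driver")
val conn = DriverManager.getConnection("jdbc:h2:mem:" + key)

// One statement, parsed once; only the parameter values change per row.
val ps = conn.prepareStatement(
  "INSERT INTO blah (col1, col2, col3, col4) VALUES (?, ?, ?, ?)")

rows.grouped(BATCH_SIZE).foreach { batch =>
  batch.foreach { row =>
    ps.setString(1, row.val1)  // quoted in your SQL, so assumed to be a string
    ps.setInt(2, row.val2)     // unquoted, so assumed numeric
    ps.setInt(3, row.val3)
    ps.setString(4, row.val4)
    ps.addBatch()
  }
  ps.executeBatch()            // sends the whole batch at once
}
ps.close()

This way H2 compiles the INSERT only once, the per-row string concatenation (and
its quoting/escaping pitfalls) goes away, and far fewer temporary SQL strings and
statement objects are created per batch.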
On Thursday, January 23, 2020 at 2:42:51 AM UTC+7, H2inSpark wrote:
>
> I am in the middle of porting a legacy application that uses H2 into
> Spark. The legacy application populated H2 via JOOQ using DSLContext, i.e.
> dslContext.execute("CREATE TABLE blah...."), and it populated the actual
> tables using the CSVREAD function, i.e. dslContext.execute("INSERT INTO blah
> ... FROM CSVREAD ..."). In the new application I can't use the CSVREAD
> function to populate tables because now I'm reading Datasets off HDFS. So
> now I'm trying to populate the tables using JDBC and batch statements like
> so (Scala pseudocode):
>
> // I am creating a new DB instance per key, this is desired
> Class.forName("org.h2.Driver")
> val conn = DriverManager.getConnection("jdbc:h2:mem:"+key)
>
>
> // Then I'm populating the db like so, BATCH_SIZE is 1000 currently
> rows.grouped(BATCH_SIZE).foreach(batch => {
>   val stmt = conn.createStatement()
>   batch.foreach(row => {
>     stmt.addBatch(s"""
>       INSERT INTO blah (col1,col2,col3,col4,...)""" +
>       s"""
>       VALUES('${row.val1}',${row.val2},${row.val3},'${row.val4}',...)""".stripMargin)
>   })
>   stmt.executeBatch()
>   conn.commit()
>   stmt.close()
> })
>
>
> At runtime in my Spark application I am getting OutOfMemory errors around
> populating the tables with JDBC. The Spark executors have ample memory to
> handle the dataset I'm working with; is there something I'm doing wrong
> with my JDBC commands? Also, is there a way to use the CSVREAD function via
> an InputStream? I don't have the ability to read the file from HDFS and
> then write it back again locally somewhere to be used by CSVREAD.
>