Good evening. While I don't work with Scala/Spark, only Groovy, I have run into
similar challenges before and would like to share my experience:
1) I am not sure your construct is thread safe.
2) As I understand it, you open many connections and objects in parallel.
3) I would advise using one single PREPARED statement instead and submitting the
parameters as a batch (see the sketch below).
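For example, something along these lines (a minimal Scala sketch; it reuses the
placeholder names from your pseudocode — blah, col1..col4, key, BATCH_SIZE,
row.val1..val4 — and the setString/setInt calls are only my assumption based on
which values you quoted in your VALUES clause, so adjust them to your schema):

import java.sql.DriverManager

Class.forName("org.h2.Driver")
val conn = DriverManager.getConnection("jdbc:h2:mem:" + key)

// One statement, parsed once; only the parameter values change per row.
val ps = conn.prepareStatement(
  "INSERT INTO blah (col1, col2, col3, col4) VALUES (?, ?, ?, ?)")

rows.grouped(BATCH_SIZE).foreach { batch =>
  batch.foreach { row =>
    ps.setString(1, row.val1)  // quoted in your SQL, so assumed to be a string
    ps.setInt(2, row.val2)     // unquoted, so assumed numeric
    ps.setInt(3, row.val3)
    ps.setString(4, row.val4)
    ps.addBatch()
  }
  ps.executeBatch()            // sends the whole batch at once
}
ps.close()

This way H2 compiles the INSERT only once, the per-row string concatenation (and
its quoting/escaping pitfalls) goes away, and far fewer temporary SQL strings and
statement objects are created per batch.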
On Thursday, January 23, 2020 at 2:42:51 AM UTC+7, H2inSpark wrote:
>
> I am in the middle of porting a legacy application that uses H2 into
> Spark. The legacy application populated H2 via JOOQ using DSLContext, i.e.
> dslContext.execute("CREATE TABLE blah...."), and it populated the actual
> tables using the CSVREAD function, i.e. dslContext.execute("INSERT INTO blah
> ... FROM CSVREAD ..."). In the new application I can't use the CSVREAD
> function to populate tables because now I'm reading Datasets off HDFS. So
> now I'm trying to populate the tables using JDBC and batch statements like
> so (Scala pseudocode):
>
> // I am creating a new DB instance per key, this is desired
> Class.forName("org.h2.Driver")
> val conn = DriverManager.getConnection("jdbc:h2:mem:"+key)
>
>
> // Then I'm populating the db like so, BATCH_SIZE is 1000 currently
> rows.grouped(BATCH_SIZE).foreach(batch => {
>   val stmt = conn.createStatement()
>   batch.foreach(row => {
>     stmt.addBatch(s"""
>       INSERT INTO blah (col1,col2,col3,col4,...)""" +
>       s"""
>       VALUES('${row.val1}',${row.val2},${row.val3},'${row.val4}',...)""".stripMargin)
>   })
>   stmt.executeBatch()
>   conn.commit()
>   stmt.close()
> })
>
>
> At runtime in my Spark application I am getting OutOfMemory errors around
> populating the tables with JDBC. The Spark executors have ample memory to
> handle the dataset I'm working with; is there something I'm doing wrong
> with my JDBC commands? Also, is there a way to use the CSVREAD function via
> an InputStream? I don't have the ability to read the file from HDFS and
> then write it back again locally somewhere to be used by CSVREAD.
>