[ https://issues.apache.org/jira/browse/SPARK-8008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14567230#comment-14567230 ]
Rene Treffer commented on SPARK-8008:
-------------------------------------

At the moment each partition uses its own connection, as far as I can tell. I still have to double-check how this works on a cluster, where multiple servers might even fetch data at the same time.

I'm currently loading by year and month because of the DB schema (index on actual days, locality based on year/month).

I don't think larger batches would be a solution: 3 months may require more than 160 million rows, and I don't think batching that into one partition is a good idea.

> sqlContext.jdbc can kill your database due to high concurrency
> --------------------------------------------------------------
>
>                 Key: SPARK-8008
>                 URL: https://issues.apache.org/jira/browse/SPARK-8008
>             Project: Spark
>          Issue Type: Bug
>            Reporter: Rene Treffer
>
> Spark tries to load as many partitions as possible in parallel, which can in
> turn overload the database although it would be possible to load all
> partitions given a lower concurrency.
> It would be nice to either limit the maximum concurrency or to at least warn
> about this behavior.
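
To make the year+month loading and the requested concurrency limit concrete, here is a minimal sketch in plain Python. It is not Spark code and not how sqlContext.jdbc works internally. The column name `day`, the helper names `month_predicates` and `load_all`, and the `load_partition` callback are all hypothetical, chosen only for illustration. The idea is to build one date-range predicate per month and then run the per-partition loads with a fixed upper bound on how many are in flight at once.

```python
from concurrent.futures import ThreadPoolExecutor
from datetime import date


def month_predicates(start_year, start_month, months, column="day"):
    """Build one half-open date-range predicate per calendar month.

    `column` is an assumed name for the indexed day column.
    """
    predicates = []
    year, month = start_year, start_month
    for _ in range(months):
        # First day of the following month, rolling over at December.
        next_year, next_month = (year + 1, 1) if month == 12 else (year, month + 1)
        predicates.append(
            f"{column} >= '{date(year, month, 1)}' "
            f"AND {column} < '{date(next_year, next_month, 1)}'"
        )
        year, month = next_year, next_month
    return predicates


def load_all(predicates, load_partition, max_concurrency=2):
    """Call load_partition once per predicate, at most max_concurrency at a time.

    `load_partition` is a hypothetical callback that would open its own
    database connection and fetch the rows matching one predicate.
    Results are returned in the same order as `predicates`.
    """
    with ThreadPoolExecutor(max_workers=max_concurrency) as pool:
        return list(pool.map(load_partition, predicates))
```

With this shape, the number of partitions (one per month) stays independent of the number of simultaneous database connections, which is bounded by `max_concurrency`. Whether a similar bound can be applied inside Spark itself is exactly what the issue asks for.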