[
https://issues.apache.org/jira/browse/SPARK-27077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16786228#comment-16786228
]
Yuming Wang commented on SPARK-27077:
-------------------------------------
Could you try setting {{numPartitions}}? From the docs:
The maximum number of partitions that can be used for parallelism in table
reading and writing. This also determines the maximum number of concurrent JDBC
connections. If the number of partitions to write exceeds this limit, we
decrease it to this limit by calling coalesce(numPartitions) before writing.
http://spark.apache.org/docs/latest/sql-data-sources-jdbc.html
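The cap described in the quoted docs can be sketched as follows. This is only an illustration of the `coalesce(numPartitions)` behavior; `effective_connections` is a hypothetical helper, not Spark API:

```python
# Sketch of the cap Spark applies via coalesce(numPartitions) before a JDBC
# write, per the docs quoted above. One partition maps to one JDBC connection.

def effective_connections(planned_partitions: int, num_partitions: int) -> int:
    """Spark coalesces the plan down to numPartitions when the planned
    partition count (and hence connection count) would exceed it."""
    return min(planned_partitions, num_partitions)

# With the reporter's Vertica limit of 16 connections, numPartitions <= 16
# keeps the job under the account cap:
print(effective_connections(200, 16))  # prints 16
print(effective_connections(8, 16))    # prints 8
```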
> DataFrameReader and Number of Connection Limitation
> ---------------------------------------------------
>
> Key: SPARK-27077
> URL: https://issues.apache.org/jira/browse/SPARK-27077
> Project: Spark
> Issue Type: Bug
> Components: Spark Core, SQL
> Affects Versions: 2.3.2
> Reporter: Paul Wu
> Priority: Major
>
> I am not sure whether this is a Spark core issue or a Vertica issue, but I
> believe it is on Spark's side. When we read from a data source with
> sparkSession.read.load (in my case a Vertica DB), the DataFrameReader makes a
> large number of initial JDBC connection requests. My account allows only 16
> connections (and at least 6 of them are already taken by my loading), so when
> the large burst of requests is issued I get the exception below. In fact, the
> job eventually settles on far fewer connections (in my case 2 simultaneous
> DataFrameReaders). I therefore think there should be a parameter that prevents
> the reader from sending out more initial connection requests than the user's
> limit. Without such an option, my app can fail randomly due to my Vertica
> account's connection limit.
>
> java.sql.SQLNonTransientConnectionException: [Vertica][VJDBC](7470) FATAL:
> New session rejected because connection limit of 16 on database already met
> for M21176
> at com.vertica.util.ServerErrorData.buildException(Unknown Source)
> at com.vertica.io.ProtocolStream.readStartupMessages(Unknown Source)
> at com.vertica.io.ProtocolStream.initSession(Unknown Source)
> at com.vertica.core.VConnection.tryConnect(Unknown Source)
> at com.vertica.core.VConnection.connect(Unknown Source)
> at com.vertica.jdbc.common.BaseConnectionFactory.doConnect(Unknown Source)
> at com.vertica.jdbc.common.AbstractDriver.connect(Unknown Source)
> at java.sql.DriverManager.getConnection(DriverManager.java:664)
> at java.sql.DriverManager.getConnection(DriverManager.java:208)
> at com.vertica.spark.datasource.VerticaDataSourceRDD$.resolveTable(VerticaRDD.scala:105)
> at com.vertica.spark.datasource.VerticaRelation.<init>(VerticaRelation.scala:34)
> at com.vertica.spark.datasource.DefaultSource.createRelation(VerticaSource.scala:47)
> at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:341)
> at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:239)
> at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:227)
> at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:164)
> at com.att.iqi.data.ConnectorPrepareHourlyDataRT$1.run(ConnectorPrepareHourlyDataRT.java:156)
> Caused by: com.vertica.support.exceptions.NonTransientConnectionException:
> [Vertica][VJDBC](7470) FATAL: New session rejected because connection limit
> of 16 on database already met for
>
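Until such a safeguard exists, the workaround suggested in the comment above can be applied when building the reader options. A minimal sketch, assuming a hypothetical Vertica host, database, and table; the option keys follow the Spark JDBC data source documentation, and numPartitions is set at or below the account's connection limit:

```python
# Build an option map for a JDBC DataFrameReader load. The URL and table are
# placeholders; the keys are the documented Spark JDBC data source options.
def jdbc_read_options(url: str, table: str, connection_limit: int) -> dict:
    return {
        "url": url,
        "dbtable": table,
        # Caps both read parallelism and the number of concurrent
        # JDBC connections the reader may open.
        "numPartitions": str(connection_limit),
    }

opts = jdbc_read_options("jdbc:vertica://example-host:5433/mydb", "my_table", 16)
# These options would then be passed along the lines of:
#   spark.read.format("jdbc").options(**opts).load()
```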
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)