[jira] [Updated] (SPARK-41070) Performance issue when Spark SQL connects with TeraData

Ramakrishna (Jira) Tue, 08 Nov 2022 21:44:03 -0800


     [ https://issues.apache.org/jira/browse/SPARK-41070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Ramakrishna updated SPARK-41070:
--------------------------------
    Description: 
We connect to Teradata from Spark SQL using the following API:

{color:#ff8b00}Dataset<Row> jdbcDF = spark.read().jdbc(connectionUrl, tableQuery, connectionProperties);{color}

 

When we run the above logic on a large table with a million rows, we see the 
extra query below executed every time, which causes a performance hit on the DB.

We got the information below from our DBA. We don't have any logs on the Spark SQL side.

SELECT 1 FROM ONE_MILLION_ROWS_TABLE;
|1|
|1|
|1|
|1|
|1|
|1|
|1|
|1|
|1|
|1|

 

Can you please clarify why this query is executed? Is there any chance that 
it is issued by our own code when we check the row count of the DataFrame?

 

Please provide your inputs on this.
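
One possible explanation, offered as a sketch rather than a confirmed diagnosis: when an action such as count() needs no columns from the table, Spark's JDBC source appears to substitute the literal 1 for the column list. That would produce exactly the SELECT 1 FROM ... query the DBA reported, with the rows then counted on the Spark side. The Java snippet below models that query construction with plain strings and builds a COUNT(*) subquery that could be passed as the table argument so that the database does the counting. The class and method names (JdbcCountSketch, buildScanQuery, countPushdown) and the aliases row_count and cnt are hypothetical; only spark.read().jdbc(...) and the table name come from the report.

```java
import java.util.Collections;
import java.util.List;

public class JdbcCountSketch {

    // Simplified model (an assumption, not Spark's actual code) of the scan
    // query: with no required columns the column list falls back to "1".
    static String buildScanQuery(String table, List<String> requiredColumns) {
        String columnList = requiredColumns.isEmpty()
                ? "1"
                : String.join(",", requiredColumns);
        return "SELECT " + columnList + " FROM " + table;
    }

    // Builds an aliased subquery that can stand in for the table name,
    // so the database returns a single row holding the count.
    static String countPushdown(String table) {
        return "(SELECT COUNT(*) AS row_count FROM " + table + ") cnt";
    }

    public static void main(String[] args) {
        String table = "ONE_MILLION_ROWS_TABLE";

        // What a count over the whole table would send in this model.
        System.out.println(buildScanQuery(table, Collections.<String>emptyList()));

        // What could be passed to spark.read().jdbc(...) instead.
        System.out.println(countPushdown(table));

        // With Spark on the classpath (not executed here):
        // Dataset<Row> countDF = spark.read().jdbc(
        //         connectionUrl, countPushdown(table), connectionProperties);
        // long rows = ((Number) countDF.first().get(0)).longValue();
    }
}
```

If this is what happens in our job, reading the single-row result of the subquery instead of calling count() on the full-table DataFrame should avoid transferring one row per table row just to count them.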

 

  was:
We connect to Teradata from Spark SQL using the following API:

Dataset<Row> jdbcDF = spark.read().jdbc(connectionUrl, tableQuery, connectionProperties);

 

When we run this logic on a large table with a million rows, we see the 
extra query below executed every time, which causes a performance hit on the DB.

We got the information below from our DBA. We don't have any logs on the Spark SQL side.

SELECT 1 FROM ONE_MILLION_ROWS_TABLE;
|1|
|1|
|1|
|1|
|1|
|1|
|1|
|1|
|1|
|1|

 

Can you please clarify why this query is executed? Is there any chance that 
it is issued by our own code when we check the row count of the DataFrame?

 

Please provide your inputs on this.

 


> Performance issue when Spark SQL connects with TeraData 
> --------------------------------------------------------
>
>                 Key: SPARK-41070
>                 URL: https://issues.apache.org/jira/browse/SPARK-41070
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 2.4.4
>            Reporter: Ramakrishna
>            Priority: Major
>
> We connect to Teradata from Spark SQL using the following API:
> {color:#ff8b00}Dataset<Row> jdbcDF = spark.read().jdbc(connectionUrl, tableQuery, connectionProperties);{color}
>  
> When we run the above logic on a large table with a million rows, we see the 
> extra query below executed every time, which causes a performance hit on the DB.
> We got the information below from our DBA. We don't have any logs on the Spark SQL side.
> SELECT 1 FROM ONE_MILLION_ROWS_TABLE;
> |1|
> |1|
> |1|
> |1|
> |1|
> |1|
> |1|
> |1|
> |1|
> |1|
>  
> Can you please clarify why this query is executed? Is there any chance that 
> it is issued by our own code when we check the row count of the DataFrame?
>  
> Please provide your inputs on this.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
