[jira] [Comment Edited] (SPARK-40502) Support dataframe API use jdbc data source in PySpark

CaoYu (Jira) Tue, 20 Sep 2022 23:08:08 -0700


    [ 
https://issues.apache.org/jira/browse/SPARK-40502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17607523#comment-17607523
 ]


CaoYu edited comment on SPARK-40502 at 9/21/22 6:07 AM:
--------------------------------------------------------

I am a teacher, and I recently designed a basic Python course with a focus on 
big data.

PySpark is one of the practical case studies in the course, but it only uses 
simple RDD code to do basic data processing. Reading from a JDBC data source 
is part of the course.

Because the course is very basic, simple RDD code is well suited as an example. 
Using DataFrames would require explaining much more material, which is not 
friendly to novice students.

DataFrames (Spark SQL) will be covered in advanced courses that I plan to 
design later.

So I hope that reading JDBC data can be done through the RDD API.

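For reference, one workaround that exists today is to read through the 
DataFrame JDBC reader and then take its underlying RDD, so that the rest of 
the course code can stay RDD-only. The following is a minimal sketch under 
stated assumptions, not a proposed implementation: the helper names 
(mysql_jdbc_options, read_jdbc_as_rdd), the host, database, table, column, and 
credentials are all hypothetical, and the driver class assumes MySQL 
Connector/J 8.x is on the Spark classpath.

```python
def mysql_jdbc_options(host, port, database, table, user, password):
    """Build the option map passed to Spark's "jdbc" data source (hypothetical helper)."""
    return {
        "url": f"jdbc:mysql://{host}:{port}/{database}",
        "dbtable": table,
        "user": user,
        "password": password,
        # Assumption: MySQL Connector/J 8.x driver class.
        "driver": "com.mysql.cj.jdbc.Driver",
    }


def read_jdbc_as_rdd(spark, options):
    """Read a table via the DataFrame JDBC reader and return its RDD of Rows."""
    df = spark.read.format("jdbc").options(**options).load()
    return df.rdd


# Usage (needs an active SparkSession named `spark` and a reachable database;
# all values below are placeholders):
# opts = mysql_jdbc_options("localhost", 3306, "course", "students", "user", "secret")
# rdd = read_jdbc_as_rdd(spark, opts)
# names = rdd.map(lambda row: row["name"]).collect()
```

The DataFrame appears only inside the helper, so students would work with 
ordinary RDD operations on the returned rows.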

was (Author: javacaoyu):
I am a teacher, and I recently designed a basic Python course with a focus on 
big data.

PySpark is one of the practical case studies in the course, but it only uses 
simple RDD code to do basic data processing. Reading from a JDBC data source 
is part of the course.

DataFrames (Spark SQL) will be covered in advanced courses that I plan to 
design later.
So I hope the RDD (datastream) API can gain support for JDBC data sources.

> Support dataframe API use jdbc data source in PySpark
> -----------------------------------------------------
>
>                 Key: SPARK-40502
>                 URL: https://issues.apache.org/jira/browse/SPARK-40502
>             Project: Spark
>          Issue Type: New Feature
>          Components: PySpark
>    Affects Versions: 3.3.0
>            Reporter: CaoYu
>            Priority: Major
>
> When using PySpark, I want to read data from a MySQL database, so I would 
> like to use JdbcRDD as in Java/Scala.
> But that is not supported in PySpark.
>  
> For some reasons, I cannot use the DataFrame API and can only use the RDD 
> (datastream) API, even though I know the DataFrame API can read data from a 
> JDBC source fairly well.
>  
> So I want to implement functionality that lets PySpark use an RDD to read 
> data from a JDBC source.
>  
> *But I don't know whether this is necessary for PySpark, so we can discuss it.*
>  
> *If it is necessary for PySpark, I want to contribute it to Spark.*
> *I hope this Jira task can be assigned to me so that I can start working on 
> implementing it.*
>  
> *If not, please close this Jira task.*
>  
> *Thanks a lot.*
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
