[
https://issues.apache.org/jira/browse/SPARK-40502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17607523#comment-17607523
]
CaoYu edited comment on SPARK-40502 at 9/21/22 6:07 AM:
--------------------------------------------------------
I am a teacher.
I recently designed a basic Python course with a big data focus.
PySpark is one of the practical cases, but the course only uses simple RDD code
to complete basic data processing, and using a JDBC data source is
part of the course.
Because the course is very basic, simple RDD code is suitable as an example.
Using DataFrames would require explaining much more material, which is not
friendly to novice students.
DataFrames (Spark SQL) will be covered in advanced courses designed later.
So I hope that JDBC data can be read through the RDD API.
was (Author: javacaoyu):
I am a teacher.
I recently designed a basic Python course with a big data focus.
PySpark is one of the practical cases, but the course only uses simple RDD code
to complete basic data processing, and using a JDBC data source is
part of the course.
DataFrames (Spark SQL) will be covered in advanced courses designed later.
So I hope the datastream API can support the JDBC data source.
> Support dataframe API use jdbc data source in PySpark
> -----------------------------------------------------
>
> Key: SPARK-40502
> URL: https://issues.apache.org/jira/browse/SPARK-40502
> Project: Spark
> Issue Type: New Feature
> Components: PySpark
> Affects Versions: 3.3.0
> Reporter: CaoYu
> Priority: Major
>
> When using PySpark, I want to read data from a MySQL database, so I want to
> use JdbcRDD as in Java/Scala.
> But that is not supported in PySpark.
>
> For some reasons, I can't use the DataFrame API and can only use the RDD
> (datastream) API, even though I know the DataFrame API can read data from a
> JDBC source fairly well.
>
> So I want to implement functionality that lets PySpark read data from a JDBC
> source through RDDs.
>
> *But I don't know whether this is necessary for PySpark, so we can discuss it.*
>
> *If it is necessary for PySpark, I want to contribute to Spark.*
> *I hope this Jira task can be assigned to me so I can start working to
> implement it.*
>
> *If not, please close this Jira task.*
>
>
> *Thanks a lot.*
>
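For context on the request above, one possible workaround (not something
proposed in the issue itself) is to load the table through the existing
DataFrame JDBC reader and then drop down to the underlying RDD. The sketch
below is only an illustration under assumptions: the helper names
jdbc_options and read_jdbc_as_rdd, the connection URL, the table name, and
the credentials are all hypothetical, and a matching JDBC driver is assumed
to be on the Spark classpath.

```python
from typing import Any, Dict


def jdbc_options(url: str, table: str, user: str, password: str) -> Dict[str, str]:
    """Build the option map for Spark's built-in JDBC data source.

    Hypothetical helper; the keys are standard JDBC data source options.
    """
    return {"url": url, "dbtable": table, "user": user, "password": password}


def read_jdbc_as_rdd(spark: Any, options: Dict[str, str]) -> Any:
    """Load a table with the DataFrame reader and return its RDD of Row objects.

    `spark` is expected to be a pyspark.sql.SparkSession.
    """
    df = spark.read.format("jdbc").options(**options).load()
    return df.rdd


# Usage, assuming a running Spark installation and a reachable MySQL server
# (placeholder URL, table, and credentials):
#
#   from pyspark.sql import SparkSession
#   spark = SparkSession.builder.appName("jdbc-to-rdd").getOrCreate()
#   opts = jdbc_options("jdbc:mysql://localhost:3306/course",
#                       "students", "user", "password")
#   rdd = read_jdbc_as_rdd(spark, opts)
#   print(rdd.map(lambda row: row.asDict()).take(5))
```

After the single load step, everything else can stay in plain RDD
transformations. This does not replace a native JdbcRDD binding for PySpark,
which is what the issue asks about.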
--
This message was sent by Atlassian Jira
(v8.20.10#820010)