[
https://issues.apache.org/jira/browse/SPARK-40502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17607523#comment-17607523
]
CaoYu commented on SPARK-40502:
-------------------------------
I am a teacher.
I recently designed an introductory Python course with a big data focus.
PySpark is one of its practical cases, but the course only uses simple RDD code
to do basic data processing, and using a JDBC data source is
part of the course.
DataFrames (Spark SQL) will be covered in future advanced courses.
So I hope the RDD (datastream) API can gain JDBC data source support.
> Support dataframe API use jdbc data source in PySpark
> -----------------------------------------------------
>
> Key: SPARK-40502
> URL: https://issues.apache.org/jira/browse/SPARK-40502
> Project: Spark
> Issue Type: New Feature
> Components: PySpark
> Affects Versions: 3.3.0
> Reporter: CaoYu
> Priority: Major
>
> When I use PySpark, I want to read data from a MySQL database, so I would like
> to use JdbcRDD as in Java/Scala.
> But that is not supported in PySpark.
>
> For some reasons, I can't use the DataFrame API and can only use the RDD (datastream)
> API, even though I know the DataFrame API can read from a JDBC source fairly well.
>
> So I want to implement functionality that lets PySpark use an RDD to read data
> from a JDBC source.
>
> *But I don't know whether that is necessary for PySpark, so we can discuss it.*
>
> *If it is necessary for PySpark, I want to contribute it to Spark.*
> *I hope this Jira task can be assigned to me so I can start working on
> implementing it.*
>
> *If not, please close this Jira task.*
>
>
> *Thanks a lot.*
>
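As a rough illustration of the RDD-only approach requested in the description, the sketch below splits a key range into contiguous sub-ranges, similar in spirit to the lower bound, upper bound, and partition count that the Scala JdbcRDD takes, and reads each sub-range with its own connection. It is not the proposed implementation. The `users` table, its columns, and the helper names `split_range` and `fetch_range` are hypothetical, and `sqlite3` stands in for a MySQL driver only so that the example is self-contained.

```python
import os
import sqlite3
import tempfile


def split_range(lower, upper, num_partitions):
    """Split the inclusive range [lower, upper] into contiguous sub-ranges."""
    length = upper - lower + 1
    ranges = []
    for i in range(num_partitions):
        start = lower + (i * length) // num_partitions
        end = lower + ((i + 1) * length) // num_partitions - 1
        ranges.append((start, end))
    return ranges


def fetch_range(db_path, bounds):
    """Open a connection and read the rows whose id lies in one sub-range."""
    start, end = bounds
    conn = sqlite3.connect(db_path)  # assumption: stand-in for a MySQL driver
    try:
        query = "SELECT id, name FROM users WHERE id BETWEEN ? AND ? ORDER BY id"
        return conn.execute(query, (start, end)).fetchall()
    finally:
        conn.close()


# Build a throwaway database with a hypothetical `users` table (ids 1..10).
db_path = os.path.join(tempfile.mkdtemp(), "demo.db")
conn = sqlite3.connect(db_path)
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [(i, f"user{i}") for i in range(1, 11)],
)
conn.commit()
conn.close()

# Locally, each sub-range is read in turn. With a SparkContext `sc`, the same
# functions could be used as:
#   sc.parallelize(ranges, len(ranges)).flatMap(lambda b: fetch_range(db_path, b))
ranges = split_range(1, 10, 3)
rows = [row for bounds in ranges for row in fetch_range(db_path, bounds)]
print(ranges)
print(len(rows))
```

With a real MySQL database, each task would need a Python database driver installed on the executors and a database reachable from them; those details are outside this sketch.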