[
https://issues.apache.org/jira/browse/SPARK-28980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sean Owen resolved SPARK-28980.
-------------------------------
Fix Version/s: 3.0.0
Resolution: Fixed
Issue resolved by pull request 25684
[https://github.com/apache/spark/pull/25684]
> Remove most remaining deprecated items since <= 2.2.0 for 3.0
> -------------------------------------------------------------
>
> Key: SPARK-28980
> URL: https://issues.apache.org/jira/browse/SPARK-28980
> Project: Spark
> Issue Type: Task
> Components: MLlib, PySpark, Spark Core, SQL, Structured Streaming,
> YARN
> Affects Versions: 3.0.0
> Reporter: Sean Owen
> Assignee: Sean Owen
> Priority: Major
> Labels: release-notes
> Fix For: 3.0.0
>
>
> Following on https://issues.apache.org/jira/browse/SPARK-25908 I'd like to
> propose removing the rest of the items that were deprecated in Spark 2.2.0
> or earlier, before Spark 3.0.
> This appears to be:
> - Remove SQLContext.createExternalTable and Catalog.createExternalTable,
> deprecated in favor of createTable since 2.2.0, plus tests of deprecated
> methods
> - Remove HiveContext, deprecated in 2.0.0, in favor of
> SparkSession.builder.enableHiveSupport
> - Remove KinesisUtils.createStream methods deprecated in 2.2.0, plus tests
> of the deprecated methods
> - Remove deprecated MLlib (not Spark ML) linear method support, mostly
> utility constructors and 'train' methods, and associated docs. This includes
> methods in LinearRegression, LogisticRegression, Lasso, RidgeRegression.
> These have been deprecated since 2.0.0
> - Remove deprecated PySpark MLlib linear method support, including
> LogisticRegressionWithSGD, LinearRegressionWithSGD, LassoWithSGD
> - Remove 'runs' argument in KMeans.train() method, which has been a no-op
> since 2.0.0
> - Remove deprecated ChiSqSelector isSorted protected method
> - Remove deprecated 'yarn-cluster' and 'yarn-client' master argument in favor
> of 'yarn' and deploy mode 'cluster', etc
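
For anyone updating launch scripts after the last item above, here is a minimal sketch of the mapping. `normalize_master` and `_LEGACY_YARN_MASTERS` are hypothetical names used for illustration only; they are not part of Spark.

```python
from typing import Optional, Tuple

# Hypothetical helper, not part of Spark: maps the legacy master strings
# named in the issue onto the (master, deploy mode) pair that replaces them.
_LEGACY_YARN_MASTERS = {
    "yarn-cluster": ("yarn", "cluster"),
    "yarn-client": ("yarn", "client"),
}


def normalize_master(
    master: str, deploy_mode: Optional[str] = None
) -> Tuple[str, Optional[str]]:
    """Return (master, deploy_mode) with legacy YARN masters rewritten."""
    if master in _LEGACY_YARN_MASTERS:
        new_master, implied_mode = _LEGACY_YARN_MASTERS[master]
        # Refuse contradictory input such as 'yarn-cluster' plus 'client'.
        if deploy_mode is not None and deploy_mode != implied_mode:
            raise ValueError(
                f"master {master!r} implies deploy mode {implied_mode!r}, "
                f"but {deploy_mode!r} was given"
            )
        return new_master, implied_mode
    # Any other master string passes through unchanged.
    return master, deploy_mode
```

The resulting pair corresponds to launching with `spark-submit --master yarn --deploy-mode cluster` (or `client`).
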
> But while preparing the change, I found:
> - I was not able to remove deprecated DataFrameReader.json(RDD) in favor of
> DataFrameReader.json(Dataset); the former was deprecated in 2.2.0, but it is
> still needed to support PySpark's .json() method, which can't use a Dataset.
> - It looks like SQLContext.createExternalTable was not actually deprecated
> in PySpark, though it almost certainly was meant to be;
> Catalog.createExternalTable was.
> - I afterwards noted that the toDegrees and toRadians functions were almost
> fully removed in SPARK-25908, but Felix suggested keeping just the R version
> as they hadn't technically been deprecated. I'd like to revisit that. Do we
> really want the inconsistency? I'm not against reverting it again, but
> that implies leaving SQLContext.createExternalTable in place just in PySpark
> too, which seems weird.
> - I *kept* LogisticRegressionWithSGD, LinearRegressionWithSGD, LassoWithSGD,
> and RidgeRegressionWithSGD in PySpark, though deprecated, as they are hard
> to remove (still used by StreamingLogisticRegressionWithSGD?) and they are
> not fully removed in Scala. Perhaps they should not have been deprecated.
> I will open a PR accordingly for more detailed review.
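
For PySpark callers affected by the HiveContext and createExternalTable items in the description, the replacements it names (SparkSession.builder.enableHiveSupport and createTable) might be used roughly as follows. This is a sketch only: the function name, the table name "events", and the default path are placeholders, and it assumes a PySpark installation with Hive support available.

```python
def create_events_table(path: str = "/tmp/events_parquet"):
    """Sketch of the replacement calls; table name and path are placeholders."""
    # Imported inside the function so that merely loading this snippet
    # does not require PySpark to be installed.
    from pyspark.sql import SparkSession

    # Replaces constructing a HiveContext directly.
    spark = SparkSession.builder.enableHiveSupport().getOrCreate()

    # Replaces createExternalTable: passing a path registers a table
    # backed by the data already stored at that location.
    return spark.catalog.createTable("events", path=path, source="parquet")
```

Calling the function returns the DataFrame associated with the newly registered table.
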