[
https://issues.apache.org/jira/browse/SPARK-49184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ruifeng Zheng updated SPARK-49184:
----------------------------------
Description:
1, Some plots depend on MLlib and don't work with Spark Connect. We need to
reimplement ML-based plots with Spark SQL, so they can be compatible with Spark
Connect;
2, Further computation optimization, e.g.:
* compute all necessary metrics for some plots in a single pass over the whole
dataset, so we can improve the performance;
* optimize the existing sampling algorithm
was:
1, Some plots depend on MLlib and don't work with Spark Connect. We need to
reimplement ML-based plots with Spark SQL, so they can be compatible with Spark
Connect;
2, Further computation optimization, e.g.:
* we can compute all necessary metrics for some plots in a single pass over the
whole dataset, so we can improve the performance;
* we can optimize the existing sampling algorithm
> Refactor plotting implementations
> ---------------------------------
>
> Key: SPARK-49184
> URL: https://issues.apache.org/jira/browse/SPARK-49184
> Project: Spark
> Issue Type: Umbrella
> Components: PySpark
> Affects Versions: 4.0.0
> Reporter: Ruifeng Zheng
> Assignee: Xinrong Meng
> Priority: Major
>
> 1, Some plots depend on MLlib and don't work with Spark Connect. We need to
> reimplement ML-based plots with Spark SQL, so they can be compatible with
> Spark Connect;
> 2, Further computation optimization, e.g.:
> * compute all necessary metrics for some plots in a single pass over the whole
> dataset, so we can improve the performance;
> * optimize the existing sampling algorithm
>
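
The "single pass" idea in item 2 can be illustrated with a minimal sketch. This
is plain Python, not the Spark implementation: the names `Summary` and
`summarize_single_pass` are hypothetical, and the choice of metrics (count, min,
max, mean, sample variance) is an assumption made for illustration only. In
Spark SQL the analogous approach would presumably be to request all aggregates
in one `DataFrame.agg` call rather than running one job per metric.

```python
import math
from typing import Iterable, NamedTuple


class Summary(NamedTuple):
    """Hypothetical container for the metrics a plot might need."""
    count: int
    minimum: float
    maximum: float
    mean: float
    variance: float  # sample variance (n - 1 denominator)


def summarize_single_pass(values: Iterable[float]) -> Summary:
    """Compute several metrics while reading `values` exactly once."""
    count = 0
    minimum = math.inf
    maximum = -math.inf
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations (Welford's method)

    for x in values:
        count += 1
        minimum = min(minimum, x)
        maximum = max(maximum, x)
        delta = x - mean
        mean += delta / count
        m2 += delta * (x - mean)

    if count == 0:
        raise ValueError("cannot summarize an empty dataset")
    variance = m2 / (count - 1) if count > 1 else 0.0
    return Summary(count, minimum, maximum, mean, variance)


# A generator can be consumed only once, so every metric below
# necessarily comes from the same single pass over the data.
stats = summarize_single_pass(float(v) for v in [2, 4, 4, 4, 5, 5, 7, 9])
print(stats)
```

Computing the metrics separately would scan the dataset once per metric; folding
them into one loop (or one aggregation) trades a slightly larger running state
for a single scan.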