[ https://issues.apache.org/jira/browse/SPARK-19443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Liang-Chi Hsieh updated SPARK-19443:
------------------------------------
Description:
This issue was originally reported and discussed at
http://apache-spark-developers-list.1001551.n3.nabble.com/SQL-ML-Pipeline-performance-regression-between-1-6-and-2-x-tc20803.html
When running an ML `Pipeline` with many stages, the `Dataset` is updated
iteratively, and it is observed that fit and transform take longer and longer
to finish as the query plan grows continuously.
Specifically, the time spent preparing the optimized plan in the current branch
(74294 ms) is much higher than in 1.6 (292 ms). Most of that time is spent
generating the query plan's constraints during a few optimization rules.
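
To illustrate the shape of the problem, here is a minimal toy sketch in plain
Python. It is not Spark code and does not reproduce the optimizer: the names
`constraint_work` and `run_pipeline` are hypothetical, and the assumption that
constraint generation visits each plan node exactly once is made up for the
sake of the example. It only shows that when every stage re-analyzes the whole
accumulated plan, the per-stage cost keeps rising. The growth actually seen in
Spark may be much steeper than this linear-per-stage model.

```python
# Toy model only: this is NOT Spark's optimizer. All names are hypothetical.

def constraint_work(plan):
    """Pretend constraint generation visits every node in the plan once."""
    return len(plan)


def run_pipeline(num_stages):
    """Apply stages one by one; each stage grows the plan and re-analyzes it."""
    plan = []            # accumulated query plan, one entry per stage
    per_stage_work = []  # units of "constraint generation" work at each stage
    for stage in range(num_stages):
        plan.append(f"stage_{stage}")                  # the plan keeps growing
        per_stage_work.append(constraint_work(plan))   # whole plan is revisited
    return per_stage_work


work = run_pipeline(5)
print(work)       # [1, 2, 3, 4, 5]
print(sum(work))  # 15
```

In this model the total work over n stages is n(n+1)/2, so a pipeline with
many stages pays far more than n times the cost of a single stage.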
> The function to generate constraints takes too long when the query plan grows
> continuously
> ------------------------------------------------------------------------------------------
>
> Key: SPARK-19443
> URL: https://issues.apache.org/jira/browse/SPARK-19443
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 2.1.0
> Reporter: Liang-Chi Hsieh