Gary King created SPARK-13132:
---------------------------------
Summary: LogisticRegression spends 35% of its time fetching the standardization parameter
Key: SPARK-13132
URL: https://issues.apache.org/jira/browse/SPARK-13132
Project: Spark
Issue Type: Improvement
Components: ML
Affects Versions: 1.6.0
Reporter: Gary King

When L1 regularization is used, the inner functor passed to the quasi-Newton
optimizer in {{org.apache.spark.ml.classification.LogisticRegression#train}}
makes repeated calls to {{$(standardization)}}. Because each call ultimately
triggers string interpolation in
{{org.apache.spark.ml.param.Param#hashCode}}, this line of code consumes
35%-45% of the entire training time in my application.

The range depends on whether the application sets an explicit value for the
standardization parameter or relies on the default value. The default case
needs an extra map lookup, and therefore an extra string interpolation,
compared to the explicitly set case.
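
A minimal sketch of the pattern described above and one possible remedy, namely reading the parameter once before the hot loop. This is a toy model and not the Spark source: {{ToyParam}}, {{lookup}}, {{penaltyPerCall}} and {{penaltyHoisted}} are hypothetical names, and the hashing behaviour is only assumed to resemble {{Param#hashCode}} as reported in this issue.

```scala
import scala.collection.mutable

object HoistParamSketch {
  // Toy stand-in for a parameter key (hypothetical, not Spark's Param).
  // Hashing builds a fresh interpolated string on every call.
  final class ToyParam(val parent: String, val name: String) {
    override def toString: String = s"${parent}__$name"
    override def hashCode: Int = { ToyParam.hashCalls += 1; toString.## }
    override def equals(other: Any): Boolean = other match {
      case p: ToyParam => p.toString == toString
      case _           => false
    }
  }
  object ToyParam { var hashCalls: Long = 0L } // counts hashCode invocations

  val standardization = new ToyParam("logreg_toy", "standardization")
  val explicitValues = mutable.HashMap.empty[ToyParam, Any]
  val defaultValues = mutable.HashMap[ToyParam, Any](standardization -> true)

  // Toy analogue of $(param): try the explicit map first, then the defaults.
  def lookup(p: ToyParam): Any = explicitValues.getOrElse(p, defaultValues(p))

  // Slow shape: the parameter is fetched once per element inside the loop.
  def penaltyPerCall(weights: Array[Double]): Double = {
    var sum = 0.0
    var i = 0
    while (i < weights.length) {
      val scale = if (lookup(standardization).asInstanceOf[Boolean]) 1.0 else 0.5
      sum += scale * math.abs(weights(i))
      i += 1
    }
    sum
  }

  // Hoisted shape: fetch once into a local val, then reuse it in the loop.
  def penaltyHoisted(weights: Array[Double]): Double = {
    val standardize = lookup(standardization).asInstanceOf[Boolean]
    val scale = if (standardize) 1.0 else 0.5
    var sum = 0.0
    var i = 0
    while (i < weights.length) {
      sum += scale * math.abs(weights(i))
      i += 1
    }
    sum
  }
}
```

Both functions return the same value. In the hoisted version the number of {{hashCode}} calls no longer grows with the number of loop iterations, which is the effect a fix along these lines would aim for.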