Xusen Yin created SPARK-1585:
--------------------------------
Summary: Not robust Lasso causes Infinity on weights and losses
Key: SPARK-1585
URL: https://issues.apache.org/jira/browse/SPARK-1585
Project: Spark
Issue Type: Bug
Components: MLlib
Affects Versions: 0.9.1
Reporter: Xusen Yin
Assignee: Xusen Yin
Fix For: 1.0.0
Lasso uses LeastSquaresGradient and L1Updater, but
diff = brzWeights.dot(brzData) - label
in LeastSquaresGradient can produce a very large diff. That diff feeds into
L1Updater, which then increases the weights exponentially. The small shrinkage
value cannot pull the weights back toward zero, so the weights and losses
eventually reach Infinity.
For example, with data = (0.5 repeated 10k times) and weights = (0.6 repeated
10k times), data.dot(weights) is approximately 3000 (0.5 * 0.6 * 10,000), so
the diff is also about 3000 when the label is small. L1Updater then sets the
weights to a magnitude of roughly 3000. In the next iteration the weights
become orders of magnitude larger again, and so on.
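
The blow-up described above can be reproduced with a small plain-Python
sketch. It is not the MLlib source. It assumes a single training example, a
label of 0.0, a gradient of 2 * diff * data, a step size of 1.0 that decays as
1/sqrt(iteration), and a regularization parameter of 0.1. The function names
(least_squares_gradient, l1_update, simulate) are hypothetical and chosen only
for illustration.

```python
import math


def least_squares_gradient(data, label, weights):
    """Return (gradient, loss) for one example; assumes gradient = 2 * diff * x."""
    diff = sum(w * x for w, x in zip(weights, data)) - label
    grad = [2.0 * diff * x for x in data]
    # diff * diff overflows quietly to inf instead of raising an exception
    return grad, diff * diff


def l1_update(weights, grad, step_size, iteration, reg_param):
    """Take a gradient step, then soft-threshold each weight toward zero."""
    step = step_size / math.sqrt(iteration)  # assumed step-size decay
    shrink = reg_param * step                # L1 shrinkage for this step
    updated = []
    for w, g in zip(weights, grad):
        moved = w - step * g
        updated.append(math.copysign(max(0.0, abs(moved) - shrink), moved))
    return updated


def simulate(n=10_000, iterations=5, step_size=1.0, reg_param=0.1, label=0.0):
    """Return one (abs(weight), loss) pair per iteration; stop once loss is inf."""
    data = [0.5] * n     # values taken from the example in this report
    weights = [0.6] * n
    history = []
    for it in range(1, iterations + 1):
        grad, loss = least_squares_gradient(data, label, weights)
        weights = l1_update(weights, grad, step_size, it, reg_param)
        history.append((abs(weights[0]), loss))
        if not math.isfinite(loss):
            break
    return history


if __name__ == "__main__":
    for i, (w, loss) in enumerate(simulate(iterations=200), start=1):
        print(i, w, loss)
```

Under these assumptions the shrinkage per step is at most 0.1, while each
gradient step moves every weight by thousands or more, so the soft threshold
has no practical effect and the loss overflows to Infinity well before the
200-iteration cap.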
--
This message was sent by Atlassian JIRA
(v6.2#6252)