[
https://issues.apache.org/jira/browse/SPARK-6683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Joseph K. Bradley updated SPARK-6683:
-------------------------------------
Description:
GeneralizedLinearAlgorithm can scale features. This has 2 effects:
* improves optimization behavior (essentially always helps in practice)
* changes the optimal solution (often for the better in terms of standardizing
feature importance)
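To spell out the second point (a hand-written sketch in my own notation, not
anything taken from the code): with L2 regularization the training objective is
roughly
{noformat}
min_w  L(w) + \lambda \sum_j w_j^2
{noformat}
Standardizing features (\tilde{x}_j = x_j / \sigma_j) and reparameterizing the
weights (\tilde{w}_j = \sigma_j w_j) leaves the data term L unchanged, since
\tilde{w}^T \tilde{x} = w^T x, but the penalty the optimizer actually sees
becomes \lambda \sum_j \sigma_j^2 w_j^2. So with regularization the optimum
moves whenever the \sigma_j differ, while without regularization only the
conditioning changes, not the solution.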
Current problems:
* Inefficient implementation: We make a rescaled copy of the data.
* Surprising API: For algorithms that use feature scaling, users may get
different solutions than they would from R or other libraries. (Note: Feature
scaling could be handled without changing the solution.)
* Inconsistent API: Not all algorithms have the same default for feature
scaling, and not all expose the option.
This is a proposal, discussed with [~mengxr], for an "ideal" solution. It will
require some breaking API changes, but I'll argue these are necessary in the
long term.
Proposal:
* Implementation: Change to avoid making a rescaled copy of the data (described
below). No API issues here.
* API:
** Hide featureScaling from the API. (breaking change)
** Internally, handle feature scaling to improve optimization, but modify it so
it does not change the optimal solution. (breaking change, in terms of
algorithm behavior)
** Externally, users who want to rescale features (to change the solution)
should do that scaling as a preprocessing step.
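For that last point, the user-facing pattern would look roughly like the
following (just a sketch; it uses the existing mllib StandardScaler and
LogisticRegressionWithLBFGS APIs, and the input path is made up):
{code:scala}
import org.apache.spark.SparkContext
import org.apache.spark.mllib.classification.LogisticRegressionWithLBFGS
import org.apache.spark.mllib.feature.StandardScaler
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.mllib.util.MLUtils

object ExplicitScalingExample {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext("local[2]", "explicit-scaling")
    val data = MLUtils.loadLibSVMFile(sc, "data/sample_libsvm_data.txt")

    // Users who want the rescaled solution fit a scaler themselves ...
    val scaler = new StandardScaler(withMean = false, withStd = true)
      .fit(data.map(_.features))

    // ... rescale the features explicitly (labels untouched) ...
    val scaled = data.map(p => LabeledPoint(p.label, scaler.transform(p.features)))

    // ... and train on the rescaled data, so any change in the solution is
    // something they asked for, not something the algorithm did silently.
    val model = new LogisticRegressionWithLBFGS().run(scaled)
    println(model.weights)
    sc.stop()
  }
}
{code}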
Details on implementation:
* GradientDescent could instead scale the step size separately for each feature
(and adjust regularization as needed; see
[https://github.com/apache/spark/pull/5055]). This would require storing a
vector of length numFeatures, rather than making a full copy of the data. A
rough sketch of the idea follows after this list.
* I haven't thought this through for LBFGS, but I hope [~dbtsai] can weigh in
here.
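Rough sketch of the per-feature step-size idea (plain local code for squared
loss, not Spark's GradientDescent; the Array-based data and the featureStd
argument are illustrative assumptions). Running gradient descent on
standardized features and mapping the weights back is algebraically the same as
using a step size of stepSize / sigma_j^2 for feature j on the original data,
so only a vector of length numFeatures has to be stored:
{code:scala}
object PerFeatureStepSizeSketch {
  // data: (label, features); featureStd: precomputed per-column stddevs.
  def train(
      data: Array[(Double, Array[Double])],
      featureStd: Array[Double],
      stepSize: Double,
      numIterations: Int): Array[Double] = {
    val n = featureStd.length
    val weights = Array.fill(n)(0.0)
    for (_ <- 0 until numIterations) {
      // Accumulate the gradient of the (unregularized) squared loss.
      val grad = Array.fill(n)(0.0)
      for ((label, x) <- data) {
        val pred = (0 until n).map(j => weights(j) * x(j)).sum
        val err = pred - label
        for (j <- 0 until n) grad(j) += err * x(j)
      }
      for (j <- 0 until n) {
        // Per-feature step size = stepSize / stddev_j^2: equivalent to plain
        // gradient descent on standardized features with the weights mapped
        // back, but without ever materializing a rescaled copy of the data.
        // Any regularization term would need the matching adjustment.
        val scale = 1.0 / (featureStd(j) * featureStd(j) * data.length)
        weights(j) -= stepSize * scale * grad(j)
      }
    }
    weights
  }
}
{code}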
was:
GeneralizedLinearAlgorithm can scale features. This improves optimization
behavior (and also affects the optimal solution, as is being discussed and
hopefully fixed by [https://github.com/apache/spark/pull/5055]).
This is a bit inefficient since it requires making a rescaled copy of the data.
GradientDescent could instead scale the step size separately for each feature
(and adjust regularization as needed; see the PR linked above). This would
require storing a vector of length numFeatures, rather than making a full copy
of the data.
I haven't thought this through for LBFGS, so I'm not sure if it's generally
usable or would require a specialization for GLMs with GradientDescent.
> Handling feature scaling properly for GLMs
> ------------------------------------------
>
> Key: SPARK-6683
> URL: https://issues.apache.org/jira/browse/SPARK-6683
> Project: Spark
> Issue Type: Improvement
> Components: MLlib
> Affects Versions: 1.3.0
> Reporter: Joseph K. Bradley
>