[ https://issues.apache.org/jira/browse/SPARK-6683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joseph K. Bradley updated SPARK-6683:
-------------------------------------
    Description: 
GeneralizedLinearAlgorithm can scale features.  This has 2 effects:
* improves optimization behavior (essentially always improves behavior in 
practice)
* changes the optimal solution (often for the better in terms of standardizing 
feature importance)
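
To make the second effect concrete, here is a sketch with squared loss and an L2 penalty, ignoring the intercept (my algebra, not taken from the current implementation): with D = diag(\sigma_1, ..., \sigma_d), fitting on the standardized design X D^{-1} with penalty \lambda \|v\|_2^2 and mapping back via w = D^{-1} v solves

    \min_w \; \|X w - y\|_2^2 + \lambda \|D w\|_2^2

so the penalty is effectively \lambda \sum_j \sigma_j^2 w_j^2 rather than \lambda \sum_j w_j^2, and the regularized optimum moves unless \lambda = 0 or all \sigma_j are equal.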

Current problems:
* Inefficient implementation: We make a rescaled copy of the data.
* Surprising API: For algorithms that use feature scaling, users may get different solutions than they would with R or other libraries.  (Note: Feature scaling could be handled without changing the solution; see the sketch after this list.)
* Inconsistent API: Not all algorithms have the same default for feature 
scaling, and not all expose the option.
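
As a minimal sketch of the note above (a hypothetical helper, not an existing MLlib method): if optimization runs on features divided by their standard deviations, the learned weights can be mapped back to the original feature space before being returned, so the user-visible solution is unchanged (the regularization term would need the matching adjustment).

{code:scala}
import org.apache.spark.mllib.linalg.{Vector, Vectors}

// Hypothetical helper: weights learned on features x_j / sigma_j correspond to
// weights w'_j / sigma_j in the original feature space.
def unscaleWeights(scaledWeights: Vector, featureStd: Array[Double]): Vector = {
  val w = scaledWeights.toArray.clone()
  var j = 0
  while (j < w.length) {
    // A feature with zero variance carries no signal; keep its weight at 0.
    w(j) = if (featureStd(j) != 0.0) w(j) / featureStd(j) else 0.0
    j += 1
  }
  Vectors.dense(w)
}
{code}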

This is a proposal, discussed with [~mengxr], for an "ideal" solution.  It will require some breaking API changes, but I'll argue these are necessary for the long term.

Proposal:
* Implementation: Change to avoid making a rescaled copy of the data (described 
below).  No API issues here.
* API:
** Hide featureScaling from the API. (breaking change)
** Internally, handle feature scaling to improve optimization, but modify it so 
it does not change the optimal solution. (breaking change, in terms of 
algorithm behavior)
** Externally, users who want to rescale features (to change the solution) should do that scaling as an explicit preprocessing step.
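
For the external preprocessing route, the existing StandardScaler already covers it.  A sketch (the data, parameters, and the choice of LinearRegressionWithSGD are just placeholders):

{code:scala}
import org.apache.spark.mllib.feature.StandardScaler
import org.apache.spark.mllib.regression.{LabeledPoint, LinearRegressionWithSGD}
import org.apache.spark.rdd.RDD

// Users who explicitly want the rescaled problem opt in with a preprocessing step.
def trainOnScaledFeatures(data: RDD[LabeledPoint]) = {
  // withMean = false keeps sparse vectors sparse; withStd = true gives unit variance.
  val scaler = new StandardScaler(withMean = false, withStd = true).fit(data.map(_.features))
  val scaled = data.map(lp => LabeledPoint(lp.label, scaler.transform(lp.features)))
  // The algorithm itself would no longer rescale features behind the scenes.
  LinearRegressionWithSGD.train(scaled, 100)
}
{code}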

Details on implementation:
* GradientDescent could instead scale the step size separately for each feature (and adjust regularization as needed; see [https://github.com/apache/spark/pull/5055]).  This would require storing a vector of length numFeatures, rather than making a full copy of the data.  A rough sketch of the equivalence follows this list.
* I haven't thought this through for LBFGS, but I hope [~dbtsai] can weigh in 
here.
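
A rough sketch of the GradientDescent idea above (unregularized case only, and assuming my algebra is right): running plain gradient descent on features divided by sigma_j is equivalent, in the original feature space, to using a per-feature step size of stepSize / sigma_j^2, which only needs a vector of length numFeatures instead of a rescaled copy of the data.

{code:scala}
// Sketch only, not the GradientDescent API: apply a per-feature step size in place.
def scaledStep(
    weights: Array[Double],
    gradient: Array[Double],   // gradient w.r.t. the original (unscaled) features
    featureStd: Array[Double], // sigma_j for each feature, computed once up front
    stepSize: Double): Unit = {
  var j = 0
  while (j < weights.length) {
    val variance = featureStd(j) * featureStd(j)
    // Skip constant features (zero variance); they get no update.
    if (variance != 0.0) weights(j) -= stepSize / variance * gradient(j)
    j += 1
  }
}
{code}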


  was:
GeneralizedLinearAlgorithm can scale features.  This improves optimization 
behavior (and also affects the optimal solution, as is being discussed and 
hopefully fixed by [https://github.com/apache/spark/pull/5055]).

This is a bit inefficient since it requires making a rescaled copy of the data.

GradientDescent could instead scale the step size separately for each feature 
(and adjust regularization as needed; see the PR linked above).  This would 
require storing a vector of length numFeatures, rather than making a full copy 
of the data.

I haven't thought this through for LBFGS, so I'm not sure whether it's generally usable or whether it would require a specialization for GLMs with GradientDescent.


> Handling feature scaling properly for GLMs
> ------------------------------------------
>
>                 Key: SPARK-6683
>                 URL: https://issues.apache.org/jira/browse/SPARK-6683
>             Project: Spark
>          Issue Type: Improvement
>          Components: MLlib
>    Affects Versions: 1.3.0
>            Reporter: Joseph K. Bradley
>


