[ 
https://issues.apache.org/jira/browse/SPARK-17001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15422630#comment-15422630
 ] 

Nick Pentreath commented on SPARK-17001:
----------------------------------------

This approach seems fine - I tend to agree with allowing users to configure 
certain options even if they are potentially dangerous, under the assumption 
that they should know the implications (with appropriate documentation and 
warnings).

However, an alternative (or perhaps additional) solution is for 
{{VectorAssembler}} to offer an option that forces dense (or sparse) vectors 
as output. This would cover the case where a user knows they want to scale the 
data even if doing so breaks sparsity, because the vectors are not that big.
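
To make the trade-off concrete, here is a rough plain-Python sketch of what 
forcing dense output followed by mean-centering does to sparsity. It does not 
use Spark; the names {{to_dense}} and {{center}} and the (size, indices, values) 
layout are made up for illustration and are not existing APIs.

```python
# Plain-Python illustration only; none of these names are Spark APIs.

def to_dense(size, indices, values):
    """Expand a sparse (size, indices, values) triple into a full list."""
    dense = [0.0] * size
    for i, v in zip(indices, values):
        dense[i] = v
    return dense


def center(rows):
    """Subtract the per-column mean from every row of dense vectors."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]


# Two 4-dimensional sparse vectors, each with a single stored entry.
sparse_rows = [
    (4, [0], [2.0]),
    (4, [3], [4.0]),
]

dense_rows = [to_dense(*row) for row in sparse_rows]
centered = center(dense_rows)

# Count explicit nonzeros before and after centering.
nonzeros_before = sum(len(values) for _, _, values in sparse_rows)
nonzeros_after = sum(1 for row in centered for x in row if x != 0.0)
```

In this toy case the stored entries grow from 2 to 4 after centering, because 
every column with a nonzero mean becomes nonzero in every row. That is cheap 
for small vectors, which is the case such an option would target.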

Thoughts?

> Enable standardScaler to standardize sparse vectors when withMean=True
> ----------------------------------------------------------------------
>
>                 Key: SPARK-17001
>                 URL: https://issues.apache.org/jira/browse/SPARK-17001
>             Project: Spark
>          Issue Type: Improvement
>    Affects Versions: 2.0.0
>            Reporter: Tobi Bosede
>            Priority: Minor
>
> When withMean = true, StandardScaler will not handle sparse vectors, and 
> instead throw an exception. This is presumably because subtracting the mean 
> makes a sparse vector dense, and this can be undesirable. 
> However, VectorAssembler generates vectors that may be a mix of sparse and 
> dense, even when vectors are smallish, depending on their values. It's common 
> to feed this into StandardScaler, but it would fail sometimes depending on 
> the input if withMean = true. This is kind of surprising.
> StandardScaler should go ahead and operate on sparse vectors and subtract the 
> mean, if explicitly asked to do so with withMean, on the theory that the user 
> knows what he/she is doing, and there is otherwise no way to make this work.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

