[ 
https://issues.apache.org/jira/browse/SPARK-30641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhengruifeng updated SPARK-30641:
---------------------------------
    Description: 
We have been refactoring linear models for a long time, and some work still 
remains:
 # *Blockification (vectorization of vectors)*
 ** vectors are stacked into matrices, so that higher-level BLAS routines can be 
used for better performance (about 3x faster on sparse datasets, up to 15x 
faster on dense datasets, see). Since 3.1.1, LoR/SVC/LiR/AFT support 
blockification, and we need to blockify KMeans in the future.
 # *Standardization (virtual centering)*

 ** *The existing implementation of standardization in linear models does NOT 
center the vectors by removing the means, in order to preserve dataset 
sparsity. However, this causes features with small variance to be scaled to 
large values, which underlying solvers such as LBFGS cannot handle 
efficiently. See SPARK-34448 for details.*
 # **
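
To make the blockification item concrete, here is a minimal sketch of the idea in pure Python. It is not Spark's implementation: the names blockify and block_margins are hypothetical, and the inner loop stands in for what would be a single BLAS matrix-vector call in a real backend. It only shows how vectors are stacked into row-major blocks so that one call handles a whole block.

```python
# Illustrative sketch only: pure Python, no BLAS. All names are hypothetical.
from typing import List, Sequence

Vector = List[float]
Block = List[Vector]  # a row-major matrix holding up to block_size vectors


def blockify(vectors: Sequence[Vector], block_size: int) -> List[Block]:
    """Stack consecutive vectors into blocks of at most block_size rows."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    return [
        list(vectors[i:i + block_size])
        for i in range(0, len(vectors), block_size)
    ]


def block_margins(block: Block, coefficients: Vector) -> Vector:
    """Compute block @ coefficients for one block.

    With a BLAS backend this would be one matrix-vector product per block
    rather than one dot product per input vector.
    """
    return [sum(x * w for x, w in zip(row, coefficients)) for row in block]


vectors = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
coefficients = [0.5, -1.0]

blocks = blockify(vectors, block_size=2)
margins = [m for block in blocks for m in block_margins(block, coefficients)]
```

The last block may hold fewer rows than block_size; the sketch simply keeps it shorter rather than padding it.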

 

  was:
We have been refactoring linear models for a long time, and some work still 
remains:
 # *Blockification (vectorization of vectors)*
 ** vectors are stacked into matrices, so that higher-level BLAS routines can be 
used for better performance (about 3x faster on sparse datasets, up to 15x 
faster on dense datasets). Since 3.1.1, LoR/SVC/LiR/AFT support 
blockification, and we need to blockify KMeans in the future.
 # *Standardization* 

 


> Project Matrix: Linear Models revisit and refactor
> --------------------------------------------------
>
>                 Key: SPARK-30641
>                 URL: https://issues.apache.org/jira/browse/SPARK-30641
>             Project: Spark
>          Issue Type: New Feature
>          Components: ML, PySpark
>    Affects Versions: 3.1.0, 3.2.0
>            Reporter: zhengruifeng
>            Assignee: zhengruifeng
>            Priority: Major
>
> We have been refactoring linear models for a long time, and some work still 
> remains:
>  # *Blockification (vectorization of vectors)*
>  ** vectors are stacked into matrices, so that higher-level BLAS routines can 
> be used for better performance (about 3x faster on sparse datasets, up to 15x 
> faster on dense datasets, see). Since 3.1.1, LoR/SVC/LiR/AFT support 
> blockification, and we need to blockify KMeans in the future.
>  # *Standardization (virtual centering)*
>  ** *The existing implementation of standardization in linear models does NOT 
> center the vectors by removing the means, in order to preserve dataset 
> sparsity. However, this causes features with small variance to be scaled to 
> large values, which underlying solvers such as LBFGS cannot handle 
> efficiently. See SPARK-34448 for details.*
>  # **
>  
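
On the standardization item above, a small numeric sketch may help show why scaling without centering is a problem for a feature with a large mean and a small variance. This is an illustration only, not the Spark code path and not the virtual centering approach itself (see SPARK-34448 for that). The function names and the sample values are made up, and the sample standard deviation is used as an assumption.

```python
# Illustrative sketch only: function names and data are hypothetical.
from statistics import mean, stdev


def scale_only(values):
    """Divide by the standard deviation without removing the mean."""
    s = stdev(values)
    return [v / s for v in values]


def center_and_scale(values):
    """Remove the mean first, then divide by the standard deviation."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


# A feature with a large mean (about 100) and a small spread (about 0.1).
feature = [99.9, 100.0, 100.1]

scaled = scale_only(feature)          # values end up near 1000
centered = center_and_scale(feature)  # values end up near -1, 0, 1
```

Scaling alone leaves the values around 1000, while centering first keeps them around -1 to 1. Explicit centering would turn zeros into non-zeros and destroy sparsity, which is why the existing implementation avoids it.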



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
