[ https://issues.apache.org/jira/browse/SPARK-7594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sean Owen resolved SPARK-7594.
------------------------------
Resolution: Invalid
Please ask questions at user@
https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark
I think the issue is that the resulting Gramian would then need more entries
in an internal array than a single JVM array can hold (2^31 - 1). At this
scale you'd also be passing around arrays of tens of gigabytes, which is
probably well beyond what's practical for this implementation.
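
For a rough sense of scale, here is a small self-contained sketch (plain
Scala, not Spark code; the object name and the sample column counts are made
up for illustration) of how the Gramian's entry count and driver-side memory
grow with the number of columns n:

{code:scala}
// Rough sizing sketch: the Gramian of an n-column RowMatrix has n x n
// entries, or n * (n + 1) / 2 when stored as a packed upper triangle,
// and that packed array must fit in a single Int-indexed JVM array.
object GramianSize extends App {
  val maxArrayElems = Int.MaxValue.toLong // a JVM array holds at most 2^31 - 1 elements

  for (n <- Seq(10000L, 65535L, 65536L, 1000000L)) {
    val dense = n * n                       // entries in the full n x n Gramian
    val packed = n * (n + 1) / 2            // entries in the packed upper triangle
    val denseGiB = dense * 8.0 / (1L << 30) // Doubles are 8 bytes each
    println(s"n=$n: dense=$dense (~${denseGiB.round} GiB), packed=$packed, " +
      s"packed fits in one array: ${packed <= maxArrayElems}")
  }
}
{code}

The packed entry count crosses the 2^31 - 1 array-index limit between
n = 65535 and n = 65536, which matches the cap in RowMatrix, and the dense
form at that size is already roughly 32 GiB of Doubles, i.e. the tens of
gigabytes mentioned above.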
> Increase maximum number of columns for covariance matrix for principal
> components
> ---------------------------------------------------------------------------------
>
> Key: SPARK-7594
> URL: https://issues.apache.org/jira/browse/SPARK-7594
> Project: Spark
> Issue Type: Improvement
> Components: MLlib
> Reporter: Sebastian Alfers
> Priority: Minor
>
> In order to compute a huge dataset, the number of columns allowed when
> calculating the covariance matrix is limited:
> https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/RowMatrix.scala#L129
> What is the reason behind this limitation, and can it be extended?
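
For reference, the check at the line linked above amounts to a guard of
roughly this shape (a paraphrased sketch, not the verbatim Spark source):

{code:scala}
object ColumnLimit {
  // Paraphrased sketch of RowMatrix.checkNumColumns, not the verbatim
  // source. The Gramian is held on the driver as a packed upper triangle
  // of n * (n + 1) / 2 Doubles, which must fit in one Int-indexed array;
  // for n = 65535 that is 2,147,450,880 entries, just under Int.MaxValue.
  def checkNumColumns(cols: Int): Unit = {
    if (cols > 65535) {
      throw new IllegalArgumentException(
        s"Argument with more than 65535 cols: $cols")
    }
  }
}
{code}

Raising the cap would presumably require a blocked or Long-indexed
representation of the Gramian rather than a single flat driver-side array.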