[jira] [Commented] (SYSTEMML-831) Implement t-SNE algorithm

Matthias Boehm (JIRA) Mon, 26 Sep 2016 19:17:47 -0700

    [ 
https://issues.apache.org/jira/browse/SYSTEMML-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15524805#comment-15524805
 ]


Matthias Boehm commented on SYSTEMML-831:
-----------------------------------------

Well, a couple of comments: your job is most likely failing due to the 2GB 
limitation of Spark partitions 
(https://issues.apache.org/jira/browse/SPARK-6235). This usually happens when 
small inputs create large outputs (with a constant number of input/output 
partitions), and normally we try very hard to avoid this characteristic. 
If I remember correctly, this algorithm has a problematic O(n^2) space 
requirement in the number of rows n, which might explain the failure. I would 
recommend not forcing -exec spark and instead using the default, -exec 
hybrid_spark. Anyway, I will look into this today.
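
To make the O(n^2) argument concrete, here is a rough back-of-the-envelope sketch in Python. It is not SystemML code and says nothing about how the script is actually compiled. It assumes a dense n x n intermediate of 8-byte doubles spread evenly over a fixed number of partitions, and it takes the partition limit to be 2^31 - 1 bytes. All function names are made up for illustration.

```python
# Rough estimate only. Assumptions (not taken from SystemML or the job log):
#  - the intermediate is a dense n x n matrix of 8-byte doubles
#  - data is spread evenly over a fixed number of partitions
#  - a single partition may hold at most 2**31 - 1 bytes (about 2GB)
SPARK_PARTITION_LIMIT = 2**31 - 1  # bytes


def dense_pairwise_bytes(n_rows, bytes_per_cell=8):
    """Size in bytes of a dense n_rows x n_rows matrix."""
    return n_rows * n_rows * bytes_per_cell


def bytes_per_partition(n_rows, num_partitions):
    """Bytes per partition under an even split (ceiling division)."""
    total = dense_pairwise_bytes(n_rows)
    return -(-total // num_partitions)


def exceeds_partition_limit(n_rows, num_partitions):
    """True if an evenly split partition would pass the assumed limit."""
    return bytes_per_partition(n_rows, num_partitions) > SPARK_PARTITION_LIMIT


if __name__ == "__main__":
    for n in (10_000, 100_000):
        for parts in (8, 64):
            print(n, parts, bytes_per_partition(n, parts),
                  exceeds_partition_limit(n, parts))
```

Under these assumptions, n = 100,000 rows gives an 80 GB intermediate. Split over 8 partitions that is 10 GB each, well over the limit, while 64 partitions would bring it to about 1.25 GB each. This is the "small input, large output, constant number of partitions" pattern described above.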

> Implement t-SNE algorithm
> -------------------------
>
>                 Key: SYSTEMML-831
>                 URL: https://issues.apache.org/jira/browse/SYSTEMML-831
>             Project: SystemML
>          Issue Type: Improvement
>          Components: Algorithms
>            Reporter: Imran Younus
>            Assignee: Imran Younus
>         Attachments: out_2016_09_26_10.log
>
>
> This jira implements the t-distributed Stochastic Neighbor Embedding 
> algorithm for dimensionality reduction presented in this paper:
> Visualizing Data using t-SNE 
> by Laurens van der Maaten, Geoffrey Hinton
> http://www.jmlr.org/papers/v9/vandermaaten08a.html



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
