[
https://issues.apache.org/jira/browse/SYSTEMML-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15524805#comment-15524805
]
Matthias Boehm commented on SYSTEMML-831:
-----------------------------------------
Well, a couple of comments: your job is most likely failing due to the 2GB
limit on Spark partitions
(https://issues.apache.org/jira/browse/SPARK-6235). This usually happens when
small inputs create large outputs (with a constant number of input/output
partitions), and normally we try very hard to avoid this characteristic anyway.
If I remember correctly, this algorithm had a problematic O(n^2) space
requirement in the number of rows n, which might explain this. I would
recommend not forcing -exec spark and instead using the default, -exec
hybrid_spark. Anyway, I will look into this today.
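To make the O(n^2) point concrete, here is a rough back-of-the-envelope sketch. It assumes the t-SNE intermediate is a dense n x n matrix of 8-byte doubles and that a single partition is capped at Integer.MAX_VALUE (2^31 - 1) bytes, as discussed in SPARK-6235. The helper names are hypothetical, and SystemML's actual blocked or sparse representation will shift the exact numbers.

```python
import math

# Assumption: per-partition/block cap of Integer.MAX_VALUE bytes (see SPARK-6235).
SPARK_PARTITION_LIMIT = 2**31 - 1
BYTES_PER_DOUBLE = 8


def dense_pairwise_bytes(n_rows: int) -> int:
    """Hypothetical helper: bytes for a dense n x n matrix of doubles,
    e.g. a pairwise affinity matrix over n input rows."""
    return n_rows * n_rows * BYTES_PER_DOUBLE


def min_partitions(total_bytes: int, limit: int = SPARK_PARTITION_LIMIT) -> int:
    """Hypothetical helper: fewest equal-sized partitions that keep
    each partition at or below the limit."""
    return max(1, math.ceil(total_bytes / limit))


if __name__ == "__main__":
    for n in (10_000, 16_384, 50_000, 100_000):
        size = dense_pairwise_bytes(n)
        print(f"n={n:>7}: {size / 2**30:8.2f} GiB dense -> "
              f">= {min_partitions(size)} partitions")
```

With a fixed partition count, the per-partition size grows quadratically in n, so already at n = 16,384 rows a single partition holding the full matrix (exactly 2^31 bytes) would exceed the limit.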
> Implement t-SNE algorithm
> -------------------------
>
> Key: SYSTEMML-831
> URL: https://issues.apache.org/jira/browse/SYSTEMML-831
> Project: SystemML
> Issue Type: Improvement
> Components: Algorithms
> Reporter: Imran Younus
> Assignee: Imran Younus
> Attachments: out_2016_09_26_10.log
>
>
> This jira implements the t-distributed Stochastic Neighbor Embedding
> algorithm for dimensionality reduction presented in this paper:
> Visualizing Data using t-SNE
> by Laurens van der Maaten, Geoffrey Hinton
> http://www.jmlr.org/papers/v9/vandermaaten08a.html