[ https://issues.apache.org/jira/browse/SYSTEMML-909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15485273#comment-15485273 ]

Matthias Boehm commented on SYSTEMML-909:
-----------------------------------------

Which data types do you have in your example DataFrame? If they are not double,
I would suspect the implicit double parsing to be the reason. In any case, this
narrow transformation will not be a bottleneck compared to creating the binary
block representation, which requires a shuffle.
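To illustrate the two points above, here is a minimal, hypothetical Java sketch. It is not SystemML code: rows are modeled as plain `Object[]` arrays rather than Spark `Row`s, and the names `ImplicitDoubleParsingSketch`, `toDouble`, and `dimensions` are made up. It also assumes that a dimensions pass counts rows, the maximum number of columns, and non-zeros. It shows that a double cell needs only unboxing, while a string cell must be parsed text. It also shows that the pass handles each row independently, which is what makes it narrow: no data moves between partitions, unlike the shuffle needed for binary blocks.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch, not taken from MLContextConversionUtil.
class ImplicitDoubleParsingSketch {

    // Convert one cell to double. A Double only needs unboxing and other
    // Numbers only need widening, but a String must be parsed as text.
    static double toDouble(Object cell) {
        if (cell instanceof Double) {
            return (Double) cell;                      // cheap: unbox only
        } else if (cell instanceof Number) {
            return ((Number) cell).doubleValue();      // cheap: numeric widening
        } else if (cell instanceof String) {
            return Double.parseDouble((String) cell);  // costly: text parsing
        }
        throw new IllegalArgumentException("Unsupported cell type: " + cell);
    }

    // One pass that looks at each row on its own (assumed outputs:
    // row count, max column count, non-zero count).
    static long[] dimensions(List<Object[]> rows) {
        long numRows = 0, numCols = 0, nnz = 0;
        for (Object[] row : rows) {
            numRows++;
            numCols = Math.max(numCols, row.length);
            for (Object cell : row) {
                if (toDouble(cell) != 0) {
                    nnz++;
                }
            }
        }
        return new long[] {numRows, numCols, nnz};
    }

    public static void main(String[] args) {
        List<Object[]> doubles = Arrays.asList(
            new Object[] {1.0, 0.0}, new Object[] {2.5, 3.0});
        List<Object[]> strings = Arrays.asList(
            new Object[] {"1.0", "0.0"}, new Object[] {"2.5", "3.0"});
        // Same dimensions either way; only the per-cell conversion cost differs.
        System.out.println(Arrays.toString(dimensions(doubles)));  // [2, 2, 3]
        System.out.println(Arrays.toString(dimensions(strings)));  // [2, 2, 3]
    }
}
```

Both inputs yield the same dimensions. The string-typed data simply pays for `Double.parseDouble` on every cell, which is the kind of overhead suspected above for non-double columns.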

> `determineDataFrameDimensionsIfNeeded(...)` is a bottleneck.
> ------------------------------------------------------------
>
>                 Key: SYSTEMML-909
>                 URL: https://issues.apache.org/jira/browse/SYSTEMML-909
>             Project: SystemML
>          Issue Type: Improvement
>            Reporter: Mike Dusenberry
>
> The {{[determineDataFrameDimensionsIfNeeded(...)|https://github.com/apache/incubator-systemml/blob/master/src/main/java/org/apache/sysml/api/mlcontext/MLContextConversionUtil.java#L585]}}
> function in {{MLContext}} is a major bottleneck, particularly due to the
> `javaRDD` call.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
