[ https://issues.apache.org/jira/browse/SYSTEMML-909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15485273#comment-15485273 ]
Matthias Boehm commented on SYSTEMML-909:
-----------------------------------------

Which datatypes do you have in your example dataframe? If they are not double, I would suspect the implicit double parsing to be the reason. In any case, this narrow transformation will not be a bottleneck compared to creating the binary block representation, which requires a shuffle.

> `determineDataFrameDimensionsIfNeeded(...)` is a bottleneck.
> ------------------------------------------------------------
>
>                 Key: SYSTEMML-909
>                 URL: https://issues.apache.org/jira/browse/SYSTEMML-909
>             Project: SystemML
>          Issue Type: Improvement
>            Reporter: Mike Dusenberry
>
> The {{[determineDataFrameDimensionsIfNeeded(...) | https://github.com/apache/incubator-systemml/blob/master/src/main/java/org/apache/sysml/api/mlcontext/MLContextConversionUtil.java#L585]}}
> function in {{MLContext}} is a major bottleneck, particularly due to the
> `javaRDD` call.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)