Mike Dusenberry created SYSTEMML-952:
----------------------------------------
Summary: Efficient Counts During Conversions
Key: SYSTEMML-952
URL: https://issues.apache.org/jira/browse/SYSTEMML-952
Project: SystemML
Issue Type: Improvement
Reporter: Mike Dusenberry
Currently, we spend a lot of time on {{count}} during the conversions from wide
DataFrames. When calling {{count}} in Spark on these DataFrames directly, it is
much quicker to select just one of the simple double columns (say, the id
column) and then {{count}}, since that does not read in the heavy vector
column as well.
Therefore, we should perform the row count only on the index column, and the
column count on the first row.
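A minimal sketch of the proposal, written against the PySpark DataFrame API and not taken from the SystemML code base. The helper names {{count_rows}} and {{count_cols}} and the parameters {{id_col}} and {{vector_col}} are hypothetical, and the sketch assumes the DataFrame holds one scalar ID column plus a single vector column whose length is the matrix column count.

```python
def count_rows(df, id_col):
    """Row count computed on the light ID column only."""
    # Project to a single scalar column before counting, so the heavy
    # vector column is not part of the counted projection.
    return df.select(id_col).count()


def count_cols(df, vector_col):
    """Column count derived from the vector length in the first row."""
    # first() fetches a single row instead of scanning the dataset;
    # in PySpark it returns None for an empty DataFrame.
    first_row = df.first()
    if first_row is None:
        return 0
    # Assumption: the vector value supports len(), as PySpark's
    # DenseVector and SparseVector do.
    return len(first_row[vector_col])
```

With a real DataFrame {{df}}, the calls would look like {{count_rows(df, "id")}} and {{count_cols(df, "features")}}, where both column names are placeholders. Taking the column count from the first row assumes every row carries a vector of the same length.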
cc [~mboehm7]