Mike Dusenberry created SYSTEMML-952:
----------------------------------------

             Summary: Efficient Counts During Conversions
                 Key: SYSTEMML-952
                 URL: https://issues.apache.org/jira/browse/SYSTEMML-952
             Project: SystemML
          Issue Type: Improvement
            Reporter: Mike Dusenberry


Currently, we spend a lot of time on {{count}} during conversions from wide 
DataFrames. When calling {{count}} in Spark on these DataFrames directly, it is 
much quicker to first select one of the simple double columns (say, the id 
column) and then call {{count}}, because that does not read in the heavy vector 
column as well.

Therefore, we should perform the row count only on the index column, and the 
column count on the first row.
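
A minimal sketch of the idea using PySpark-style DataFrame calls, not the 
actual SystemML conversion code. The helper names {{row_count}} and 
{{col_count}} and the column names "id" and "vector" are assumptions made for 
illustration. How much the projection saves will likely depend on the data 
source and the query plan.

```python
# Sketch only: the helper names and the "id"/"vector" column names are
# assumptions for illustration, not part of SystemML's API.

def row_count(df, id_col="id"):
    """Count rows after projecting down to the lightweight index column."""
    # Selecting the id column first means the count does not need to
    # materialize the heavy vector column.
    return df.select(id_col).count()


def col_count(df, vector_col="vector"):
    """Derive the column count from the vector stored in the first row only."""
    first = df.select(vector_col).first()  # None when the DataFrame is empty
    if first is None:
        return 0
    vec = first[0]
    # Spark ML vectors expose `.size`; plain arrays fall back to len().
    return vec.size if hasattr(vec, "size") else len(vec)
```

With a DataFrame {{df}} holding an id column and a vector column, 
{{row_count(df)}} and {{col_count(df)}} would replace a {{count}} over the 
full DataFrame and a scan over all rows, respectively. This assumes every row 
holds a vector of the same length.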

cc [~mboehm7]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
