Imran Younus created SYSTEMML-1043:

             Summary: NMF implementation taking too long
                 Key: SYSTEMML-1043
             Project: SystemML
          Issue Type: Bug
          Components: APIs, PyDML
         Environment: standalone mode on laptop, and YARN cluster with 10 nodes
            Reporter: Imran Younus

I'm testing the following NMF algorithm, written using the Python API:
from pyspark.sql import SQLContext
import systemml as sml
from systemml import random

sqlContext = SQLContext(sc)

m, n = tfidf.shape
k = 40
V = sml.matrix(tfidf)
W = sml.random.uniform(size=(m, k))
H = sml.random.uniform(size=(k, n))

max_iters = 200
for i in range(max_iters):
    H = H * (W.transpose().dot(V))/(W.transpose().dot(
    W = W * (

W = W.toNumPyArray()

Here {{tfidf}} is a sparse matrix of shape (114720, 11590).
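
For comparison, here is a minimal plain-NumPy sketch of multiplicative-update NMF on a small dense matrix. It assumes the two truncated lines in the loop above follow the standard Lee-Seung update rules; that is an assumption, since the full expressions are not shown. The function name {{nmf_multiplicative}}, the {{eps}} guard against division by zero, and the {{seed}} parameter are hypothetical and not part of the SystemML API.

```python
import numpy as np


def nmf_multiplicative(V, k, max_iters=200, eps=1e-9, seed=0):
    """Factor a non-negative matrix V (m x n) as W (m x k) times H (k x n).

    Uses the standard multiplicative updates (assumed here, not taken
    from the truncated listing). eps is a hypothetical safeguard that
    keeps the denominators strictly positive.
    """
    rng = np.random.default_rng(seed)
    m, n = V.shape
    W = rng.uniform(size=(m, k))
    H = rng.uniform(size=(k, n))
    for _ in range(max_iters):
        # H update: (W.T @ W) is only k x k, so the denominator stays cheap.
        H = H * (W.T @ V) / (W.T @ W @ H + eps)
        # W update: grouping as W @ (H @ H.T) avoids forming the m x n product W @ H.
        W = W * (V @ H.T) / (W @ (H @ H.T) + eps)
    return W, H


# Small synthetic input, far smaller than the (114720, 11590) tfidf matrix.
rng = np.random.default_rng(1)
V_small = rng.uniform(size=(60, 30))
W_small, H_small = nmf_multiplicative(V_small, k=5, max_iters=50)
err = np.linalg.norm(V_small - W_small @ H_small)
print("reconstruction error:", err)
```

In this sketch both updates have a similar structure and cost, so it can serve as a baseline when checking whether the {{W}} update alone accounts for the slowdown.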

The evaluation of {{W}} takes more than one hour when running on a laptop. On
the YARN cluster, it did not finish in 1.5 hours (I killed the job).

If I evaluate the {{H}} matrix instead, it takes only 2 minutes.

Note that calling {{eval}} before evaluating {{W}} makes no difference:
{{W}} still takes an hour.

This message was sent by Atlassian JIRA
