Imran Younus created SYSTEMML-1043:
--------------------------------------

             Summary: NMF implementation taking too long
                 Key: SYSTEMML-1043
                 URL: https://issues.apache.org/jira/browse/SYSTEMML-1043
             Project: SystemML
          Issue Type: Bug
          Components: APIs, PyDML
         Environment: standalone mode on laptop, and YARN cluster with 10 nodes
            Reporter: Imran Younus


I'm testing the following NMF algorithm, written using the Python API:
{code}
from pyspark.sql import SQLContext
import systemml as sml
from systemml import random

sqlContext = SQLContext(sc)
sml.setSparkContext(sc)

m, n = tfidf.shape
k = 40
V = sml.matrix(tfidf)
W = sml.random.uniform(size=(m, k))
H = sml.random.uniform(size=(k, n))

max_iters = 200
for i in range(max_iters):
    # Multiplicative updates: H is updated first, then W uses the new H
    H = H * (W.transpose().dot(V))/(W.transpose().dot(W.dot(H)))
    W = W * (V.dot(H.transpose()))/(W.dot(H.dot(H.transpose())))

W = W.toNumPyArray()
{code}

Here {{tfidf}} is a sparse matrix of shape (114720, 11590).
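
To put these shapes in perspective, the following back-of-the-envelope sketch estimates the memory each operand would need if it were stored densely. It assumes 8-byte doubles and fully dense storage, which are my assumptions and not something reported above. Whether SystemML actually materializes the product {{W.dot(H)}} is not known from this report.

```python
# Shapes taken from the report: tfidf is (m, n) and k = 40.
m, n, k = 114720, 11590, 40
BYTES_PER_DOUBLE = 8  # assumption: dense double-precision storage


def dense_gb(rows, cols):
    """Size in GB (10**9 bytes) of a dense rows x cols matrix of doubles."""
    return rows * cols * BYTES_PER_DOUBLE / 1e9


sizes = {
    "W (m x k)": dense_gb(m, k),
    "H (k x n)": dense_gb(k, n),
    "W.dot(H) (m x n)": dense_gb(m, n),
    "H.dot(H.transpose()) (k x k)": dense_gb(k, k),
}

for name, gb in sizes.items():
    print(f"{name}: {gb:.6f} GB")
```

Under these assumptions, the only large operand is the m x n product {{W.dot(H)}}, which appears in the {{H}} update and not in the {{W}} update.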

Evaluating {{W}} takes more than an hour on a laptop. On the YARN cluster,
it had not finished after 1.5 hours, so I killed the job.

If I evaluate the {{H}} matrix instead, it takes only 2 minutes.

Note that calling {{eval}} before evaluating {{W}} makes no difference:
{{W}} still takes an hour.
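
For reference, the same multiplicative update rules can be sketched in plain NumPy on a small synthetic matrix. This is only an illustrative baseline, not the SystemML API: the function name {{nmf_multiplicative}}, the {{eps}} term in the denominators, the random seeds, and the toy data are all hypothetical. The {{H}} update is also grouped as {{(W.T @ W) @ H}}, which differs from the grouping in the script above.

```python
import numpy as np


def nmf_multiplicative(V, k, max_iters=200, eps=1e-9, seed=0):
    """Multiplicative-update NMF on a dense NumPy array (illustrative only)."""
    rng = np.random.default_rng(seed)
    m, n = V.shape
    W = rng.uniform(size=(m, k))
    H = rng.uniform(size=(k, n))
    err_before = np.linalg.norm(V - W @ H)
    for _ in range(max_iters):
        # (W.T @ W) is only k x k, so the m x n product W @ H is never formed.
        H = H * (W.T @ V) / ((W.T @ W) @ H + eps)
        # (H @ H.T) is also k x k, matching the grouping used in the report.
        W = W * (V @ H.T) / (W @ (H @ H.T) + eps)
    err_after = np.linalg.norm(V - W @ H)
    return W, H, err_before, err_after


# Small synthetic non-negative matrix standing in for tfidf (hypothetical data).
rng = np.random.default_rng(1)
V_toy = rng.uniform(size=(60, 5)) @ rng.uniform(size=(5, 30))
W_fit, H_fit, err_before, err_after = nmf_multiplicative(V_toy, k=5)
print(f"reconstruction error: {err_before:.3f} -> {err_after:.3f}")
```

Running the same loop on a down-sampled {{tfidf}} could give a rough per-iteration baseline to compare against the timings above.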



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
