[ 
https://issues.apache.org/jira/browse/SYSTEMML-880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16539267#comment-16539267
 ] 

Niketan Pansare commented on SYSTEMML-880:
------------------------------------------

Pushing the loop down into DML avoids the per-invocation overhead of 
ml.execute and also enables additional optimizations. Here is a simple PySpark 
script that demonstrates the overhead:

 
{code:python}
from systemml import MLContext, dml
import numpy as np
import time

# Input matrix: 10000 x 100, all ones
numpyX = np.ones((10000,100))
# sc is the SparkContext provided by the pyspark shell
ml = MLContext(sc)

# Execute with pushdown of loop
script_with_loop = dml('s = 0; for(i in 1:1000) { s = s + sum(X); } ')
t0 = time.time()
ml.execute(script_with_loop.input(X=numpyX).output('s')).get('s')
print('Total time with loop:' +  str(time.time()-t0))
# Total time with loop:2.50334095955

# Execute without pushdown of loop
pythonS = 0
totalTime = 0
script_without_loop = dml('s = s + sum(X)').input(X=numpyX).output('s')
for i in range(1000):
    t0 = time.time()
    pythonS = ml.execute(script_without_loop.input(s=pythonS)).get('s')
    totalTime = totalTime + time.time()-t0

print('Total time without loop:' +  str(totalTime))
# Total time without loop:1008.73590732
{code}
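To put the two timings in perspective, here is a rough back-of-the-envelope calculation. It uses only the totals printed by the script above; the variable names are illustrative, and the with-loop total still includes the cost of one invocation, so the per-iteration figure is an upper bound.

```python
# Totals reported by the script above, in seconds
time_with_loop = 2.50334095955
time_without_loop = 1008.73590732
iterations = 1000

# Average cost of one ml.execute call when the loop stays in Python
per_call = time_without_loop / iterations
# Average cost of one iteration when the loop is pushed down into DML
per_iter_pushed = time_with_loop / iterations
# Overall ratio between the two runs
speedup = time_without_loop / time_with_loop

print('Average time per ml.execute call: %.3f s' % per_call)
print('Average time per pushed-down iteration: %.4f s' % per_iter_pushed)
print('Speedup from pushdown: about %.0fx' % speedup)
```

In other words, each Python-side invocation costs roughly one second, which dwarfs the work done by sum(X) itself.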
 

One way to do this is to mark the boundaries of the region to push down with a 
decorator (for example: parallelize) and to start by supporting only a simple 
expression and a loop structure.
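A minimal sketch of what such a decorator could look like is shown below. Only the name parallelize comes from the suggestion above; its signature (init, start, end, var) and the convention that the decorated function returns the DML loop body as a string are assumptions made for illustration, not an existing SystemML API.

```python
import functools

def parallelize(init, start, end, var="i"):
    """Hypothetical decorator: the decorated function returns the DML loop
    body, and the wrapper returns the full DML script text with the loop
    pushed down."""
    def decorator(body_fn):
        @functools.wraps(body_fn)
        def build_script():
            body = body_fn()
            # Assemble: <init> for(<var> in <start>:<end>) { <body> }
            return "%s for(%s in %d:%d) { %s }" % (init, var, start, end, body)
        return build_script
    return decorator

# The decorator marks the boundary of the region to push down.
@parallelize(init="s = 0;", start=1, end=1000)
def accumulate():
    return "s = s + sum(X);"

dml_source = accumulate()
print(dml_source)
```

The generated string matches the loop script used in the timing example and could be passed to dml(...) in the same way. A real implementation would presumably translate Python expressions rather than take DML strings.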

 

> Push-down loop structures in Python DSL
> ---------------------------------------
>
>                 Key: SYSTEMML-880
>                 URL: https://issues.apache.org/jira/browse/SYSTEMML-880
>             Project: SystemML
>          Issue Type: Task
>            Reporter: Niketan Pansare
>            Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
