[
https://issues.apache.org/jira/browse/SYSTEMML-880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16539267#comment-16539267
]
Niketan Pansare edited comment on SYSTEMML-880 at 7/10/18 9:54 PM:
-------------------------------------------------------------------
Pushing the loop down into DML avoids the per-call invocation overhead and also enables additional
optimizations. The following simple pyspark script demonstrates the overhead:
{code:python}
# Run in a pyspark shell, where the SparkContext `sc` is already defined.
from systemml import MLContext, dml
import numpy as np
import time

numpyX = np.ones((10000, 100))
ml = MLContext(sc)

# Execute with pushdown of loop: the whole loop runs inside one DML invocation
script_with_loop = dml('s = 0; for(i in 1:1000) { s = s + sum(X); }')
t0 = time.time()
ml.execute(script_with_loop.input(X=numpyX).output('s')).get('s')
print('Total time with loop:' + str(time.time() - t0))
# Total time with loop:2.50334095955

# Execute without pushdown of loop: one DML invocation per Python iteration
pythonS = 0
totalTime = 0
script_without_loop = dml('s = s + sum(X)').input(X=numpyX).output('s')
for i in range(1000):
    t0 = time.time()
    pythonS = ml.execute(script_without_loop.input(s=pythonS)).get('s')
    totalTime = totalTime + time.time() - t0
print('Total time without loop:' + str(totalTime))
# Total time without loop:1008.73590732
{code}
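As a rough worked example using only the two timings reported above, keeping the loop in Python is about 403 times slower, or roughly 1 s of extra time per call. That per-call figure lumps together everything the non-pushdown path repeats on each iteration; which parts dominate (script compilation, moving X and s between Python and the JVM, and so on) is an assumption here, since the benchmark does not measure them separately.

{code:python}
# Timings reported by the benchmark above (seconds, 1000 iterations each).
with_loop = 2.50334095955
without_loop = 1008.73590732
iterations = 1000

# How many times slower the Python-side loop is overall.
speedup = without_loop / with_loop
# Extra cost per DML invocation when the loop stays in Python.
per_call_overhead = (without_loop - with_loop) / iterations

print('speedup: %.0fx' % speedup)                             # prints: speedup: 403x
print('extra cost per call: %.2f s' % per_call_overhead)      # prints: extra cost per call: 1.01 s
{code}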
One way to approach this is to mark the boundaries of the code to push down
with a decorator (for example, parallelize) and to start by supporting simple
expressions and loop structures.
A few related links:
[https://greentreesnakes.readthedocs.io/en/latest/nodes.html#control-flow]
[https://eli.thegreenplace.net/2009/11/28/python-internals-working-with-python-asts]
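A minimal sketch of the decorator idea, assuming a hypothetical {{parallelize}} decorator built on Python's standard {{ast}} and {{inspect}} modules (the AST references above describe the node types involved). It handles only a tiny subset of Python: number literals, names, + - * /, {{sum(x)}}, assignment to a name, {{for i in range(...)}} with integer literal bounds, and {{return <name>}}. The names {{DMLTranslator}}, {{translate}}, {{to_dml}}, and the {{dml}}/{{dml_outputs}} attributes are illustrative only and are not part of the SystemML Python API.

{code:python}
import ast
import inspect
import textwrap

# Hypothetical sketch: translate a tiny Python subset into a DML script string.
BIN_OPS = {ast.Add: '+', ast.Sub: '-', ast.Mult: '*', ast.Div: '/'}


class DMLTranslator(ast.NodeVisitor):
    def __init__(self):
        self.lines = []    # generated DML, one statement per line
        self.outputs = []  # variable names returned by the function
        self.depth = 0     # current loop nesting, used for indentation

    def emit(self, text):
        self.lines.append('  ' * self.depth + text)

    def expr(self, node):
        if isinstance(node, ast.Name):
            return node.id
        if isinstance(node, ast.Constant) and type(node.value) in (int, float):
            return repr(node.value)
        if isinstance(node, ast.BinOp) and type(node.op) in BIN_OPS:
            op = BIN_OPS[type(node.op)]
            return '(%s %s %s)' % (self.expr(node.left), op, self.expr(node.right))
        if (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)
                and node.func.id == 'sum' and len(node.args) == 1 and not node.keywords):
            return 'sum(%s)' % self.expr(node.args[0])
        raise NotImplementedError('unsupported expression: ' + ast.dump(node))

    def visit_Assign(self, node):
        if len(node.targets) != 1 or not isinstance(node.targets[0], ast.Name):
            raise NotImplementedError('only `name = expr` is supported')
        self.emit('%s = %s' % (node.targets[0].id, self.expr(node.value)))

    def visit_AugAssign(self, node):
        # `s += e` is rewritten as `s = (s + e)`.
        if not isinstance(node.target, ast.Name) or type(node.op) not in BIN_OPS:
            raise NotImplementedError('unsupported augmented assignment')
        name, op = node.target.id, BIN_OPS[type(node.op)]
        self.emit('%s = (%s %s %s)' % (name, name, op, self.expr(node.value)))

    def visit_For(self, node):
        it = node.iter
        ok = (isinstance(node.target, ast.Name) and isinstance(it, ast.Call)
              and isinstance(it.func, ast.Name) and it.func.id == 'range'
              and len(it.args) in (1, 2) and not node.orelse
              and all(isinstance(a, ast.Constant) and type(a.value) is int
                      for a in it.args))
        if not ok:
            raise NotImplementedError('only `for i in range(...)` with int literals')
        bounds = [a.value for a in it.args]
        start, stop = (0, bounds[0]) if len(bounds) == 1 else bounds
        if stop <= start:
            raise NotImplementedError('empty range')
        # Python's range(start, stop) excludes stop; DML's start:end includes end.
        self.emit('for (%s in %d:%d) {' % (node.target.id, start, stop - 1))
        self.depth += 1
        for stmt in node.body:
            self.visit(stmt)
        self.depth -= 1
        self.emit('}')

    def visit_Return(self, node):
        # `return s` marks s as a script output instead of emitting DML.
        if not isinstance(node.value, ast.Name):
            raise NotImplementedError('only `return <name>` is supported')
        self.outputs.append(node.value.id)

    def generic_visit(self, node):
        # Anything else (while, if, bare calls, ...) is rejected outright.
        raise NotImplementedError('unsupported statement: ' + type(node).__name__)


def translate(statements):
    translator = DMLTranslator()
    for stmt in statements:
        translator.visit(stmt)
    return translator


def to_dml(source):
    """Translate a snippet of module-level source into DML text."""
    return '\n'.join(translate(ast.parse(textwrap.dedent(source)).body).lines)


def parallelize(fn):
    """Hypothetical decorator: attach the DML form of fn's body as fn.dml."""
    fn_def = ast.parse(textwrap.dedent(inspect.getsource(fn))).body[0]
    translator = translate(fn_def.body)
    fn.dml = '\n'.join(translator.lines)
    fn.dml_outputs = translator.outputs
    return fn
{code}

Applied in a module as {{@parallelize}} on, say, a hypothetical {{def accumulate(X):}} whose body is {{s = 0}}, a {{for i in range(1, 1001):}} loop doing {{s = s + sum(X)}}, and {{return s}}, the sketch would set {{accumulate.dml}} to the equivalent DML loop over {{1:1000}} and {{accumulate.dml_outputs}} to {{['s']}}. That string could then be run in a single invocation with {{ml.execute(dml(accumulate.dml).input(X=numpyX).output('s'))}}, as in the benchmark. Rejecting anything outside the subset (rather than silently running it in Python) keeps the pushed-down region all-or-nothing. {{inspect.getsource}} may fail for functions typed into a plain interactive shell.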
> Push-down loop structures in Python DSL
> ---------------------------------------
>
> Key: SYSTEMML-880
> URL: https://issues.apache.org/jira/browse/SYSTEMML-880
> Project: SystemML
> Issue Type: Task
> Reporter: Niketan Pansare
> Priority: Major
>
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)