[ https://issues.apache.org/jira/browse/SYSTEMML-1554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15979623#comment-15979623 ]
Matthias Boehm commented on SYSTEMML-1554:
------------------------------------------

thanks [~mwdus...@us.ibm.com] for bringing this up - I wanted to do this for a long time (SYSTEMML-427), but until now it never got high priority.

> IPA Scalar Transient Read Replacement
> -------------------------------------
>
>          Key: SYSTEMML-1554
>          URL: https://issues.apache.org/jira/browse/SYSTEMML-1554
>      Project: SystemML
>   Issue Type: Improvement
>     Reporter: Mike Dusenberry
>  Attachments: convnet_distrib_sgd.dml, parfor_oom_convnet_plan.txt, parfor_oom_convnet.py, parfor_oom_plan.txt, parfor_oom.py
>
> Currently, during IPA we collect all variables (scalars & matrices) eligible for propagation across blocks (i.e., not updated in a block), but then propagate only the matrix sizes across the blocks. It seems plausible that we could also replace all eligible scalar transient reads with literals, based on the variables that have already been collected. The benefit is that many ops would be able to determine their respective output sizes during regular compilation, instead of having to wait until dynamic recompilation, and thus we could reduce the pressure on dynamic recompilation.
> Are there drawbacks to this approach? The use case is that I was seeing a large number of memory warnings while training a convolutional net due to the sizes being unknown during regular compilation, yet the engine only having CP versions of the ops. Additionally, I was running into actual heap-space OOM errors in situations that should not run out of memory, and thus I started exploring.
> I've attached an example script and the explain plans (hops & runtime) with and without the IPA scalar replacement.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
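To illustrate the proposal, here is a minimal sketch (not SystemML's actual IPA code; the `Hop` class and function names are hypothetical) of the core transformation: after collecting scalar variables that are never updated inside any statement block, rewrite transient reads of those scalars into literal ops, so that downstream size inference can resolve output dimensions at regular compile time instead of deferring to dynamic recompilation.

```python
from dataclasses import dataclass, field

@dataclass
class Hop:
    """Toy stand-in for a SystemML HOP DAG node (hypothetical)."""
    op: str                       # e.g. "TRead", "Literal", "rand"
    name: str = ""                # variable name for transient reads
    value: object = None          # constant value for literals
    inputs: list = field(default_factory=list)

def replace_scalar_treads(hop, literals):
    """Recursively replace eligible scalar transient reads with literals."""
    hop.inputs = [replace_scalar_treads(h, literals) for h in hop.inputs]
    if hop.op == "TRead" and hop.name in literals:
        return Hop(op="Literal", value=literals[hop.name])
    return hop

# Toy DAG for rand(rows=n, cols=m), where n and m are scalar transient
# reads whose sizes would otherwise be unknown at compile time.
dag = Hop(op="rand", inputs=[Hop(op="TRead", name="n"),
                             Hop(op="TRead", name="m")])

# Scalars collected during IPA as constant across blocks (example values).
collected = {"n": 1000, "m": 500}
dag = replace_scalar_treads(dag, collected)

# Both inputs are now literals, so the output dims of rand are known
# during regular compilation.
print([(h.op, h.value) for h in dag.inputs])
```

Under this sketch's assumptions, the `rand` op's row and column inputs become `Literal` nodes after the rewrite, which is exactly the condition that lets size propagation succeed without waiting for recompilation.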