[
https://issues.apache.org/jira/browse/SYSTEMML-2021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16257925#comment-16257925
]
Matthias Boehm edited comment on SYSTEMML-2021 at 11/18/17 5:18 AM:
--------------------------------------------------------------------
With the fix for robustess as described above, the test passed successfully
with the following enumeration statistics:
{code}
Codegen enum (ALLt/p,EVALt/p): 17867063951501/1099512676970/832230/832172.
{code}
where we now enumerate 832K plans out of the one trillion plans (with
partitioning) and 17 trillion plans (without partitioning).
was (Author: mboehm7):
With the fix for robustess as described above, the test passed successfully
with the following enumeration statistics:
{code}
Codegen enum (ALLt/p,EVALt/p): 17867063951501/1099512676970/832230/832172.
{code}
> Perftest stratstats w/ codegen getting stuck in optimization
> ------------------------------------------------------------
>
> Key: SYSTEMML-2021
> URL: https://issues.apache.org/jira/browse/SYSTEMML-2021
> Project: SystemML
> Issue Type: Bug
> Reporter: Matthias Boehm
>
> On stratstats 10K, the codegen optimizer gets stuck during plan enumeration
> of a DAG partition with 274,877,906,944 plans. During initial compilation,
> the same partition was processed in 2.8s as most candidate plans have been
> successfully pruned by cost or structure.
> After a detailed analysis, we can characterize the problem as follows: there
> is a fusion partition with 38 interesting points, resulting in the huge
> search space of 274 billion plans. Furthermore, all these materialization
> points are extremely small while at the same time there are large inputs and
> compute-intensive operations as well as many common subexpression, which
> renders the pruning by costs starting from the fuse-all heuristic
> ineffective.
> We can address this as follows: First, we should run both heuristics
> (fuse-all and fuse-no-redundancy) which are the first and last plan upfront
> to quickly obtain a good lower bound for the costs. Second, we should only
> consider enumerating large partitions (say with more than 20 interesting
> points) if the opening heuristics show costs that is further than an epsilon
> (say 1%) away from the minimal static costs. Together these rules will make
> the optimizer significantly more robust without missing any meaningful plan
> choices.
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)