My comments are mostly advisory for optimizers in general ;)

On 2/11/12 3:11 PM, Stefan Manegold wrote:
> On Sat, Feb 11, 2012 at 02:06:17PM +0100, Martin Kersten wrote:
>>
>>
>> On 2/11/12 11:03 AM, Stefan Manegold wrote:
>>> On Wed, Feb 08, 2012 at 10:27:11AM +0100, Martin Kersten wrote:
>>>> Changeset: 67c12a700166 for MonetDB
>>>> URL: http://dev.monetdb.org/hg/MonetDB?cmd=changeset;node=67c12a700166
>>>> Modified Files:
>>>>    monetdb5/extras/mal_optimizer_template/opt_sql_append.mx
>>>> Branch: default
>>>> Log Message:
>>>>
>>>> More advice on the optimizer template.
>>>>
>>>>
>>>> diffs (140 lines):
>>>>
>>>> diff --git a/monetdb5/extras/mal_optimizer_template/opt_sql_append.mx b/monetdb5/extras/mal_optimizer_template/opt_sql_append.mx
>>>> --- a/monetdb5/extras/mal_optimizer_template/opt_sql_append.mx
>>>> +++ b/monetdb5/extras/mal_optimizer_template/opt_sql_append.mx
>>> [...]
>>>> @@ -39,6 +39,8 @@ All Rights Reserved.
>>>>    * i.e., an sql.append() statement that is eventually followed by some other
>>>>    * statement later on in the MAL program that uses the same v0 BAT as
>>>>    * argument as the sql.append() statement does,
>>>> + * Do you assume a single re-use of the variable v0?
>>>
>>> No. Why?
>> Use an assign-once, use-many-times policy. It can improve parallel
>> processing and simplifies scope analysis.
>
> v0 is (as far as I know) created (assigned) once (by Niels, or preceding
> optimizers).
true, on purpose
> If it is used only once (only by sql_append), my optimizer does not (have
> to) do anything.  Otherwise, it replaces one use of v0 (by sql_append) by a
> view of v0.
> That's the very purpose of this optimizer.
>
>>>> + * Do you assume a non-nested  MAL block ?
>>>
>>> Not necessarily.
>>>
>> Analysis may become complex if you have something like
>>
>> V0:= expr
>> barrier E1:=expr
>>      V0:= expr2
>> exit E1
>> Now the value of V0 depends on whether the barrier block executes at runtime.
>>
>>
>> same holds for
>> barrier E1:= expr
>>      V0:=expr
>> exit E1
>>      z:= f(V0)
>>
>> will be flagged as an error because V0 may be uninitialized
>>
>>> I must admit that I do not know how the optimizer framework handles nested
>>> MAL blocks, and what an optimizer needs to do to be aware of nested MAL
>>> blocks and to handle them correctly.
>> Preferably, the MAL blocks are linear programs (until you reach the
>> dataflow optimizer).
>
> How do I know / see that in my optimizer?
While looping through the plan, check whether p->barrier is set.
You can always safely exit an optimizer.
> Do I have to check for barrier / exit statements / constructs myself?
In principle, yes.
Optimizers preceding yours in the pipeline could introduce them.
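
To make the barrier check concrete, here is a minimal standalone sketch in C.
It does not use the real MAL headers: PlanInstr, its opcode field, and both
function names are hypothetical stand-ins invented for illustration. Only the
barrier flag corresponds to the p->barrier test mentioned above. The idea is
simply to scan the plan first and leave it untouched if any barrier is found.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a MAL instruction record (an assumption,
 * not the real InstrPtr layout). */
typedef struct {
    int barrier;  /* non-zero if the instruction opens/closes a guarded block */
    int opcode;   /* placeholder for whatever the optimizer pattern-matches on */
} PlanInstr;

/* Return 1 if no instruction in the plan has its barrier flag set. */
int plan_is_linear(const PlanInstr *plan, size_t n)
{
    size_t i;

    assert(plan != NULL || n == 0);
    for (i = 0; i < n; i++)
        if (plan[i].barrier)
            return 0;
    return 1;
}

/* Toy optimizer: rewrites opcode 1 into opcode 2 and returns the number
 * of actions taken. It exits before changing anything if the plan
 * contains a barrier, so false positives are avoided at the cost of
 * possible false negatives. */
int toy_optimize(PlanInstr *plan, size_t n)
{
    int actions = 0;
    size_t i;

    if (!plan_is_linear(plan, n))
        return 0;  /* safe exit: plan left as it was */
    for (i = 0; i < n; i++)
        if (plan[i].opcode == 1) {
            plan[i].opcode = 2;
            actions++;
        }
    return actions;
}
```

Bailing out on the first barrier is the conservative choice: it accepts
false negatives in exchange for never rewriting a plan whose control flow
the optimizer does not understand.
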
>
>>>
>>> In the sample optimizer, for now, I'd be fine if there are no
>>> false-positives, i.e., the optimizer triggers in case it should not trigger
>>> or in cases it cannot handle correctly.
>>> I can accept false-negatives, i.e., not triggering in all cases it could
>>> handle correctly.
>>>
>>>>    *
>>>>    * and transform them into
>>>>    *
>>>> @@ -52,6 +54,7 @@ All Rights Reserved.
>>>>    *
>>>>    * i.e., handing a BAT view v2 of BAT v0 as argument to the sql.append()
>>>>    * statement, rather than the original BAT v0.
>>>> + * My advice, always use new variable names, it may capture some easy to make errors.
>>>
>>> My optimizer does use new variables for all new statements/results.
>>> It re-uses variable names only for identical results.
>>>
>>>>    *
>>>>    * As a refinement, patterns like
>>>>    *
>>> [...]
>>>> @@ -181,13 +195,17 @@ OPTsql_appendImplementation(Client cntxt
>>>>                                    pushInstruction(mb, q);
>>>>                                    q1 = q;
>>>>                                    i++;
>>>> -                                  actions++;
>>>> +                                  actions++;      /* to keep track if anything has been done */
>>>>                            }
>>>>                    }
>>>>
>>>> -                  /* look for
>>>> +                  /* look for     
>>>>                     *  v5 := ... v0 ...;
>>>>                     */
>>>> +                  /* an expensive loop, better would be to remember that v0 has a different role.
>>>> +                   * A typical method is to keep a map from variable -> instruction where it was
>>>> +                   * detected. The you can check each assignment for use of v0
>>>> +                  */
>>>
>>> This is general support functionality.
>>> Is this already available in the optimizer framework?
>> I try to use single-pass algorithms in the optimizers.
>> Even in the case of the commonterms optimizer, we may have to
>> traverse the history. This can become an n^2 process.
>>
>>> If so, where is it and how can I use it?
>> Mimic how it is done in other optimizers (e.g. opt_reorder).
>> Typically, a buffer is maintained per variable to keep
>> optimization properties around.
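
As a rough illustration of the per-variable bookkeeping, the sketch below
replaces the nested "is v0 used later?" scan with one pass that records, for
every variable, the index of the last instruction that reads it. This is a
standalone toy, not code from opt_reorder or the optimizer framework:
UseInstr, USE_MAXARGS, record_last_use and used_after are all hypothetical
names, and variables are modelled as plain integer indices. It assumes, as in
the loop in the diff, that arguments retc..argc-1 are the inputs.

```c
#include <assert.h>
#include <stddef.h>

#define USE_MAXARGS 4

/* Hypothetical stand-in for a MAL instruction: args[0..retc-1] are the
 * results, args[retc..argc-1] are the inputs (variable indices). */
typedef struct {
    int retc;
    int argc;
    int args[USE_MAXARGS];
} UseInstr;

/* One pass over the plan: last_use[v] becomes the index of the last
 * instruction that reads variable v, or -1 if v is never read. */
void record_last_use(const UseInstr *plan, int n, int *last_use, int nvars)
{
    int i, k, v;

    for (v = 0; v < nvars; v++)
        last_use[v] = -1;
    for (i = 0; i < n; i++)
        for (k = plan[i].retc; k < plan[i].argc; k++) {
            v = plan[i].args[k];
            assert(v >= 0 && v < nvars);
            last_use[v] = i;
        }
}

/* Non-zero if variable v is read by some instruction after position i.
 * This is a constant-time lookup instead of a scan of the remaining plan. */
int used_after(const int *last_use, int v, int i)
{
    return last_use[v] > i;
}
```

With such a buffer filled once up front, the check for each sql.append()
candidate at position i becomes used_after(last_use, v0, i), so the whole
optimizer stays linear in the plan size.
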
>>
>>> If not, where/how could we add it?
>>>
>>>>                    for (j = i+1; !found && j < limit; j++)
>>>>                            for (k = old[j]->retc; !found && k < old[j]->argc; k++)
>>>>                                    found = (getArg(old[j], k) == getArg(p, 5));
>>>> @@ -202,6 +220,8 @@ OPTsql_appendImplementation(Client cntxt
>>>>
>>>>                            /* push new v1 := aggr.count( v0 ); unless already available */
>>>>                            if (q1 == NULL) {
>>>> +                          /* use mal_buil.mx primitives q1 = newStmt(mb, aggrRef,countRef); setArgType(mb,q1,TYPE_wrd) */
>>>> +                          /* it will be added to the block and even my re-use MAL instructions */
>>>
>>> Is this (supposed to be) documentation of the existing code below,
>>> or rather advice how to implement the below functionality differently?
>> Use the mal_builder to simplify your code base.
>>
>>>
>>>>                                    q1 = newInstruction(mb,ASSIGNsymbol);
>>>>                                    getArg(q1,0) = newTmpVariable(mb, TYPE_wrd);
>>>>                                    setModuleId(q1, aggrRef);
>>>> @@ -211,6 +231,7 @@ OPTsql_appendImplementation(Client cntxt
>>>>                            }
>>>>
>>>>                            /* push new v2 := algebra.slice( v0, 0, v1 ); */
>>>> +                          /* use mal_buil.mx primitives q1 = newStmt(mb, algebraRef,sliceRef); */
>>>
>>> Is this (supposed to be) documentation of the existing code below,
>>> or rather advice how to implement the below functionality differently?
>>>
>>>>                            q2 = newInstruction(mb,ASSIGNsymbol);
>>>>                            getArg(q2,0) = newTmpVariable(mb, TYPE_any);
>>>>                            setModuleId(q2, algebraRef);
>>>> @@ -240,6 +261,7 @@ OPTsql_appendImplementation(Client cntxt
>>>>    for(i++; i<limit; i++)
>>>>            if (old[i])
>>>>                    pushInstruction(mb, old[i]);
>>>> +  /* any remaining MAL instruction records are removed */
>>>>    for(; i<slimit; i++)
>>>>            if (old[i])
>>>>                    freeInstruction(old[i]);
>>>> @@ -253,6 +275,9 @@ OPTsql_appendImplementation(Client cntxt
>>>>    return actions;
>>>>   }
>>>>
>>>> +/* optimizers have to be registered in the optcatalog in opt_support.c.
>>>
>>> Why?
>> SQL needs a place to pick up all known optimizers. You may also have
>> to extend the optimizer pipeline validity code.
>>
>>> If at all possible, I'd prefer to be able to add a new optimizer without the
>>> need to change existing code ...
>> Yes, understood, but you have to patch Makefile.ag, youroptimizer.mx, and
>> opt_support. You may have to extend opt_prelude as well.
>>
>>>
>>>> + * you have to path the file accordingly.
>> "path"
>>>                    ^^^^
>>> parse?
>>>
>>> What does this mean? What am I supposed to do in detail?
>>>
>>>> + */
>>>>   @include ../../optimizer/optimizerWrapper.mx
>>>>   @c
>>>>   #include "opt_statistics.h"
>>>> _______________________________________________
>>>> Checkin-list mailing list
>>>> checkin-l...@monetdb.org
>>>> http://mail.monetdb.org/mailman/listinfo/checkin-list
>>>>
>>>
>>> Thanks!
>>>
>>> Stefan
>>>
>>
>

_______________________________________________
Monetdb-developers mailing list
Monetdb-developers@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/monetdb-developers
