On Oct 18, 2010, at 12:28 PM, Scott Carey wrote:
It is the last that looks to complicate things a bit. We should
probably pass the fields visible to the macro as well as the aliases.
That might look like:
------------------------
define disjunctive_join_filter[out RESULT, A.(a,b,c), B.(a,b,c)]
(filterStr) {
inline join_filter[MATCH_1, A.(a,c), B.(a,c)]($filterStr);
inline join_filter[MATCH_2, A.(b,c), B.(b,c)]($filterStr);
-- not sure if it can avoid alias ambiguity here, would hate to have to project for it
RESULT_GROUP = COGROUP MATCH_1 BY (a,b,c), MATCH_2 BY (a,b,c);
RESULT = FOREACH RESULT_GROUP {
CHOSEN = (IsEmpty(MATCH_1) ? MATCH_2 : MATCH_1);
GENERATE FLATTEN(CHOSEN);
}
}
------------------------
I am concerned that including the fields in each alias that are
accessed is fairly complex, especially for a first version of this
feature. This same functionality can be accomplished by passing these
values as parameters in the general list. That is, your example can
be rewritten as:
define disjunctive_join_filter[out RESULT, A, B](filterStr, A_a, A_b,
A_c, B_a, B_b, B_c) {
This will require better documentation from macro writers to clearly
declare what each parameter is for, but it will work. And nothing
prevents us from extending to the syntax you suggest in the future if
it becomes clear that it will be useful.
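A rough sketch of how the inner calls might then look, assuming join_filter is changed the same way to take its field names as ordinary parameters. Its definition is not shown in this thread, so the extra arguments and their order are hypothetical:
------------------------
define disjunctive_join_filter[out RESULT, A, B](filterStr, A_a, A_b,
A_c, B_a, B_b, B_c) {
-- field names are forwarded as plain parameters instead of A.(a,c), B.(a,c)
inline join_filter[MATCH_1, A, B]($filterStr, $A_a, $A_c, $B_a, $B_c);
inline join_filter[MATCH_2, A, B]($filterStr, $A_b, $A_c, $B_b, $B_c);
...
}
------------------------
The cost is that nothing in the signature ties A_a to alias A, so that link lives only in the macro's documentation.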
I would also like to get people's feedback on the syntax. I'm not
wild about brackets meaning aliases and parentheses meaning
parameters. Would explicit lists be better?
define macro in A, B out Y, Z (param1, param2) { ... }
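Applied to the disjunctive_join_filter macro above, and assuming the same flat parameter list, that form would read roughly as follows (a sketch of the proposal only, not settled syntax):
------------------------
define disjunctive_join_filter in A, B out RESULT (filterStr, A_a, A_b,
A_c, B_a, B_b, B_c) { ... }
------------------------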
Alan.