While thinking and talking through
https://issues.apache.org/jira/browse/PIG-2536, something came up:
should this idea, that relation.projection works in distinct, extend to any
statement that takes a relation?
The patch I linked to allows you to do
b = distinct a.$0;
It accomplishes this by mapping that to:
b = distinct (foreach a generate $0);
It seems that if this is useful, then this should be available wherever
relations are used?
e.g.
b = group a.(x,y) by x;
or anything. The case of group is somewhat problematic, however, because if
you describe that, you'll get...
b: {group: int,1-2: {(x: int,y: int)}}
which, per Alan's comment, comes from there being no real naming convention for
nested relations.
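For reference, by analogy with the distinct rewrite above, the group example
would presumably expand as in the sketch below. The load statement, the file
name 'data', and the schema are assumptions made up for illustration, and the
linked patch may not handle this case.

```pig
-- hypothetical input; file name and schema are assumptions for illustration
a = load 'data' as (x:int, y:int);

-- proposed sugar, shown for comparison only:
-- b = group a.(x,y) by x;

-- presumed expansion, mirroring
--   distinct a.$0  =>  distinct (foreach a generate $0)
b = group (foreach a generate x, y) by x;
describe b;
```

Because the inner foreach has no alias of its own, describe has nothing
meaningful to call the bag, which is the naming issue noted above.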
I guess the question is whether this is, in general, useful?
More broadly...
- Is it worth thinking about how to make this go deeper? Currently you can
do b = distinct a.x, but not b = distinct a.x.$0 (if it were appropriate).
There are issues with this (and in fact there is an outstanding bug w.r.t.
b = foreach (group a by $0) generate $1.$0.$0.$0.$0; <== this works!).
- Is the syntactic sugar strategy ok? I think in this case it should
be (the relation name issue notwithstanding), but I could see arguments
either way.
Please find attached a super small patch with no tests. I wanted to get some
thoughts before filing yet another JIRA.