pig-user  

Question about semantics of "as" on the load statement

Olga Natkovich
Wed, 17 Sep 2008 13:00:46 -0700

Hi,
 
If I run the query below (it is based on an actual user query):
 
-- Note that data1 has more than one column, but "as" declares only one
A = load 'data1' as (x);
B = load 'data2' as (x, y, z);
C = JOIN A by x, B by x;
D = foreach C generate y,z;
store D into 'output';
 
the current Pig implementation produces wrong results. The reason is
that load currently assumes that it is given the complete schema. The
user's intention was to keep only the first column and throw out the
rest of the data. In other words, the user is treating "as" as a
projection.
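
To make the two readings concrete, here is a minimal Python sketch of one
way the results could go wrong. It assumes that y and z are looked up by
position from the declared schema while the undeclared columns of data1
are still present in each row. That mechanism, the sample rows and the
helper name are all made up for illustration; this is not Pig's code.

```python
# Hypothetical sample rows; the real data1 and data2 are not shown here.
data1 = [("k1", "a1", "a2"), ("k2", "b1", "b2")]  # three columns on disk
data2 = [("k1", "y1", "z1"), ("k2", "y2", "z2")]  # matches "as (x, y, z)"


def join_on_first(left, right):
    """Inner join on column 0, concatenating the full left and right rows."""
    return [l + r for l in left for r in right if l[0] == r[0]]


# Schema of C as derived from the two "as" clauses: A(x) followed by B(x, y, z).
declared = ["A::x", "B::x", "y", "z"]
y_pos, z_pos = declared.index("y"), declared.index("z")

# Reading 1: "as (x)" is taken to be the complete schema, but the extra
# columns of data1 stay in the rows, so positions 2 and 3 are the wrong fields.
complete_schema = [(row[y_pos], row[z_pos])
                   for row in join_on_first(data1, data2)]

# Reading 2: "as (x)" acts as a projection, so only the first column of
# data1 survives the load and the declared positions line up with the data.
projected = [row[:1] for row in data1]
as_projection = [(row[y_pos], row[z_pos])
                 for row in join_on_first(projected, data2)]

print(complete_schema)  # [('a2', 'k1'), ('b2', 'k2')]
print(as_projection)    # [('y1', 'z1'), ('y2', 'z2')]
```

Under the first reading D silently contains a column of data1 and the
join key instead of y and z; under the second it contains what the user
expected.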
 
Do Pig users/developers have a strong opinion on how Pig should handle
this case? If so, please provide use cases.
 
Thanks,
 
Olga