Olga Natkovich
Wed, 17 Sep 2008 13:00:46 -0700
Hi,

If I run the query below (which is based on an actual user query):

-- Note that data1 has more than 1 column, but "as" declares only a single one
A = load 'data1' as (x);
B = load 'data2' as (x, y, z);
C = JOIN A by x, B by x;
D = foreach C generate y,z;
store D into 'output';

the current Pig implementation produces wrong results. The reason is that load currently assumes that it has been given the complete schema. The user's intention was that (s)he only cares about the first column and the rest of the data can be thrown out. In effect, the user is treating "as" as a projection.

Do Pig users/developers have a strong opinion on how Pig should handle this case? If so, please provide use cases.

Thanks,
Olga
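
To make the two readings of "as" concrete, here is a minimal Python sketch. It is not Pig code and does not reflect Pig's internals. It assumes that the wrong results come from resolving y and z by position against the declared schema while the stored tuples still carry the undeclared columns of data1. The sample rows and the helper names join_on_first and resolve are hypothetical.

```python
# Hypothetical model of the two readings of "as"; not Pig's actual code.

def join_on_first(left, right):
    """Inner join on column 0, concatenating the tuples exactly as stored."""
    return [l + r for l in left for r in right if l[0] == r[0]]

def resolve(schema, name):
    """Map a column name to its position in the declared schema."""
    return schema.index(name)

# Made-up data: data1 really has three columns, but the script declares only (x).
data1 = [(1, "a1", "a2"), (2, "b1", "b2")]
data2 = [(1, "y1", "z1"), (2, "y2", "z2")]

# The join output schema as derived from the declarations alone.
declared = ["A::x", "B::x", "y", "z"]
y_pos, z_pos = resolve(declared, "y"), resolve(declared, "z")

# Reading 1: "as" is taken to be the complete schema of data1.
# The extra columns stay in the tuples, so positions 2 and 3 hit the wrong fields.
joined = join_on_first(data1, data2)
complete_schema_result = [(t[y_pos], t[z_pos]) for t in joined]

# Reading 2: "as" is a projection, so undeclared columns are dropped at load time.
projected = [t[:1] for t in data1]
joined = join_on_first(projected, data2)
projection_result = [(t[y_pos], t[z_pos]) for t in joined]

print(complete_schema_result)  # columns taken from the wrong positions
print(projection_result)       # the y and z columns of data2
```

Under these assumptions, the first reading returns fields that belong to data1 and the join key, while the second returns the y and z values the user asked for.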