Arthur Zwiegincew
Mon, 29 Sep 2008 18:48:07 -0700
I created two scripts: the first is the same as before but uses STORE
instead of DUMP, and the second loads the stored output and dumps it.
No difference: I still get only {(4, 20), (5, 20)}.
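For reference, the two scripts look roughly like the sketch below. The
output path 'union.out' matches the directory listed further down; the
alias name "stored" and the exact LOAD clause in the second script are
illustrative and may differ from what was actually run:

-- start --
-- script 1: same as before, but STORE instead of DUMP
data = load '/Users/arthur/tmp/data' as (x, y);
data2 = load '/Users/arthur/tmp/data-2' as (x, y);
both = union data, data2;
store both into 'union.out' using PigStorage();

-- script 2: load the stored output (a directory in Hadoop mode) and dump it
stored = load 'union.out' using PigStorage() as (x, y);
dump stored;
-- end --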
Also, your suggestion of using cat indicates that you might be thinking
about local mode (where union works). As I see it, PigStorage() in Hadoop
mode ends up creating a directory, not just a single file:
$ dir union.out
total 8
-rw-r--r-- 1 arthur staff 10B Sep 29 18:40 map-map
-rw-r--r-- 1 arthur staff 0B Sep 29 18:40 part-00000
-Arthur
On Mon, Sep 29, 2008 at 6:36 PM, Mridul Muralidharan
<[EMAIL PROTECTED]> wrote:
>
> Hi Arthur,
>
> Does a store instead of a dump work?
> Something like:
>
> -- start --
> data = load '/Users/arthur/tmp/data' as (x, y);
> data2 = load '/Users/arthur/tmp/data-2' as (x, y);
> both = union data, data2;
> store both into 'temp' using PigStorage();
>
> cat temp
> -- end --
>
>
>
> Regards,
> Mridul
>
> Arthur Zwiegincew wrote:
>
>> I've come across a very basic problem—unions simply do not work in Hadoop
>> mode.
>>
>> data files:
>>
>> $ cat ~/tmp/data
>> 1 1
>> 2 1
>> 3 10
>>
>> $ cat ~/tmp/data-2
>> 4 20
>> 5 20
>>
>> pig script:
>> data = load '/Users/arthur/tmp/data' as (x, y);
>> data2 = load '/Users/arthur/tmp/data-2' as (x, y);
>> both = union data, data2;
>> dump both;
>>
>> result:
>> (4, 20)
>> (5, 20)
>>
>>
>> I've opened a bug <https://issues.apache.org/jira/browse/PIG-390> on this,
>> but there has been no response.
>>
>>
>> Am I missing anything?
>>
>>
>> Thanks,
>> Arthur
>>
>>
>