Arthur Zwiegincew
Mon, 29 Sep 2008 19:05:23 -0700
Oh, I didn't realize cat was a Pig command... The result is still the same.
Is this a regression? If so, what's the right build to use?
Thanks,
Arthur
On Mon, Sep 29, 2008 at 6:59 PM, Mridul Muralidharan
<[EMAIL PROTECTED]> wrote:
>
> I suggested STORE instead of DUMP to see whether the problem is specific to
> DUMP or is a more general issue.
>
> cat in Pig works the same way whether the path is a file or a directory (in
> the directory case it reads the appropriate files inside it, of course).
>
> Though looking at your ls output, I suspect the map output did not produce
> the required result ...
>
> - Mridul
>
>
> Arthur Zwiegincew wrote:
>
>> I created two scripts: the first one is the same as before, but uses STORE
>> instead of DUMP, and the second one loads the stored file and dumps it.
>> No difference: I still get only {(4, 20), (5, 20)}.
>>
>> Also, your suggestion of using cat indicates that you might be thinking
>> about local mode (where union works). As I see it, PigStorage() in Hadoop
>> mode ends up creating a directory, not just a single file:
>>
>> $ dir union.out
>> total 8
>> -rw-r--r-- 1 arthur staff 10B Sep 29 18:40 map-map
>> -rw-r--r-- 1 arthur staff 0B Sep 29 18:40 part-00000
>>
>> -Arthur
>>
>> On Mon, Sep 29, 2008 at 6:36 PM, Mridul Muralidharan
>> <[EMAIL PROTECTED]> wrote:
>>
>>> Hi Arthur,
>>>
>>> Does a STORE instead of DUMP work?
>>> Something like:
>>>
>>> -- start --
>>> data = load '/Users/arthur/tmp/data' as (x, y);
>>> data2 = load '/Users/arthur/tmp/data-2' as (x, y);
>>> both = union data, data2;
>>> store both into 'temp' using PigStorage();
>>>
>>> cat temp
>>> -- end --
>>>
>>>
>>>
>>> Regards,
>>> Mridul
>>>
>>> Arthur Zwiegincew wrote:
>>>
>>>> I've come across a very basic problem: unions simply do not work in
>>>> Hadoop mode.
>>>>
>>>> data files:
>>>>
>>>> $ cat ~/tmp/data
>>>> 1 1
>>>> 2 1
>>>> 3 10
>>>>
>>>> $ cat ~/tmp/data-2
>>>> 4 20
>>>> 5 20
>>>>
>>>> pig script:
>>>> data = load '/Users/arthur/tmp/data' as (x, y);
>>>> data2 = load '/Users/arthur/tmp/data-2' as (x, y);
>>>> both = union data, data2;
>>>> dump both;
>>>>
>>>> result:
>>>> (4, 20)
>>>> (5, 20)
>>>>
>>>>
>>>> I've opened a bug <https://issues.apache.org/jira/browse/PIG-390> on this,
>>>> but there has been no response.
>>>>
>>>>
>>>> Am I missing anything?
>>>>
>>>>
>>>> Thanks,
>>>> Arthur
>>>>
>>>>
>>>>
>>
>
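For reference, Pig's UNION is expected to return every tuple from both inputs (bag semantics: duplicates are kept and output order is not guaranteed), so the DUMP in the original report should have printed five tuples, not two. The Python sketch below is only a local cross-check of that expected result and not Pig's implementation. It assumes the input files are whitespace-delimited as in the `cat` output above, and the helper names `load_tuples` and `union_all` are hypothetical.

```python
import os
import tempfile


def load_tuples(path):
    """Hypothetical helper: read a whitespace-delimited file into a list of
    string tuples, a rough stand-in for `load ... as (x, y)`."""
    with open(path) as f:
        return [tuple(line.split()) for line in f if line.strip()]


def union_all(*relations):
    """Hypothetical helper: concatenate relations without removing duplicates,
    mirroring the bag semantics of Pig's UNION (Pig itself does not
    guarantee output order)."""
    result = []
    for rel in relations:
        result.extend(rel)
    return result


if __name__ == "__main__":
    # Recreate the two sample files from the report in a temporary directory.
    tmp = tempfile.mkdtemp()
    samples = {"data": "1 1\n2 1\n3 10\n", "data-2": "4 20\n5 20\n"}
    for name, text in samples.items():
        with open(os.path.join(tmp, name), "w") as f:
            f.write(text)

    both = union_all(load_tuples(os.path.join(tmp, "data")),
                     load_tuples(os.path.join(tmp, "data-2")))
    # Print in Pig's DUMP-like tuple format, e.g. "(1, 1)".
    for t in both:
        print("(" + ", ".join(t) + ")")
```

Run locally, this prints all five tuples from the two files, which is the result the Hadoop-mode run should have matched.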