pig-user  

Re: Union support

Mridul Muralidharan
Mon, 29 Sep 2008 19:00:30 -0700


I suggested store instead of dump to see whether the problem is specific to dump or is a more general issue.

cat in Pig works the same way whether the path is a file or a directory (provided the directory contains the appropriate files, of course).
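
As a rough illustration of that point, here is a minimal Grunt sketch. It assumes the output path 'temp' from the script quoted further down; it is not taken from Arthur's actual session. Passing the directory to cat prints the files inside it, so no part-file name is needed:

```pig
-- hypothetical Grunt session; 'temp' is assumed to be the STORE target
ls temp     -- list what the job wrote into the output directory
cat temp    -- cat on the directory prints the contents of the files in it
```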

Though looking at your ls output, I suspect the map output did not produce the required result ...

- Mridul

Arthur Zwiegincew wrote:
I created two scripts: the first is the same as before, but uses STORE
instead of DUMP; the second loads the stored file and dumps it.
No difference: I still get only {(4, 20), (5, 20)}.

Also, your suggestion of using cat indicates that you might be thinking
about local mode (where union works). As I see it, PigStorage() in Hadoop
mode ends up creating a directory, not just a single file:

$ dir union.out
total 8
-rw-r--r--  1 arthur  staff    10B Sep 29 18:40 map-map
-rw-r--r--  1 arthur  staff     0B Sep 29 18:40 part-00000

-Arthur

On Mon, Sep 29, 2008 at 6:36 PM, Mridul Muralidharan
<[EMAIL PROTECTED]> wrote:

Hi Arthur,

Does a store instead of a dump work?
Something like:

-- start --
data = load '/Users/arthur/tmp/data' as (x, y);
data2 = load '/Users/arthur/tmp/data-2' as (x, y);
both = union data, data2;
store both into 'temp' using PigStorage();

cat temp
-- end --



Regards,
Mridul

Arthur Zwiegincew wrote:

I've come across a very basic problem: unions simply do not work in Hadoop
mode.

data files:

$ cat ~/tmp/data
1 1
2 1
3 10

$ cat ~/tmp/data-2
4 20
5 20

pig script:
data = load '/Users/arthur/tmp/data' as (x, y);
data2 = load '/Users/arthur/tmp/data-2' as (x, y);
both = union data, data2;
dump both;

result:
(4, 20)
(5, 20)


I've opened a bug <https://issues.apache.org/jira/browse/PIG-390> on this,
but there has been no response.


Am I missing anything?


Thanks,
Arthur