Mridul Muralidharan
Mon, 29 Sep 2008 19:00:30 -0700
I suggested STORE instead of DUMP to see whether the problem is specific to DUMP or is a general issue.
cat in Pig works the same whether the target is a file or a directory (for a directory, it reads the appropriate files inside it, of course).
That said, looking at your ls output, I suspect the map output did not produce the required result...
- Mridul

Arthur Zwiegincew wrote:
I created two scripts: the first one the same as before, but using STORE instead of DUMP, and the second one, which loads the stored file and dumps it. No difference: I just get {(4, 20), (5, 20)}.

Also, your suggestion of using cat indicates that you might be thinking about local mode (where union works). As I see it, PigStorage() in Hadoop mode ends up creating a directory, not just a single file:

    $ dir union.out
    total 8
    -rw-r--r--  1 arthur  staff  10B Sep 29 18:40 map-map
    -rw-r--r--  1 arthur  staff   0B Sep 29 18:40 part-00000

-Arthur

On Mon, Sep 29, 2008 at 6:36 PM, Mridul Muralidharan <[EMAIL PROTECTED]> wrote:

Hi Arthur,

Does a store instead of dump work? Something like:

    -- start --
    data = load '/Users/arthur/tmp/data' as (x, y);
    data2 = load '/Users/arthur/tmp/data-2' as (x, y);
    both = union data, data2;
    store both into 'temp' using PigStorage();
    cat temp
    -- end --

Regards,
Mridul

Arthur Zwiegincew wrote:

I've come across a very basic problem: unions simply do not work in Hadoop mode.

data files:

    $ cat ~/tmp/data
    1 1
    2 1
    3 10
    $ cat ~/tmp/data-2
    4 20
    5 20

pig script:

    data = load '/Users/arthur/tmp/data' as (x, y);
    data2 = load '/Users/arthur/tmp/data-2' as (x, y);
    both = union data, data2;
    dump both;

result:

    (4, 20)
    (5, 20)

I've opened a bug <https://issues.apache.org/jira/browse/PIG-390> on this, but there has been no response. Am I missing anything?

Thanks,
Arthur
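For comparison, here is a small Python sketch (plain Python, not Pig) of what the union in this thread should return for the two sample files. It assumes that Pig's UNION keeps every tuple from both inputs, with no de-duplication and no ordering guarantee, and it parses the fields as integers purely for readability (the Pig script itself declares no types). The helper names parse and pig_union are hypothetical and exist only for illustration.

```python
# Sketch of the expected UNION result for the two sample files in this thread.
# Assumption: UNION is a bag union (keeps all tuples, including duplicates,
# in no guaranteed order), so the comparison below ignores order.
from collections import Counter

# Contents of ~/tmp/data and ~/tmp/data-2 as shown above
# (assumed whitespace-separated, one (x, y) pair per line).
DATA = "1 1\n2 1\n3 10\n"
DATA_2 = "4 20\n5 20\n"


def parse(text):
    """Hypothetical helper: turn 'x y' lines into (x, y) tuples of ints."""
    return [
        tuple(int(field) for field in line.split())
        for line in text.splitlines()
        if line.strip()
    ]


def pig_union(*relations):
    """Hypothetical helper: concatenate all tuples, keeping duplicates."""
    result = []
    for relation in relations:
        result.extend(relation)
    return result


expected = pig_union(parse(DATA), parse(DATA_2))
observed = [(4, 20), (5, 20)]  # what the Hadoop-mode run dumped

# Multiset difference: tuples that should be in the union but were not dumped.
missing = Counter(expected) - Counter(observed)

print(sorted(expected))
print(sorted(missing.elements()))
```

Sorted, the expected relation is (1, 1), (2, 1), (3, 10), (4, 20), (5, 20); the Hadoop-mode run returns only the last two, so every tuple from data is missing from the output.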