Skewed join sampler generates unevenly partitioned data
-------------------------------------------------------

                 Key: PIG-1743
                 URL: https://issues.apache.org/jira/browse/PIG-1743
             Project: Pig
          Issue Type: Bug
          Components: impl
    Affects Versions: 0.7.0, 0.8.0
            Reporter: Viraj Bhat


I have a data, when using the Skewed join generated uneven partitions. The 
script looks like this:

{script}
Data1 = LOAD '/user/viraj/relation1.in' AS (ref,intVal);
Data2 = LOAD '/user/viraj/relation2.in' using PigStorage('\u0001') AS 
(ID:chararray, Key:chararray, DomainKey:chararray);
JoinData = JOIN Data1 BY ref LEFT OUTER , Data2 BY ID using 'skewed' PARALLEL 
10;
STORE JoinData into 'skewedoutput' using PigStorage('\u0001');
{script}

The output generated has the following part files of varying sizes

{quote}
$ hadoop fs -ls /user/viraj/skewedoutput
Found 10 items
-rw-------   3 viraj users       2090 2010-11-23 03:44 
/user/viraj/skewedoutput/part-r-00000
-rw-------   3 viraj users      19380 2010-11-23 03:44 
/user/viraj/skewedoutput/part-r-00001
-rw-------   3 viraj users       2090 2010-11-23 03:44 
/user/viraj/skewedoutput/part-r-00002
-rw-------   3 viraj users       9690 2010-11-23 03:44 
/user/viraj/skewedoutput/part-r-00003
-rw-------   3 viraj users       2090 2010-11-23 03:44 
/user/viraj/skewedoutput/part-r-00004
-rw-------   3 viraj users       2090 2010-11-23 03:44 
/user/viraj/skewedoutput/part-r-00005
-rw-------   3 viraj users          0 2010-11-23 03:44 
/user/viraj/skewedoutput/part-r-00006
-rw-------   3 viraj users          0 2010-11-23 03:44 
/user/viraj/skewedoutput/part-r-00007
-rw-------   3 viraj users          0 2010-11-23 03:44 
/user/viraj/skewedoutput/part-r-00008
-rw-------   3 viraj users          0 2010-11-23 03:44 
/user/viraj/skewedoutput/part-r-00009
{quote}

Attaching input datasets.

Viraj

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Reply via email to