> My question here: WeightedRangePartitioner only shows how key distribute
and makes every reduce receive equal data from map. But this can gurantee
sort?
  Yes. It is a range partitioner and the ranges are determined after
sampling and determining key distribution to avoid the skew. For eg: In
very simple terms, f you have alphabetical keys a-z, it will distribute a-c
to reducer 0, d-m to reducer 1, m-z to reducer 2 so that if you read the
part files in order they are sorted.

On Tue, Mar 31, 2015 at 1:48 AM, Zhang, Liyun <[email protected]> wrote:

> Hi all,
>   I want to ask a question about following script:
> testlimit.pig
>
>
> a = load './testlimit.txt' as (x:int, y:chararray);
>
>
> b = order a by x;
>
>
> c = limit b 1;
>
>
> store c into './testlimit.out';
>
>
>
>
> In MR:it will generate 4 MapReduce node(scope-11, scope-14,
> scope-29,scope-40)
>
> scope-11: load the input data and store it to a tmp file
> scope-14: sampleload the tmp file and generate the quantile file: hdfs://
> zly1.sh.intel.com:8020/tmp/temp2146669591/tmp300898425. I think the
> quantile file contains
> the instance of WeightedRangePartitioner which shows how keys distribute.
> scope-29: use the quantile file to sort. My question here:
> WeightedRangePartitioner only shows how key distribute and makes every
> reduce receive equal data from map. But this can gurantee sort?
>
>
> #--------------------------------------------------
> # Map Reduce Plan
> #--------------------------------------------------
> MapReduce node scope-11
> Map Plan
> Store(hdfs://
> zly1.sh.intel.com:8020/tmp/temp2146669591/tmp694083214:org.apache.pig.impl.io.InterStorage)
> - scope-12
> |
> |---a: New For Each(false,false)[bag] - scope-7
>      |   |
>      |   Cast[int] - scope-2
>      |   |
>      |   |---Project[bytearray][0] - scope-1
>      |   |
>      |   Cast[chararray] - scope-5
>      |   |
>      |   |---Project[bytearray][1] - scope-4
>      |
>      |---a: Load(hdfs://
> zly1.sh.intel.com:8020/user/root/testlimit.txt:org.apache.pig.builtin.PigStorage)
> - scope-0--------
> Global sort: false
> ----------------
>
> MapReduce node scope-14
> Map Plan
> b: Local Rearrange[tuple]{tuple}(false) - scope-18
> |   |
> |   Constant(all) - scope-17
> |
> |---New For Each(false)[tuple] - scope-16
>      |   |
>      |   Project[int][0] - scope-15
>      |
>      |---Load(hdfs://
> zly1.sh.intel.com:8020/tmp/temp2146669591/tmp694083214:org.apache.pig.impl.builtin.RandomSampleLoader('org.apache.pig.impl.io.InterStorage','100'))
> - scope-13--------
> Reduce Plan
> Store(hdfs://
> zly1.sh.intel.com:8020/tmp/temp2146669591/tmp300898425:org.apache.pig.impl.io.InterStorage)
> - scope-27
> |
> |---New For Each(false)[tuple] - scope-26
>      |   |
>      |   POUserFunc(org.apache.pig.impl.builtin.FindQuantiles)[tuple] -
> scope-25
>      |   |
>      |   |---Project[tuple][*] - scope-24
>      |
>      |---New For Each(false,false)[tuple] - scope-23
>          |   |
>          |   Constant(2) - scope-22
>          |   |
>          |   Project[bag][1] - scope-20
>          |
>          |---Package(Packager)[tuple]{chararray} - scope-19--------
> Global sort: false
> Secondary sort: true
> ----------------
>
> MapReduce node scope-29
> Map Plan
> b: Local Rearrange[tuple]{int}(false) - scope-30
> |   |
> |   Project[int][0] - scope-8
> |
> |---Load(hdfs://
> zly1.sh.intel.com:8020/tmp/temp2146669591/tmp694083214:org.apache.pig.impl.io.InterStorage)
> - scope-28--------
> Combine Plan
> Local Rearrange[tuple]{int}(false) - scope-35
> |   |
> |   Project[int][0] - scope-8
> |
> |---Limit - scope-34
>      |
>      |---New For Each(true)[tuple] - scope-33
>          |   |
>          |   Project[bag][1] - scope-32
>          |
>          |---Package(LitePackager)[tuple]{int} - scope-31--------
> Reduce Plan
> c: Store(hdfs://
> zly1.sh.intel.com:8020/tmp/temp2146669591/tmp538566422:org.apache.pig.impl.io.InterStorage)
> - scope-10
> |
> |---Limit - scope-39
>      |
>      |---New For Each(true)[tuple] - scope-38
>          |   |
>          |   Project[bag][1] - scope-37
>          |
>          |---Package(LitePackager)[tuple]{int} - scope-36--------
> Global sort: true
> Quantile file: hdfs://
> zly1.sh.intel.com:8020/tmp/temp2146669591/tmp300898425
> ----------------
>
> MapReduce node scope-40
> Map Plan
> b: Local Rearrange[tuple]{int}(false) - scope-42
> |   |
> |   Project[int][0] - scope-43
> |
> |---Load(hdfs://
> zly1.sh.intel.com:8020/tmp/temp2146669591/tmp538566422:org.apache.pig.impl.io.InterStorage)
> - scope-41--------
> Reduce Plan
> c: Store(fakefile:org.apache.pig.builtin.PigStorage) - scope-49
> |
> |---Limit - scope-48
>      |
>      |---New For Each(true)[bag] - scope-47
>          |   |
>          |   Project[tuple][1] - scope-46
>          |
>          |---Package(LitePackager)[tuple]{int} - scope-45--------
> Global sort: false
> ----------------
>
>
>
>
> Kelly Zhang/Zhang,Liyun
> Best Regards
>
>

Reply via email to