Hi all,
I am currently working on
PIG-4438 <https://issues.apache.org/jira/browse/PIG-4438> ("limit after sort"
does not work in Spark mode).
The test script, testlimit.pig:
a = load './testlimit.txt' as (x:int, y:chararray);
b = order a by x;
c = limit b 1;
store c into './testlimit.out';
explain c;
I have read the code of MRCompiler#visitSort. Can anyone explain what
org.apache.pig.impl.builtin.RandomSampleLoader and
org.apache.pig.impl.builtin.FindQuantiles do, and why a sampling job is
needed when POSort is used?
I would appreciate it if someone could point me to a design document for the
MRCompiler#visitSort implementation.
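For context, a sampling job in front of a distributed sort is commonly used to pick key boundaries so that each reducer receives a contiguous and roughly equal key range. The Python sketch below only illustrates that general idea; it is not taken from the Pig source, and the names find_quantiles, range_partition and global_sort are hypothetical. The sample size of 100 mirrors the '100' argument passed to RandomSampleLoader in the plan.

```python
import bisect
import random


def find_quantiles(sampled_keys, num_partitions):
    """Pick num_partitions - 1 boundary keys from a sample (hypothetical helper)."""
    keys = sorted(sampled_keys)
    if not keys or num_partitions <= 1:
        return []
    # Evenly spaced cut points in the sorted sample approximate the quantiles.
    return [keys[(i * len(keys)) // num_partitions]
            for i in range(1, num_partitions)]


def range_partition(key, boundaries):
    """Return the index of the partition whose key range contains `key`."""
    return bisect.bisect_right(boundaries, key)


def global_sort(records, num_partitions, sample_size=100, seed=0):
    """Sort records by their first field using sampled range partitioning."""
    rng = random.Random(seed)
    keys = [rec[0] for rec in records]
    # Step 1: the "sampling job" looks at a small random subset of the keys.
    sample = rng.sample(keys, min(sample_size, len(keys)))
    # Step 2: derive partition boundaries (the "quantile file") from the sample.
    boundaries = find_quantiles(sample, num_partitions)
    # Step 3: route every record to the partition that owns its key range.
    partitions = [[] for _ in range(num_partitions)]
    for rec in records:
        partitions[range_partition(rec[0], boundaries)].append(rec)
    # Step 4: each "reducer" sorts only its own range.
    for part in partitions:
        part.sort(key=lambda rec: rec[0])
    return partitions
```

Because partition i only holds keys below those in partition i + 1, concatenating the partitions in index order yields a totally ordered result without a single reducer having to sort everything.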
The following is the MapReduce plan:
#--------------------------------------------------
# Map Reduce Plan
#--------------------------------------------------
MapReduce node scope-11
Map Plan
Store(hdfs://zly1.sh.intel.com:8020/tmp/temp2146669591/tmp694083214:org.apache.pig.impl.io.InterStorage) - scope-12
|
|---a: New For Each(false,false)[bag] - scope-7
| |
| Cast[int] - scope-2
| |
| |---Project[bytearray][0] - scope-1
| |
| Cast[chararray] - scope-5
| |
| |---Project[bytearray][1] - scope-4
|
|---a: Load(hdfs://zly1.sh.intel.com:8020/user/root/testlimit.txt:org.apache.pig.builtin.PigStorage) - scope-0--------
Global sort: false
----------------
MapReduce node scope-14
Map Plan
b: Local Rearrange[tuple]{tuple}(false) - scope-18
| |
| Constant(all) - scope-17
|
|---New For Each(false)[tuple] - scope-16
| |
| Project[int][0] - scope-15
|
|---Load(hdfs://zly1.sh.intel.com:8020/tmp/temp2146669591/tmp694083214:org.apache.pig.impl.builtin.RandomSampleLoader('org.apache.pig.impl.io.InterStorage','100')) - scope-13--------
Reduce Plan
Store(hdfs://zly1.sh.intel.com:8020/tmp/temp2146669591/tmp300898425:org.apache.pig.impl.io.InterStorage) - scope-27
|
|---New For Each(false)[tuple] - scope-26
| |
| POUserFunc(org.apache.pig.impl.builtin.FindQuantiles)[tuple] - scope-25
| |
| |---Project[tuple][*] - scope-24
|
|---New For Each(false,false)[tuple] - scope-23
| |
| Constant(2) - scope-22
| |
| Project[bag][1] - scope-20
|
|---Package(Packager)[tuple]{chararray} - scope-19--------
Global sort: false
Secondary sort: true
----------------
MapReduce node scope-29
Map Plan
b: Local Rearrange[tuple]{int}(false) - scope-30
| |
| Project[int][0] - scope-8
|
|---Load(hdfs://zly1.sh.intel.com:8020/tmp/temp2146669591/tmp694083214:org.apache.pig.impl.io.InterStorage) - scope-28--------
Combine Plan
Local Rearrange[tuple]{int}(false) - scope-35
| |
| Project[int][0] - scope-8
|
|---Limit - scope-34
|
|---New For Each(true)[tuple] - scope-33
| |
| Project[bag][1] - scope-32
|
|---Package(LitePackager)[tuple]{int} - scope-31--------
Reduce Plan
c: Store(hdfs://zly1.sh.intel.com:8020/tmp/temp2146669591/tmp538566422:org.apache.pig.impl.io.InterStorage) - scope-10
|
|---Limit - scope-39
|
|---New For Each(true)[tuple] - scope-38
| |
| Project[bag][1] - scope-37
|
|---Package(LitePackager)[tuple]{int} - scope-36--------
Global sort: true
Quantile file: hdfs://zly1.sh.intel.com:8020/tmp/temp2146669591/tmp300898425
----------------
MapReduce node scope-40
Map Plan
b: Local Rearrange[tuple]{int}(false) - scope-42
| |
| Project[int][0] - scope-43
|
|---Load(hdfs://zly1.sh.intel.com:8020/tmp/temp2146669591/tmp538566422:org.apache.pig.impl.io.InterStorage) - scope-41--------
Reduce Plan
c: Store(fakefile:org.apache.pig.builtin.PigStorage) - scope-49
|
|---Limit - scope-48
|
|---New For Each(true)[bag] - scope-47
| |
| Project[tuple][1] - scope-46
|
|---Package(LitePackager)[tuple]{int} - scope-45--------
Global sort: false
----------------
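Reading the plan above, the Limit operator appears three times: in the combine and reduce plans of the global-sort job (scope-29) and again in the reduce plan of the last job (scope-40). The Python sketch below is one way to model that staging, under the assumption that the last job runs with a single reducer; it is only an illustration, not Pig's implementation, and the function names are hypothetical.

```python
import heapq
from itertools import islice


def limit_per_partition(sorted_partition, n):
    """Keep only the first n records of an already sorted partition."""
    return list(islice(sorted_partition, n))


def limit_after_sort(sorted_partitions, n):
    """Apply `limit n` to the output of a range-partitioned global sort."""
    # Stage 1: every reducer of the sort job trims its own output to n
    # records, so at most n * number_of_partitions records are written.
    survivors = [limit_per_partition(part, n) for part in sorted_partitions]
    # Stage 2: a final single "reducer" merges the survivors by key and
    # applies the limit once more to get the global first n records.
    merged = heapq.merge(*survivors, key=lambda rec: rec[0])
    return list(islice(merged, n))


# Example with three sorted partitions of (x, y) records:
parts = [[(1, "a"), (2, "b")], [(5, "c")], [(7, "d"), (9, "e")]]
first = limit_after_sort(parts, 1)  # [(1, 'a')]
```

The per-partition limit alone is not enough, since each partition would contribute up to n records; the second pass is what reduces the result to n records overall while preserving the sort order.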
Best regards
Zhang,Liyun