[ https://issues.apache.org/jira/browse/PIG-3975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14014254#comment-14014254 ]

Koji Noguchi commented on PIG-3975:
-----------------------------------

With the script from the description, the job DAG looked like this:
{noformat}
Job DAG:
job_1399356417814_189120        ->      job_1399356417814_189121,job_1399356417814_189122,
job_1399356417814_189121        ->      job_1399356417814_189123,
job_1399356417814_189123
job_1399356417814_189122
{noformat}

Looking at the plans, I see that although job_1399356417814_189122 and job_1399356417814_189123 both read the output of job_1399356417814_189121 (the scalar file tmp-690789368 passed to ReadScalars), job_1399356417814_189122 is somehow missing that dependency.

{noformat}
==============================================================
job_1399356417814_189120 
pig.inputs ================
hdfs://aaa.bbb.ccc:8020/user/knoguchi/input1:org.apache.pig.builtin.PigStorage
pig.mapPlan=================
B: Local Rearrange[tuple]{int}(false) - scope-8
|   |
|   Project[int][0] - scope-9
|
|---A: New For Each(false)[bag] - scope-5
    |   |
    |   Cast[int] - scope-3
    |   |
    |   |---Project[bytearray][0] - scope-2
pig.reducePlan=================
Empty Plan!
pig.reduce.stores=================
[(Name: B: Store(hdfs://aaa.bbb.ccc:8020/tmp/temp-912971374/tmp-113052755:org.apache.pig.impl.io.TFileStorage) - scope-10

==============================================================
job_1399356417814_189121
pig.inputs ================
hdfs://aaa.bbb.ccc:8020/tmp/temp-912971374/tmp-113052755:org.apache.pig.impl.io.TFileStorage
pig.mapPlan=================
Empty Plan!
pig.map.stores=================
[(Name: Store(hdfs://aaa.bbb.ccc:8020/tmp/temp-912971374/tmp-690789368:org.apache.pig.impl.io.InterStorage) - scope-25 Operator Key: scope-25)]
pig.reducePlan=================
null
pig.reduce.stores=================
[]
==============================================================
job_1399356417814_189122
pig.inputs ================
hdfs://aaa.bbb.ccc:8020/user/knoguchi/input3:org.apache.pig.builtin.PigStorage
pig.mapPlan=================
F: New For Each(false)[bag] - scope-20
|   |
|   POUserFunc(org.apache.pig.impl.builtin.ReadScalars)[int] - scope-19
|   |
|   |---Constant(0) - scope-17
|   |
|   |---Constant(hdfs://aaa.bbb.ccc:8020/tmp/temp-912971374/tmp-690789368) - scope-18
pig.map.stores=================
[(Name: F: Store(/tmp/deletemeF:org.apache.pig.builtin.PigStorage) - scope-21 Operator Key: scope-21)]
pig.reducePlan=================
null
pig.reduce.stores=================
[]
==============================================================
job_1399356417814_189123
pig.inputs ================
hdfs://aaa.bbb.ccc:8020/user/knoguchi/input2.txt:org.apache.pig.builtin.PigStorage
pig.mapPlan=================
D: New For Each(false)[bag] - scope-14
|   |
|   POUserFunc(org.apache.pig.impl.builtin.ReadScalars)[int] - scope-13
|   |
|   |---Constant(0) - scope-11
|   |
|   |---Constant(hdfs://aaa.bbb.ccc:8020/tmp/temp-912971374/tmp-690789368) - scope-12
pig.map.stores=================
[(Name: D: Store(/tmp/deletemeD:org.apache.pig.builtin.PigStorage) - scope-15 Operator Key: scope-15)]
pig.reducePlan=================
null
pig.reduce.stores=================
[]
{noformat}
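The dependency this comment expects can be stated as a rule. If a job reads a path that another job stores, the reading job should depend on the storing job. The path may be a regular input or the file constant handed to ReadScalars. The following is a minimal hypothetical Java sketch that checks this rule. It is not Pig's own dependency code. The `Job` record, `DAG_LINES`, `jobsFromDump`, `parseDag`, and `missingEdges` are illustrative names. The read/write paths were transcribed by hand from the plan dump above, and the sketch assumes that a ReadScalars constant counts as a read.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

public class ScalarDependencyCheck {

    // Hypothetical stand-in for a compiled MapReduce job: the paths it reads
    // (pig.inputs plus any ReadScalars file constant) and the paths it stores to.
    record Job(String id, List<String> reads, List<String> writes) {}

    static final String TMP = "hdfs://aaa.bbb.ccc:8020/tmp/temp-912971374/";
    static final String USER = "hdfs://aaa.bbb.ccc:8020/user/knoguchi/";

    // Job DAG from the comment, with the email line wrap undone.
    static final List<String> DAG_LINES = List.of(
        "job_1399356417814_189120 -> job_1399356417814_189121,job_1399356417814_189122,",
        "job_1399356417814_189121 -> job_1399356417814_189123,",
        "job_1399356417814_189123",
        "job_1399356417814_189122");

    // Read/write paths transcribed from the plan dump (hypothetical helper).
    static List<Job> jobsFromDump() {
        return List.of(
            new Job("job_1399356417814_189120", List.of(USER + "input1"), List.of(TMP + "tmp-113052755")),
            new Job("job_1399356417814_189121", List.of(TMP + "tmp-113052755"), List.of(TMP + "tmp-690789368")),
            // F: input3 plus the scalar file read through ReadScalars (scope-19)
            new Job("job_1399356417814_189122", List.of(USER + "input3", TMP + "tmp-690789368"), List.of("/tmp/deletemeF")),
            // D: input2.txt plus the same scalar file (scope-13)
            new Job("job_1399356417814_189123", List.of(USER + "input2.txt", TMP + "tmp-690789368"), List.of("/tmp/deletemeD")));
    }

    // Parses "parent -> child1,child2," lines into parent -> children.
    // Lines without "->" (leaf jobs) contribute no edges.
    static Map<String, Set<String>> parseDag(List<String> lines) {
        Map<String, Set<String>> dag = new TreeMap<>();
        for (String line : lines) {
            String[] parts = line.split("->");
            if (parts.length < 2) continue;
            Set<String> children = dag.computeIfAbsent(parts[0].trim(), k -> new TreeSet<>());
            for (String child : parts[1].split(",")) {
                if (!child.isBlank()) children.add(child.trim());
            }
        }
        return dag;
    }

    // Returns "producer -> consumer" edges implied by shared paths but absent from the DAG.
    static List<String> missingEdges(List<Job> jobs, Map<String, Set<String>> dag) {
        Map<String, String> producerOf = new HashMap<>();
        for (Job job : jobs) {
            for (String path : job.writes()) producerOf.put(path, job.id());
        }
        List<String> missing = new ArrayList<>();
        for (Job consumer : jobs) {
            for (String path : consumer.reads()) {
                String producer = producerOf.get(path);
                // Skip external inputs (no producing job) and self-reads.
                if (producer == null || producer.equals(consumer.id())) continue;
                if (!dag.getOrDefault(producer, Set.of()).contains(consumer.id())) {
                    missing.add(producer + " -> " + consumer.id());
                }
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        List<String> missing = missingEdges(jobsFromDump(), parseDag(DAG_LINES));
        missing.forEach(edge -> System.out.println("missing dependency: " + edge));
    }
}
```

Run against the transcribed dump, the check reports a single edge: job_1399356417814_189121 -> job_1399356417814_189122. For job_1399356417814_189123, the identical scalar read is covered by the existing 189121 -> 189123 edge. Only the job for F falls through.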




> Multiple Scalar reference calls leading to missing records
> ----------------------------------------------------------
>
>                 Key: PIG-3975
>                 URL: https://issues.apache.org/jira/browse/PIG-3975
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.8.1, 0.9.2, 0.10.1, 0.11.1, 0.12.2
>            Reporter: Koji Noguchi
>            Assignee: Koji Noguchi
>            Priority: Critical
>
> We noticed that multiple Pig runs with the same input were producing different outputs.
> A simplified script looked like this:
> {noformat}
> A = load 'input1' as (a1:int);
> B = group A by a1 parallel 200;
> C = load 'input2' as (c1:int);
> D = foreach C generate B.$0;
> store D into '/tmp/deletemeD';
> E = load 'input3' as (c1:int);
> F = foreach E generate B.$0;
> store F into '/tmp/deletemeF';
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)