[jira] [Commented] (PIG-2661) Pig uses an extra job for loading data in Pigmix L9

Jie Li (JIRA) Mon, 21 May 2012 19:03:44 -0700

    [ 
https://issues.apache.org/jira/browse/PIG-2661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13280683#comment-13280683
 ]


Jie Li commented on PIG-2661:
-----------------------------

Here is a comparison of the number of MapReduce jobs used by 0.8.1 and 0.9.0:
|| load schema || pig 0.8.1 || pig 0.9.0 ||
|No schema:
A = load 'input';
B = order A by $0;
store B into 'output';| 2 jobs | 2 jobs |
|Schema without types:
A = load 'input' as (a,b,c);
B = order A by a;
store B into 'output'; | 2 jobs | 3 jobs |
|Schema with types:
A = load 'input' as (a:chararray,b,c);
B = order A by a;
store B into 'output'; | 3 jobs | 3 jobs |

The only difference between 0.8.1 and 0.9.0 is the case where a schema without 
types is provided (as in Pigmix L9): there Pig 0.9.0 uses an extra job. This 
was introduced by [PIG-1188 Padding nulls to the input tuple according to input 
schema|https://issues.apache.org/jira/browse/PIG-1188], which inserts a Foreach 
for untyped data so that nulls are padded the same way as for typed data. I have 
linked this issue to PIG-1188.
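
To make the extra job easier to picture, here is a rough sketch of the 
untyped-schema case next to a hand-written script that does the projection 
explicitly. The second script is only my assumption of what the inserted 
Foreach roughly amounts to, not the plan Pig actually builds, and the aliases 
A0, A1 and B1 are made up for illustration:
{code}
-- Untyped-schema case from the table above.
A = load 'input' as (a, b, c);
B = order A by a;

-- Assumed rough equivalent: load without a schema, then name the
-- fields in an explicit foreach (hypothetical aliases A0, A1, B1).
A0 = load 'input';
A1 = foreach A0 generate $0 as a, $1 as b, $2 as c;
B1 = order A1 by a;

-- explain prints the logical, physical and MapReduce plans,
-- so the number of jobs can be read off the last one.
explain B;
{code}
Running explain on B under both versions should be enough to confirm the job 
counts in the table.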

Daniel: As you said, we may merge the 1st job's pipeline into the 2nd/3rd job, 
which would leave all three cases with only 2 jobs. Can we implement it in 
SampleOptimizer by pushing the 1st job's foreach to the RandomSampleLoader?
                
> Pig uses an extra job for loading data in Pigmix L9
> ---------------------------------------------------
>
>                 Key: PIG-2661
>                 URL: https://issues.apache.org/jira/browse/PIG-2661
>             Project: Pig
>          Issue Type: Improvement
>    Affects Versions: 0.9.0
>            Reporter: Jie Li
>
> See 
> https://issues.apache.org/jira/browse/PIG-200?focusedCommentId=13260155&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13260155

