[
https://issues.apache.org/jira/browse/PIG-2661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13280683#comment-13280683
]
Jie Li commented on PIG-2661:
-----------------------------
Here is a comparison of the number of jobs used by 0.8.1 and 0.9.0:
|| script || pig 0.8.1 || pig 0.9.0 ||
|No schema:
A = load 'input';
B = order A by $0;
store B into 'output';| 2 jobs | 2 jobs |
|Schema without types:
A = load 'input' as (a,b,c);
B = order A by a;
store B into 'output'; | 2 jobs | 3 jobs |
|Schema with types:
A = load 'input' as (a:chararray,b,c);
B = order A by a;
store B into 'output'; | 3 jobs | 3 jobs |
The difference between 0.8.1 and 0.9.0 appears when a schema without types is
provided (as in Pigmix L9): in that case Pig 0.9.0 uses an extra job. This
difference was introduced in [PIG-1188 Padding nulls to the input tuple according to input
schema|https://issues.apache.org/jira/browse/PIG-1188], where a Foreach is
inserted for untyped data so that it gets the same null-padding behaviour
as typed data. Linked to PIG-1188.
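One way to check these counts on a given version is to look at the MapReduce plan that explain prints for the ordered relation. This is only a sketch: it reuses the 'input' path and the untyped schema from the table above, and it assumes the plan shown by explain matches the jobs that would actually be launched.

```pig
-- Schema without types, as in the second row of the table.
A = load 'input' as (a,b,c);
B = order A by a;
-- Print the logical, physical and MapReduce plans for B;
-- the number of jobs can be read off the MapReduce plan.
explain B;
```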
Daniel: As you said, we may merge the 1st job's pipeline into the 2nd/3rd job,
which would leave all three cases with only 2 jobs. Can we implement it in
SampleOptimizer by pushing the 1st job's foreach to the RandomSampleLoader?
> Pig uses an extra job for loading data in Pigmix L9
> ---------------------------------------------------
>
> Key: PIG-2661
> URL: https://issues.apache.org/jira/browse/PIG-2661
> Project: Pig
> Issue Type: Improvement
> Affects Versions: 0.9.0
> Reporter: Jie Li
>
> See
> https://issues.apache.org/jira/browse/PIG-200?focusedCommentId=13260155&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13260155