[ https://issues.apache.org/jira/browse/PIG-5036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15603412#comment-15603412 ]
Rohini Palaniswamy commented on PIG-5036:
-----------------------------------------
I did not understand the change below. Are we completely removing generation of the
value for the tail column and always leaving it null?
{code}
tail: long value in order to create multiple mappers
{code}
{code}
- printf HDFS "$tuple,";
- for my $j ( 0 .. 1000000) {
- printf HDFS "%d",$j;
- }
- printf HDFS "\n";
+ printf HDFS "$tuple\n";
{code}
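For reference, here is a minimal Perl sketch (not part of either patch) of what the removed loop wrote for each row. It concatenates the decimal digits of 0 through 1000000 with no separator into a single tail value, which comes to 5,888,897 characters per line. After the change, only the tuple and a newline are written. The `$tuple` value below is a hypothetical placeholder, not the generator's real row content.
{code}
use strict;
use warnings;

# Rebuild the per-row tail value the removed loop printed:
# the digits of 0 .. 1000000 written back to back, no separator.
my $tail = join '', 0 .. 1000000;

# Hypothetical tuple prefix, for illustration only.
my $tuple = "a\tb";

# Old format: "$tuple,<tail>\n"; new format: "$tuple\n" (no tail value at all).
my $old_line = "$tuple,$tail\n";
my $new_line = "$tuple\n";

printf "tail length: %d characters\n", length $tail;
printf "old line: %d bytes, new line: %d bytes\n", length $old_line, length $new_line;
{code}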
> Remove biggish from e2e input dataset
> -------------------------------------
>
> Key: PIG-5036
> URL: https://issues.apache.org/jira/browse/PIG-5036
> Project: Pig
> Issue Type: Improvement
> Components: e2e harness
> Reporter: Daniel Dai
> Assignee: Daniel Dai
> Fix For: 0.17.0
>
> Attachments: PIG-5036-1.patch, PIG-5036-2.patch
>
>
> This is to reduce e2e runtime. Generating the file takes around 10 minutes, and
> the tests that use it (Rank_4, Rank_5) take additional time to run. The file is
> not actually necessary: its purpose is to run multiple maps, and we can achieve
> that with the "mapreduce.input.fileinputformat.split.maxsize" parameter.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)