[jira] [Commented] (PIG-5036) Remove biggish from e2e input dataset

Rohini Palaniswamy (JIRA) Mon, 24 Oct 2016 15:25:36 -0700

    [ https://issues.apache.org/jira/browse/PIG-5036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15603412#comment-15603412 ]


Rohini Palaniswamy commented on PIG-5036:
-----------------------------------------

I did not understand the change below. Are we removing generation of the value
for the tail column entirely, so that it is always null?
{code}
tail:     long value in order to create multiple mappers
{code}
{code}
-            printf HDFS "$tuple,";
-            for my $j ( 0 .. 1000000) {
-                               printf HDFS "%d",$j;
-                       }
-                       printf HDFS "\n";
+            printf HDFS "$tuple\n";
{code}
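For context on the question above: the removed Perl loop appended the decimal digits of every integer from 0 to 1000000 (inclusive) after `$tuple,`, so every generated row carried one very long tail value. The new `printf HDFS "$tuple\n";` writes no tail field at all, so if the load schema still declares `tail`, Pig would presumably read it as null. The Python sketch below is an illustration only, not part of the patch, and its function names are hypothetical. It reproduces the old tail value and computes its length, which comes to 5,888,897 characters per row.

```python
# Illustrative sketch (hypothetical helpers, not from the patch): models the removed
# Perl loop `for my $j ( 0 .. 1000000) { printf HDFS "%d",$j; }`, which wrote the
# digits of 0..1000000 back to back as the tail value of every generated row.


def old_tail(upper: int = 1_000_000) -> str:
    """Concatenate the decimal digits of 0..upper (inclusive), like the old loop."""
    return "".join(str(j) for j in range(upper + 1))


def tail_length(upper: int = 1_000_000) -> int:
    """Length of old_tail(upper), counted per digit width without building the string."""
    total = 0
    low = 0
    digits = 1
    while low <= upper:
        high = min(upper, 10 ** digits - 1)  # largest number with this many digits
        total += (high - low + 1) * digits
        low = high + 1
        digits += 1
    return total


if __name__ == "__main__":
    n = tail_length()
    assert n == len(old_tail())  # cross-check against the direct construction
    print(f"tail field per row: {n} characters (~{n / 2**20:.1f} MiB)")
```

Counting by digit width (10 one-digit numbers, 90 two-digit numbers, and so on) avoids materializing the string, while `old_tail` keeps a literal translation of the Perl loop for comparison.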

> Remove biggish from e2e input dataset
> -------------------------------------
>
>                 Key: PIG-5036
>                 URL: https://issues.apache.org/jira/browse/PIG-5036
>             Project: Pig
>          Issue Type: Improvement
>          Components: e2e harness
>            Reporter: Daniel Dai
>            Assignee: Daniel Dai
>             Fix For: 0.17.0
>
>         Attachments: PIG-5036-1.patch, PIG-5036-2.patch
>
>
> This is to reduce e2e runtime. It takes around 10 minutes to generate the file,
> and more time to run the tests that use it (Rank_4, Rank_5). The file is not
> actually necessary: its purpose is to run multiple maps, and we can get those
> with the "mapreduce.input.fileinputformat.split.maxsize" parameter.
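The split-size approach in the description can be sketched as follows. This is a simplified Python model of how Hadoop's `FileInputFormat` sizes input splits (`max(minSize, min(maxSize, blockSize))`, with the last split allowed to be up to 10% larger). It assumes one map task per split and ignores any split combination Pig may apply on top. The function names, and the file and block sizes in the usage lines, are hypothetical.

```python
# Simplified model (assumption) of Hadoop FileInputFormat split sizing for one file:
# lowering mapreduce.input.fileinputformat.split.maxsize (max_size here) produces
# several splits, and hence several map tasks, even for a small input file.

SPLIT_SLOP = 1.1  # last split may be up to 10% larger than the split size


def split_size(block_size: int, min_size: int, max_size: int) -> int:
    """Split size rule: max(minSize, min(maxSize, blockSize))."""
    return max(min_size, min(max_size, block_size))


def split_lengths(file_len: int, block_size: int,
                  min_size: int = 1, max_size: int = 2**63 - 1) -> list[int]:
    """Lengths of the splits a single file of file_len bytes would be cut into."""
    size = split_size(block_size, min_size, max_size)
    splits = []
    remaining = file_len
    while remaining / size > SPLIT_SLOP:
        splits.append(size)
        remaining -= size
    if remaining > 0:
        splits.append(remaining)  # tail split, possibly up to SPLIT_SLOP * size
    return splits


if __name__ == "__main__":
    block = 128 * 2**20   # hypothetical 128 MiB HDFS block size
    small_file = 100_000  # hypothetical 100 KB input file
    print(len(split_lengths(small_file, block)))                   # 1 split by default
    print(len(split_lengths(small_file, block, max_size=10_000)))  # 10 splits
```

With the default maximum, the block size bounds the split, so a small file yields a single mapper. Capping the maximum at 10,000 bytes cuts the same file into ten splits, which is the effect the huge tail column was originally providing.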



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
