[ https://issues.apache.org/jira/browse/PIG-5036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15603412#comment-15603412 ]
Rohini Palaniswamy commented on PIG-5036:
-----------------------------------------

I did not understand the change below. Are we removing generation of the value for the tail column entirely and always making it null?
{code}
tail: long value in order to create multiple mappers
{code}
{code}
-    printf HDFS "$tuple,";
-    for my $j ( 0 .. 1000000) {
-        printf HDFS "%d",$j;
-    }
-    printf HDFS "\n";
+    printf HDFS "$tuple\n";
{code}

> Remove biggish from e2e input dataset
> -------------------------------------
>
>                 Key: PIG-5036
>                 URL: https://issues.apache.org/jira/browse/PIG-5036
>             Project: Pig
>          Issue Type: Improvement
>          Components: e2e harness
>            Reporter: Daniel Dai
>            Assignee: Daniel Dai
>             Fix For: 0.17.0
>
>         Attachments: PIG-5036-1.patch, PIG-5036-2.patch
>
>
> The goal is to reduce e2e runtime. Generating the biggish file takes around
> 10 minutes, and the tests that involve it (Rank_4, Rank_5) take more time to
> run. The file is not actually necessary: its purpose is to run multiple map
> tasks, and we can do that with the
> "mapreduce.input.fileinputformat.split.maxsize" parameter.
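
As a rough illustration of why lowering "mapreduce.input.fileinputformat.split.maxsize" can replace a very large input file, the sketch below estimates how many input splits (and so map tasks) a single file would produce. It is not part of the patch. The function name estimate_splits and the default block and minimum split sizes are assumptions made for this example, and the logic follows the split-size rule commonly described for Hadoop's FileInputFormat (split size = max(minSize, min(maxSize, blockSize)), with a 10% slop on the last split). Pig may also combine small splits, so the real mapper count in an e2e run could differ.

```python
def estimate_splits(file_size, max_split_size,
                    block_size=128 * 1024 * 1024, min_split_size=1):
    """Estimate the number of input splits for one splittable file.

    Hypothetical helper for illustration only; all sizes are in bytes.
    """
    # Split size rule: bounded below by min size, above by max size and block size.
    split_size = max(min_split_size, min(max_split_size, block_size))
    split_slop = 1.1  # the last split may be up to 10% larger than split_size

    splits = 0
    remaining = file_size
    while remaining / split_size > split_slop:
        splits += 1
        remaining -= split_size
    if remaining > 0:
        splits += 1  # whatever is left becomes the final split
    return splits


# A 10 MB file with the max split size set to 1 MB yields several splits,
# while the same file with a very large max split size yields one.
small_max = estimate_splits(10_000_000, 1_000_000)
large_max = estimate_splits(10_000_000, 10**12)
print(small_max, large_max)
```

Under these assumptions, a modest input file with a small max split size still exercises the multi-mapper path, which is the effect the long tail column was originally meant to produce.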