Usama Abdulrehman created PIG-5402: -------------------------------------- Summary: mapreduce job running very slow Key: PIG-5402 URL: https://issues.apache.org/jira/browse/PIG-5402 Project: Pig Issue Type: Test Components: grunt Affects Versions: 0.17.0 Reporter: Usama Abdulrehman Fix For: 0.17.0
I'm running a mapreduce mode in Apache Pig version 0.17.0 to simply dump a few lines of text data from a file on HDFS Hadoop-2.7.2 When executing the {{dump}} command, the execution goes very slow, however it gets completed. I see some failures during execution shown below: {{org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Running jobs are [job_1589604570386_0002] [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 50% complete [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Running jobs are [job_1589604570386_0002] [main] INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at /0.0.0.0:8032 [main] INFO org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server [main] INFO org.apache.hadoop.ipc.Client - Retrying connect to server: 0.0.0.0/0.0.0.0:10020. Already tried 8 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) [main] WARN org.apache.pig.tools.pigstats.mapreduce.MRJobStats - Failed to get map task report java.io.IOException: java.net.ConnectException: Call From localhost/127.0.0.1 to 0.0.0.0:10020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused at org.apache.hadoop.mapred.ClientServiceDelegate.invoke(ClientServiceDelegate.java:343) at org.apache.hadoop.mapred.ClientServiceDelegate.getJobStatus(ClientServiceDelegate.java:428) at org.apache.hadoop.mapred.YARNRunner.getJobStatus(YARNRunner.java:572) at org.apache.hadoop.mapreduce.Cluster.getJob(Cluster.java:184) at org.apache.pig.tools.pigstats.mapreduce.MRJobStats.getTaskReports(MRJobStats.java:528) at org.apache.pig.tools.pigstats.mapreduce.MRJobStats.addMapReduceStatistics(MRJobStats.java:355) at org.apache.pig.tools.pigstats.mapreduce.MRPigStatsUtil.addSuccessJobStats(MRPigStatsUtil.java:232) at org.apache.pig.tools.pigstats.mapreduce.MRPigStatsUtil.accumulateStats(MRPigStatsUtil.java:164) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:379) at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.launchPig(HExecutionEngine.java:290) at org.apache.pig.PigServer.launchPlan(PigServer.java:1475) at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1460) at org.apache.pig.PigServer.storeEx(PigServer.java:1119) at org.apache.pig.PigServer.store(PigServer.java:1082) at org.apache.pig.PigServer.openIterator(PigServer.java:995) at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:782) at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:383) at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:230) at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:205) at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:66) at org.apache.pig.Main.run(Main.java:564) at org.apache.pig.Main.main(Main.java:175) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.apache.hadoop.util.RunJar.run(RunJar.java:221) at org.apache.hadoop.util.RunJar.main(RunJar.java:136)}} Is there away to speed up the mapreduce job? {{}} {{}} {{}} -- This message was sent by Atlassian Jira (v8.3.4#803005)