[ https://issues.apache.org/jira/browse/PIG-4364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14264164#comment-14264164 ]
liyunzhang_intel commented on PIG-4364: --------------------------------------- currently we still need following code in SparkLauncher.java Line 112~116 {code} // Code pulled from MapReduceLauncher MRCompiler mrCompiler = new MRCompiler(physicalPlan, pigContext); mrCompiler.compile(); MROperPlan plan = mrCompiler.getMRPlan(); POPackageAnnotator pkgAnnotator = new POPackageAnnotator(plan); pkgAnnotator.visit(); {code} If without these code, following script will throw exception: {quote} Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 0: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 1 times, most recent failure: Lost task 0.0 in stage 0.0 (TID 1, localhost): java.lang.NullPointerException: org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.Packager.getValueTuple(Packager.java:215) org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POPackage$PeekedBag$1.next(POPackage.java:424) org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POPackage$PeekedBag$1.next(POPackage.java:408) org.apache.pig.data.DefaultAbstractBag.addAll(DefaultAbstractBag.java:151) org.apache.pig.data.DefaultAbstractBag.addAll(DefaultAbstractBag.java:137) org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.Packager.attachInput(Packager.java:125) org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POPackage.getNextTuple(POPackage.java:283) org.apache.pig.backend.hadoop.executionengine.spark.converter.PackageConverter$PackageFunction.apply(PackageConverter.java:111) org.apache.pig.backend.hadoop.executionengine.spark.converter.PackageConverter$PackageFunction.apply(PackageConverter.java:48) scala.collection.Iterator$$anon$11.next(Iterator.scala:328) scala.collection.convert.Wrappers$IteratorWrapper.next(Wrappers.scala:30) org.apache.pig.backend.hadoop.executionengine.spark.converter.POOutputConsumerIterator.readNext(POOutputConsumerIterator.java:35) org.apache.pig.backend.hadoop.executionengine.spark.converter.POOutputConsumerIterator.hasNext(POOutputConsumerIterator.java:64) scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:41) scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327) org.apache.spark.rdd.PairRDDFunctions$$anonfun$12.apply(PairRDDFunctions.scala:920) org.apache.spark.rdd.PairRDDFunctions$$anonfun$12.apply(PairRDDFunctions.scala:903) org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:62) org.apache.spark.scheduler.Task.run(Task.scala:54) org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:177) java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) java.lang.Thread.run(Thread.java:744) {quote} *why throw this exception?* POPackageAnnotator#visit will call following function stack to avoid the above NPE : org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.plans.POPackageAnnotator.LoRearrangeDiscoverer#visitLocalRearrange {code} keyInfo.put(Integer.valueOf(lrearrange.getIndex()), new Pair<Boolean, Map<Integer, Integer>>( lrearrange.isProjectStar(), lrearrange.getProjectedColsMap())); {code} org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLocalRearrange#visit org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.plans.POPackageAnnotator#handlePackage org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.plans.POPackageAnnotator#visitMROp After deleting above code, unit test add 89 failures. So close this bug because this is not a bug. > remove unnessary MR plan code generated in SparkLauncher.java > ------------------------------------------------------------- > > Key: PIG-4364 > URL: https://issues.apache.org/jira/browse/PIG-4364 > Project: Pig > Issue Type: Bug > Components: spark > Reporter: liyunzhang_intel > Assignee: liyunzhang_intel > Attachments: PIG-4364.patch > > > following code in SparkLauncher.java Line 112~116 is about MR plan is > generated in Spark mode which is unnecessary. > {code} > // Code pulled from MapReduceLauncher > MRCompiler mrCompiler = new MRCompiler(physicalPlan, pigContext); > mrCompiler.compile(); > MROperPlan plan = mrCompiler.getMRPlan(); > POPackageAnnotator pkgAnnotator = new POPackageAnnotator(plan); > pkgAnnotator.visit(); > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)