[ 
https://issues.apache.org/jira/browse/PIG-4364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14264164#comment-14264164
 ] 

liyunzhang_intel commented on PIG-4364:
---------------------------------------

currently we still need following code in SparkLauncher.java Line 112~116 
{code}
        // Code pulled from MapReduceLauncher
        MRCompiler mrCompiler = new MRCompiler(physicalPlan, pigContext);
        mrCompiler.compile();
        MROperPlan plan = mrCompiler.getMRPlan();
        POPackageAnnotator pkgAnnotator = new POPackageAnnotator(plan);
        pkgAnnotator.visit();
{code}

If without these code, following script will throw exception:
{quote}
Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 0: 
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in 
stage 0.0 failed 1 times, most recent failure: Lost task 0.0 in stage 0.0 (TID 
1, localhost): java.lang.NullPointerException: 
        
org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.Packager.getValueTuple(Packager.java:215)
        
org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POPackage$PeekedBag$1.next(POPackage.java:424)
        
org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POPackage$PeekedBag$1.next(POPackage.java:408)
        
org.apache.pig.data.DefaultAbstractBag.addAll(DefaultAbstractBag.java:151)
        
org.apache.pig.data.DefaultAbstractBag.addAll(DefaultAbstractBag.java:137)
        
org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.Packager.attachInput(Packager.java:125)
        
org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POPackage.getNextTuple(POPackage.java:283)
        
org.apache.pig.backend.hadoop.executionengine.spark.converter.PackageConverter$PackageFunction.apply(PackageConverter.java:111)
        
org.apache.pig.backend.hadoop.executionengine.spark.converter.PackageConverter$PackageFunction.apply(PackageConverter.java:48)
        scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
        
scala.collection.convert.Wrappers$IteratorWrapper.next(Wrappers.scala:30)
        
org.apache.pig.backend.hadoop.executionengine.spark.converter.POOutputConsumerIterator.readNext(POOutputConsumerIterator.java:35)
        
org.apache.pig.backend.hadoop.executionengine.spark.converter.POOutputConsumerIterator.hasNext(POOutputConsumerIterator.java:64)
        
scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:41)
        scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
        
org.apache.spark.rdd.PairRDDFunctions$$anonfun$12.apply(PairRDDFunctions.scala:920)
        
org.apache.spark.rdd.PairRDDFunctions$$anonfun$12.apply(PairRDDFunctions.scala:903)
        org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:62)
        org.apache.spark.scheduler.Task.run(Task.scala:54)
        org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:177)
        
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        java.lang.Thread.run(Thread.java:744)
{quote}
*why throw this exception?*
POPackageAnnotator#visit will call following function stack to avoid the above 
NPE :
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.plans.POPackageAnnotator.LoRearrangeDiscoverer#visitLocalRearrange
{code}
  keyInfo.put(Integer.valueOf(lrearrange.getIndex()),
                new Pair<Boolean, Map<Integer, Integer>>(
                        lrearrange.isProjectStar(), 
lrearrange.getProjectedColsMap()));
{code}
org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLocalRearrange#visit
 
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.plans.POPackageAnnotator#handlePackage
    
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.plans.POPackageAnnotator#visitMROp

After deleting above code, unit test add 89 failures. So close this bug because 
this is not a bug.

> remove unnessary MR plan code generated in SparkLauncher.java
> -------------------------------------------------------------
>
>                 Key: PIG-4364
>                 URL: https://issues.apache.org/jira/browse/PIG-4364
>             Project: Pig
>          Issue Type: Bug
>          Components: spark
>            Reporter: liyunzhang_intel
>            Assignee: liyunzhang_intel
>         Attachments: PIG-4364.patch
>
>
> following code in SparkLauncher.java Line 112~116 is about MR plan is 
> generated in Spark mode which is unnecessary.
> {code}
>         // Code pulled from MapReduceLauncher
>         MRCompiler mrCompiler = new MRCompiler(physicalPlan, pigContext);
>         mrCompiler.compile();
>         MROperPlan plan = mrCompiler.getMRPlan();
>         POPackageAnnotator pkgAnnotator = new POPackageAnnotator(plan);
>         pkgAnnotator.visit();
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to