[
https://issues.apache.org/jira/browse/PIG-4438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
liyunzhang_intel updated PIG-4438:
----------------------------------
Attachment: PIG-4438_1.patch
PIG-4438_1.patch is an initial patch. It still hits a problem when running the
script from the bug description; more time is needed to figure it out. The error is:
{code}
Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 1 times, most recent failure: Lost task 0.0 in stage 0.0 (TID 0, localhost): java.lang.ClassCastException: java.lang.Byte cannot be cast to java.util.Iterator
    at org.apache.pig.backend.hadoop.executionengine.spark.converter.PackageConverter$PackageFunction.apply(PackageConverter.java:85)
    at org.apache.pig.backend.hadoop.executionengine.spark.converter.PackageConverter$PackageFunction.apply(PackageConverter.java:48)
    at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
    at scala.collection.convert.Wrappers$IteratorWrapper.next(Wrappers.scala:30)
    at org.apache.pig.backend.hadoop.executionengine.spark.converter.POOutputConsumerIterator.readNext(POOutputConsumerIterator.java:35)
    at org.apache.pig.backend.hadoop.executionengine.spark.converter.POOutputConsumerIterator.hasNext(POOutputConsumerIterator.java:64)
    at scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:41)
    at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
    at scala.collection.convert.Wrappers$IteratorWrapper.hasNext(Wrappers.scala:29)
    at org.apache.pig.backend.hadoop.executionengine.spark.converter.POOutputConsumerIterator.readNext(POOutputConsumerIterator.java:30)
    at org.apache.pig.backend.hadoop.executionengine.spark.converter.POOutputConsumerIterator.hasNext(POOutputConsumerIterator.java:64)
    at scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:41)
    at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
    at scala.collection.convert.Wrappers$IteratorWrapper.hasNext(Wrappers.scala:29)
    at org.apache.pig.backend.hadoop.executionengine.spark.converter.POOutputConsumerIterator.readNext(POOutputConsumerIterator.java:30)
    at org.apache.pig.backend.hadoop.executionengine.spark.converter.POOutputConsumerIterator.hasNext(POOutputConsumerIterator.java:64)
    at scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:41)
    at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
    at org.apache.spark.rdd.PairRDDFunctions$$anonfun$12.apply(PairRDDFunctions.scala:987)
    at org.apache.spark.rdd.PairRDDFunctions$$anonfun$12.apply(PairRDDFunctions.scala:965)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
    at org.apache.spark.scheduler.Task.run(Task.scala:56)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:196)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)
{code}
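The ClassCastException itself is a plain unchecked cast: judging from the trace, PackageConverter.java:85 casts a tuple slot to java.util.Iterator, but in this run the slot apparently holds a raw value (a Byte) rather than the iterator of grouped values. A minimal stand-alone Java sketch of that failure mode (variable names and the cast are illustrative, not Pig's actual code):

```java
import java.util.Iterator;

public class CastFailureDemo {
    public static void main(String[] args) {
        // A tuple slot is typed as Object; here it holds a Byte,
        // as in the failing run (illustrative, not Pig's real data flow).
        Object slot = Byte.valueOf((byte) 1);

        try {
            // The converter assumes the slot holds the iterator of grouped
            // values and casts unconditionally -- this throws at runtime.
            Iterator<?> values = (Iterator<?>) slot;
            values.hasNext();
        } catch (ClassCastException e) {
            System.out.println("caught: " + e.getMessage());
        }
    }
}
```

Presumably the fix is to make PackageConverter and the converters feeding it (sort/limit) agree on whether a packaged iterator or a raw value flows between them.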
> Can not work when in "limit after sort" situation in spark mode
> ---------------------------------------------------------------
>
> Key: PIG-4438
> URL: https://issues.apache.org/jira/browse/PIG-4438
> Project: Pig
> Issue Type: Sub-task
> Components: spark
> Reporter: liyunzhang_intel
> Assignee: liyunzhang_intel
> Fix For: spark-branch
>
> Attachments: PIG-4438_1.patch
>
>
> When a Pig script executes an "order" before a "limit" in spark mode, the
> results are wrong.
> cat testlimit.txt
> 1 orange
> 3 coconut
> 5 grape
> 6 pear
> 2 apple
> 4 mango
> testlimit.pig:
> a = load './testlimit.txt' as (x:int, y:chararray);
> b = order a by x;
> c = limit b 1;
> store c into './testlimit.out';
> the actual result:
> 1 orange
> 2 apple
> 3 coconut
> 4 mango
> 5 grape
> 6 pear
> the correct result should be:
> 1 orange
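For reference, the intended semantics of the script are simply sort-then-truncate. A plain-Java sketch over the sample data (using java.util.stream rather than Pig or Spark APIs, purely to illustrate the expected behavior) keeps only the single smallest row:

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class LimitAfterSortDemo {
    public static void main(String[] args) {
        // The sample relation from testlimit.txt: (x:int, y:chararray).
        List<Map.Entry<Integer, String>> a = List.of(
                new SimpleEntry<>(1, "orange"),
                new SimpleEntry<>(3, "coconut"),
                new SimpleEntry<>(5, "grape"),
                new SimpleEntry<>(6, "pear"),
                new SimpleEntry<>(2, "apple"),
                new SimpleEntry<>(4, "mango"));

        // b = order a by x;  c = limit b 1;
        List<Map.Entry<Integer, String>> c = a.stream()
                .sorted(Map.Entry.comparingByKey())  // order by x
                .limit(1)                            // limit 1
                .collect(Collectors.toList());

        System.out.println(c);  // [1=orange]
    }
}
```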
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)