[ https://issues.apache.org/jira/browse/PIG-5135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15964045#comment-15964045 ]

Adam Szita commented on PIG-5135:
---------------------------------

The reason these tests failed is that they depend on the *hdfsBytesRead stat, which 
is always 0* when Spark is used as the execution engine. (TestOrcStoragePushdown 
compares the bytes read with and without the optimization and expects a certain 
difference.)

Spark only counts the bytes read if the split it is given is a FileSplit. 
[Here|https://github.com/apache/spark/blob/v1.6.1/core/src/main/scala/org/apache/spark/rdd/NewHadoopRDD.scala#L142]
 you can see that otherwise {{bytesReadCallback}} is None, so the counter is 
never incremented. That is what happens in our case, because PigSplit is not a 
FileSplit.
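
To spell out that condition in Java terms (the real code in the link is Scala; the class and method names below are made up for illustration):

{code:java}
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;

// Paraphrase of the check Spark performs before wiring up its bytes-read
// callback: only FileSplit (and its subclasses) qualify, so for a plain
// PigSplit the hdfsBytesRead counter is never incremented.
public class BytesReadCheckSketch {
    static boolean bytesReadCanBeTracked(InputSplit split) {
        return split instanceof FileSplit;
    }
}
{code}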
In my patch [^PIG-5135.0.patch] I have created a wrapper, {{SparkPigSplit}}, that 
wraps a PigSplit instance and delegates every method to it. If the original 
PigSplit contains FileSplits I create a {{FileSparkPigSplit}}, otherwise a 
{{GenericSparkPigSplit}}. (The former extends FileSplit, so Spark is able to 
count the bytes being read.) A rough sketch of the delegation idea is below.
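
For reference, a Java sketch of the {{FileSparkPigSplit}} idea (not the actual patch code; the class name suffix, accessor and exception handling here are illustrative, and a real implementation also has to delegate Writable serialization):

{code:java}
import java.io.IOException;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;
import org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigSplit;

// Sketch only: extend FileSplit so Spark's FileSplit check passes, but
// delegate the split metadata to the wrapped PigSplit.
public class FileSparkPigSplitSketch extends FileSplit {
    private final PigSplit wrappedSplit;

    public FileSparkPigSplitSketch(PigSplit wrappedSplit) {
        this.wrappedSplit = wrappedSplit;
    }

    @Override
    public long getLength() {
        try {
            return wrappedSplit.getLength();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    @Override
    public String[] getLocations() throws IOException {
        try {
            return wrappedSplit.getLocations();
        } catch (Exception e) {
            throw new IOException(e);
        }
    }

    // Hypothetical accessor so the Spark layer can get back to the PigSplit.
    public PigSplit getWrappedPigSplit() {
        return wrappedSplit;
    }
}
{code}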

[~kellyzly] please take a look.

> Fix TestOrcStoragePushdown unit test in Spark mode
> --------------------------------------------------
>
>                 Key: PIG-5135
>                 URL: https://issues.apache.org/jira/browse/PIG-5135
>             Project: Pig
>          Issue Type: Bug
>          Components: spark
>            Reporter: liyunzhang_intel
>            Assignee: Adam Szita
>             Fix For: spark-branch
>
>         Attachments: PIG-5135.0.patch
>
>



