[ https://issues.apache.org/jira/browse/PIG-5135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15964045#comment-15964045 ]
Adam Szita commented on PIG-5135:
---------------------------------

The reason these tests fail is that they depend on the *hdfsBytesRead stat, which is always 0* when using Spark as the execution engine. (TestOrcStoragePushdown compares the bytes read with and without the optimization and expects a certain difference.)

Spark only counts the bytes read if the split it is given is a FileSplit. [Here|https://github.com/apache/spark/blob/v1.6.1/core/src/main/scala/org/apache/spark/rdd/NewHadoopRDD.scala#L142] you can see that otherwise the {{bytesReadCallback}} is None, so the counter is never incremented. This happens in our case because PigSplit is not a FileSplit.

In my patch [^PIG-5135.0.patch] I created a wrapper, {{SparkPigSplit}}, that wraps a PigSplit instance and delegates every method call to it. If the original PigSplit contains FileSplits, I create a {{FileSparkPigSplit}}; otherwise a {{GenericSparkPigSplit}}. (The former extends FileSplit, so Spark will be able to count the bytes being read.)

[~kellyzly] please take a look.

> Fix TestOrcStoragePushdown unit test in Spark mode
> --------------------------------------------------
>
>                 Key: PIG-5135
>                 URL: https://issues.apache.org/jira/browse/PIG-5135
>             Project: Pig
>          Issue Type: Bug
>          Components: spark
>            Reporter: liyunzhang_intel
>            Assignee: Adam Szita
>             Fix For: spark-branch
>
>         Attachments: PIG-5135.0.patch
>

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
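For illustration only, here is a minimal sketch of the delegating-wrapper idea described in the comment above; it is not the code from [^PIG-5135.0.patch]. It assumes {{PigSplit#getWrappedSplit()}} returns the underlying FileSplit and it omits the Writable/serialization plumbing a real split wrapper also needs:

{code:java}
// Sketch only: delegate FileSplit accessors to a wrapped PigSplit so that
// Spark's NewHadoopRDD sees a FileSplit and sets up its bytesReadCallback,
// which is what feeds the hdfsBytesRead counter.
import java.io.IOException;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;
import org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigSplit;

public class FileSparkPigSplit extends FileSplit {

    private final PigSplit pigSplit;

    public FileSparkPigSplit(PigSplit pigSplit) {
        this.pigSplit = pigSplit;
    }

    // Assumption: the wrapped split of this PigSplit is a FileSplit,
    // so its path can be exposed directly.
    @Override
    public Path getPath() {
        return ((FileSplit) pigSplit.getWrappedSplit()).getPath();
    }

    // Everything else simply delegates to the wrapped PigSplit.
    @Override
    public long getLength() {
        try {
            return pigSplit.getLength();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    @Override
    public String[] getLocations() throws IOException {
        try {
            return pigSplit.getLocations();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }
}
{code}

A {{GenericSparkPigSplit}} would be the same kind of delegating wrapper but extend InputSplit directly, for PigSplits that do not wrap FileSplits (and where Spark therefore cannot count the bytes read anyway).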