Review Request 27719: numRows and rawDataSize are not collected by the Spark stats [Spark Branch]

Na Yang Thu, 06 Nov 2014 18:35:47 -0800

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/27719/
-----------------------------------------------------------


Review request for hive, Brock Noland, Szehon Ho, and Xuefu Zhang.


Bugs: Hive-8756
    https://issues.apache.org/jira/browse/Hive-8756


Repository: hive-git


Description
-------

numRows and rawDataSize are not collected by the Spark stats. The cause is that 
the FileSinkOperator in the ReduceWork does not have the stats config set. In 
GenSparkUtils.removeUnionOperators, the operator tree is cloned, and a new 
FileSinkOperator is generated and set on the reduce work. However, during 
processFileSink, GenMapRedUtils.addStatsTask sets the collectStats flag on the 
original FileSinkOperator, not on the new FileSinkOperator that the ReduceWork 
uses.
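
The effect described above can be illustrated with a minimal, self-contained 
sketch. It does not use Hive's classes: SinkSketch and CloneStatsSketch are 
hypothetical stand-ins, and the sketch assumes the clone is taken before the 
flag is set. The second method shows one possible remedy (tracking which clone 
replaced which original); it is not necessarily what this patch does.

```java
import java.util.IdentityHashMap;
import java.util.Map;

// Hypothetical stand-in for FileSinkOperator; not Hive's real class.
final class SinkSketch {
    boolean collectStats; // stand-in for the stats config flag

    // The copy carries the flag value as it was at copy time.
    SinkSketch copy() {
        SinkSketch c = new SinkSketch();
        c.collectStats = this.collectStats;
        return c;
    }
}

final class CloneStatsSketch {
    // Symptom: the flag is set on the original after the clone
    // has already been handed to the reduce work.
    static SinkSketch flagOriginalOnly() {
        SinkSketch original = new SinkSketch();
        SinkSketch usedByReduceWork = original.copy();
        original.collectStats = true; // stand-in for addStatsTask
        return usedByReduceWork;
    }

    // One possible remedy (an assumption): remember original -> clone
    // and set the flag on the operator that is actually used.
    static SinkSketch flagViaMapping() {
        Map<SinkSketch, SinkSketch> clones = new IdentityHashMap<>();
        SinkSketch original = new SinkSketch();
        SinkSketch usedByReduceWork = original.copy();
        clones.put(original, usedByReduceWork);
        clones.getOrDefault(original, original).collectStats = true;
        return usedByReduceWork;
    }

    public static void main(String[] args) {
        System.out.println(flagOriginalOnly().collectStats); // false
        System.out.println(flagViaMapping().collectStats);   // true
    }
}
```

In the first case the operator used by the reduce work never sees the flag, 
which matches the missing numRows and rawDataSize.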


Diffs
-----

  itests/src/test/resources/testconfiguration.properties 79a0132 
  ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkProcContext.java 8290568 
  ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkUtils.java e8e18a7 
  ql/src/test/results/clientpositive/spark/stats1.q.out PRE-CREATION 

Diff: https://reviews.apache.org/r/27719/diff/


Testing
-------


Thanks,

Na Yang
