-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/27719/
-----------------------------------------------------------
Review request for hive, Brock Noland, Szehon Ho, and Xuefu Zhang.
Bugs: Hive-8756
https://issues.apache.org/jira/browse/Hive-8756
Repository: hive-git
Description
-------
numRows and rawDataSize are not collected by the Spark stats because the stats
config is not set on the FileSinkOperator in the ReduceWork. In
GenSparkUtils.removeUnionOperators, the operator tree is cloned, and a new
FileSinkOperator is generated and set on the reduce work. However, during
processFileSink, GenMapRedUtils.addStatsTask sets the collectStats flag on the
original FileSinkOperator, not on the new FileSinkOperator that is used in the
ReduceWork.
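
The failure mode described above can be illustrated with a minimal, standalone
sketch. It does not use the Hive classes: Sink, copy(), buggy() and fixed() are
hypothetical stand-ins chosen for illustration, and the remedy shown (tracking
the original-to-clone mapping and propagating the flag) is only one possible
approach, not necessarily what this patch does.

```java
import java.util.HashMap;
import java.util.Map;

public class CloneStatsSketch {
    // Hypothetical stand-in for a file sink operator; not the Hive class.
    static class Sink {
        final String name;
        boolean collectStats = false;
        Sink(String name) { this.name = name; }
        // Cloning yields a fresh operator with default (unset) flags.
        Sink copy() { return new Sink(name + "-clone"); }
    }

    // Mirrors the reported problem: the flag is set on the original operator,
    // but the clone is the one that actually runs in the reduce work.
    static Sink buggy() {
        Sink original = new Sink("fs");
        Sink usedInWork = original.copy();
        original.collectStats = true;   // lands on the wrong operator
        return usedInWork;
    }

    // One possible remedy (an assumption): remember which clone replaced which
    // original, then propagate the flag to the operator that is really used.
    static Sink fixed() {
        Sink original = new Sink("fs");
        Map<Sink, Sink> originalToClone = new HashMap<>();
        Sink usedInWork = original.copy();
        originalToClone.put(original, usedInWork);

        original.collectStats = true;
        Sink target = originalToClone.getOrDefault(original, original);
        target.collectStats = original.collectStats;
        return usedInWork;
    }

    public static void main(String[] args) {
        System.out.println("buggy clone collects stats: " + buggy().collectStats);
        System.out.println("fixed clone collects stats: " + fixed().collectStats);
    }
}
```

In the buggy path the operator that runs never sees the flag, which matches the
symptom of numRows and rawDataSize not being collected.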
Diffs
-----
itests/src/test/resources/testconfiguration.properties 79a0132
ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkProcContext.java 8290568
ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkUtils.java e8e18a7
ql/src/test/results/clientpositive/spark/stats1.q.out PRE-CREATION
Diff: https://reviews.apache.org/r/27719/diff/
Testing
-------
Thanks,
Na Yang