-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/25495/
-----------------------------------------------------------
Review request for hive, Brock Noland and Xuefu Zhang.
Bugs: HIVE-7776
https://issues.apache.org/jira/browse/HIVE-7776
Repository: hive-git
Description
-------
Hive obtains the task ID in Utilities::getTaskId in two ways:
1. Read the value of mapred.task.id from the configuration.
2. Generate a random value if #1 returns null.
Currently, Hive on Spark cannot get the value of mapred.task.id from the
configuration, so Utilities::getTaskId falls back to the random value.
FileSinkOperator uses the task ID to distinguish bucket file names. Since a
FileSinkOperator instance is only referenced by one task, it should keep the
task ID in a field and initialize it only once. Instead, FileSinkOperator calls
Utilities::getTaskId and gets a new task ID each time. With this issue, that
produces more bucket files than the number of buckets, which leads to
unexpected results for tablesample queries.
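The effect can be illustrated with a minimal, standalone sketch. It is not the
Hive code and not the patch itself: TaskIdSketch, Sink, bucketFileName,
countFiles and the "bucket<N>_<taskId>" file name pattern are hypothetical
names made up for this illustration, and a plain Map stands in for the
configuration. Only the mapred.task.id key and the "random value when it is
missing" fallback come from the description above.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Random;
import java.util.Set;

class TaskIdSketch {
    private static final Random RAND = new Random();

    // Hypothetical stand-in for Utilities::getTaskId: read mapred.task.id,
    // and fall back to a random value when it is not set.
    static String getTaskId(Map<String, String> conf) {
        String id = conf.get("mapred.task.id");
        return (id == null || id.isEmpty())
                ? Integer.toString(RAND.nextInt(Integer.MAX_VALUE))
                : id;
    }

    // Hypothetical sink that names one output file per bucket.
    static class Sink {
        private final Map<String, String> conf;
        private final boolean cacheTaskId;
        private String taskId; // resolved at most once when caching is on

        Sink(Map<String, String> conf, boolean cacheTaskId) {
            this.conf = conf;
            this.cacheTaskId = cacheTaskId;
        }

        String bucketFileName(int bucket) {
            String id;
            if (cacheTaskId) {
                if (taskId == null) {
                    taskId = getTaskId(conf); // first call only
                }
                id = taskId;
            } else {
                id = getTaskId(conf); // new lookup on every call
            }
            return "bucket" + bucket + "_" + id; // assumed naming pattern
        }
    }

    // Simulates writing `rows` rows into `buckets` buckets with an empty
    // configuration (no mapred.task.id) and counts distinct file names.
    static int countFiles(boolean cacheTaskId, int rows, int buckets) {
        Sink sink = new Sink(new HashMap<>(), cacheTaskId);
        Set<String> files = new HashSet<>();
        for (int row = 0; row < rows; row++) {
            files.add(sink.bucketFileName(row % buckets));
        }
        return files.size();
    }

    public static void main(String[] args) {
        System.out.println("cached task ID:   " + countFiles(true, 100, 2));
        System.out.println("per-call task ID: " + countFiles(false, 100, 2));
    }
}
```

With the task ID held in a field, the number of distinct file names equals the
number of buckets. With a lookup on every call and no mapred.task.id in the
configuration, nearly every call yields a different name, which is the "more
bucket files than bucket number" symptom described above.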
Diffs
-----
itests/src/test/resources/testconfiguration.properties 155abad
ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java 3ff0782
ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java 02f9d99
ql/src/test/results/clientpositive/spark/sample10.q.out PRE-CREATION
Diff: https://reviews.apache.org/r/25495/diff/
Testing
-------
Thanks,
chengxiang li