Benjamin Kim created HIVE-6230:
----------------------------------
Summary: Hive UDAF with subquery runs all logic on reducers
Key: HIVE-6230
URL: https://issues.apache.org/jira/browse/HIVE-6230
Project: Hive
Issue Type: Bug
Components: UDF
Affects Versions: 0.10.0
Reporter: Benjamin Kim
When my custom-built UDAF is applied to a subquery, the iterate,
terminatePartial, merge, and terminate methods all run on the reducers only,
whereas iterate and terminatePartial should run on the mappers.
I don't know whether this is by design, but this behavior leads to very long
execution times on the reducers and makes them create large temporary files.
This happened to me with SimpleUDAF. I haven't tested it with GenericUDAF.
Here is an example:
SELECT MyUDAF(col1)
FROM (SELECT * FROM test) t
GROUP BY col2
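
To make the expected split concrete, here is a minimal plain-Java sketch of the four-method lifecycle named in the description. It deliberately does not use any Hive classes: the class UdafLifecycleSketch, the SumEvaluator it contains, and the mapSide/reduceSide helpers are all hypothetical names chosen for illustration, and the sum aggregate is only a stand-in for whatever MyUDAF actually computes.

```java
// Plain-Java sketch of the UDAF evaluator lifecycle; it does not use Hive classes.
public class UdafLifecycleSketch {

    // Hypothetical evaluator that sums long values, mirroring the four methods in the report.
    static final class SumEvaluator {
        private long sum;

        void init() { sum = 0L; }

        // Called once per input row (expected to run on the map side).
        boolean iterate(Long value) {
            if (value != null) sum += value;
            return true;
        }

        // Emits the partial aggregate accumulated by one map task.
        Long terminatePartial() { return sum; }

        // Folds one partial aggregate into this evaluator (reduce side).
        boolean merge(Long partial) {
            if (partial != null) sum += partial;
            return true;
        }

        // Produces the final result for the group.
        Long terminate() { return sum; }
    }

    // Simulates one map task: iterate over its rows, then emit a single partial.
    static long mapSide(long[] rows) {
        SumEvaluator e = new SumEvaluator();
        e.init();
        for (long r : rows) e.iterate(r);
        return e.terminatePartial();
    }

    // Simulates the reducer: merge the partials from all map tasks, then terminate.
    static long reduceSide(long[] partials) {
        SumEvaluator e = new SumEvaluator();
        e.init();
        for (long p : partials) e.merge(p);
        return e.terminate();
    }

    public static void main(String[] args) {
        long p1 = mapSide(new long[] {1, 2, 3}); // first input split
        long p2 = mapSide(new long[] {4, 5});    // second input split
        System.out.println(reduceSide(new long[] {p1, p2})); // prints 15
    }
}
```

In this sketch only one partial value per map task crosses to the reduce side. If iterate also runs on the reducers, as reported here, every raw row has to be shuffled instead, which would be consistent with the long reducer times and large temporary files.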