Make MAP and REDUCE work as expected or add warnings
----------------------------------------------------
Key: HIVE-835
URL: https://issues.apache.org/jira/browse/HIVE-835
Project: Hadoop Hive
Issue Type: Improvement
Reporter: Adam Kramer

The syntactic elements MAP and REDUCE function as syntactic sugar for
SELECT TRANSFORM. This behavior is not at all intuitive, because no checking
or verification is done to ensure that the user's intention is met.
Specifically, Hive may see a MAP query and simply tack the transform script on
to the end of a reduce job (so the user says MAP but Hive does a REDUCE), or
(more dangerously) vice versa. Given that Hive's whole point is to sit on top
of a MapReduce framework and allow transformations in the mapper or reducer, it
seems very inappropriate for Hive to silently ignore a clear command from the
user to MAP or to REDUCE the data using a script.
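
To illustrate the "syntactic sugar" point, here is a minimal sketch of the two
equivalent forms. The table pv_users, its columns, and the script name
my_script.py are hypothetical and used only for illustration. Nothing in either
form tells Hive which phase the script must run in.

```sql
-- Hypothetical table, columns, and script name; illustration only.

-- Form 1: the MAP keyword.
FROM pv_users
MAP pv_users.userid, pv_users.dt
USING 'my_script.py'
AS uid, dt;

-- Form 2: the same query written with SELECT TRANSFORM.
-- Hive treats Form 1 as shorthand for this; the keyword MAP by itself
-- does not force the script into the map phase.
FROM pv_users
SELECT TRANSFORM(pv_users.userid, pv_users.dt)
USING 'my_script.py'
AS uid, dt;
```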

Better behavior would be for Hive to see a MAP command, start a new MapReduce
step, and run the command in the mapper (even if it would otherwise run in the
reducer), and for REDUCE to begin a reduce step if necessary: tack the REDUCE
script on to the end of a reduce job if the current system would do so; if not,
treat the 0th column as the reduce key, emit a warning saying this has been
done, and force a reduce job.

Acceptable behavior would be to throw an error or warning when the user's
clearly stated desire is going to be ignored, for example:

"Warning: User used MAP keyword, but transformation will occur in the reduce
phase."

"Warning: User used REDUCE keyword, but did not specify DISTRIBUTE BY /
CLUSTER BY column. Transformation will occur in the map phase."
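
For context on the second warning, the sketch below shows the usual way a user
asks for a reduce-side script today: the inner query names a CLUSTER BY column,
so a shuffle happens before the outer REDUCE runs. All table, column, and
script names here are hypothetical. Whether the script really ends up in the
reducer still depends on the plan Hive generates, which is the check this issue
asks for.

```sql
-- Hypothetical tables, columns, and scripts; illustration only.
FROM (
  FROM pv_users
  MAP pv_users.userid, pv_users.dt
  USING 'map_script.py'
  AS dt, uid
  CLUSTER BY dt            -- explicit key: rows are shuffled and sorted by dt
) map_output
INSERT OVERWRITE TABLE pv_users_reduced
  REDUCE map_output.dt, map_output.uid
  USING 'reduce_script.py'
  AS dt, cnt;

-- Without the CLUSTER BY (or a DISTRIBUTE BY) above, no shuffle is required,
-- which is the case the second warning describes: the REDUCE script would
-- run in the map phase.
```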