Support new annotation @UDFType(stateful = true)
------------------------------------------------
Key: HIVE-1994
URL: https://issues.apache.org/jira/browse/HIVE-1994
Project: Hive
Issue Type: Improvement
Components: Query Processor, UDF
Reporter: John Sichi
Assignee: John Sichi
Because Hive does not yet support window functions from SQL/OLAP, people have
started hacking around it by writing stateful UDFs for things like cumulative
sum. An example is row_sequence in contrib.
To clearly mark these, I think we should add a new annotation (with separate
semantics from the existing deterministic annotation). I'm proposing the name
stateful for lack of a better idea, but I'm open to suggestions.
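To make the proposal concrete, here is a rough sketch of what the annotation
and a row_sequence-style UDF might look like. The UDFType declared below is a
local stand-in so the example compiles on its own (it is not Hive's actual
annotation class), and RowSequenceSketch and isStateful are hypothetical names
used only for illustration.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

// Local stand-in for the UDF annotation (assumption: not Hive's real class).
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface UDFType {
    // Represents the existing "deterministic" flag mentioned above.
    boolean deterministic() default true;

    // The proposed new flag, with semantics separate from deterministic.
    boolean stateful() default false;
}

// Hypothetical UDF in the spirit of row_sequence: it keeps a counter
// across invocations, so each call depends on the calls before it.
@UDFType(stateful = true)
class RowSequenceSketch {
    private long counter = 0;

    public long evaluate() {
        return ++counter;
    }
}

class StatefulAnnotationDemo {
    // How a planner could detect the marker via reflection (illustrative only).
    static boolean isStateful(Class<?> udfClass) {
        UDFType type = udfClass.getAnnotation(UDFType.class);
        return type != null && type.stateful();
    }

    public static void main(String[] args) {
        RowSequenceSketch seq = new RowSequenceSketch();
        System.out.println(seq.evaluate());                       // 1
        System.out.println(seq.evaluate());                       // 2
        System.out.println(isStateful(RowSequenceSketch.class));  // true
    }
}
```

The runtime retention is what lets the query processor inspect the flag; what
it does with that information is the subject of the semantics below.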
The semantics are as follows:
* A stateful UDF can only be used in the SELECT list, not in other clauses such
as WHERE/ON/ORDER/GROUP
* When a stateful UDF is present in a query, there's an implication that its
SELECT needs to be treated as similar to TRANSFORM, i.e. when there's a
DISTRIBUTE/CLUSTER/SORT clause, the UDF needs to run inside the corresponding
reducer to make sure that the results are as expected.
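The first rule could be enforced with a check along these lines. This is only
a sketch under my own assumptions: the Clause enum and StatefulClauseCheck
class are hypothetical and do not correspond to anything in Hive's semantic
analyzer.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical model of where an expression appears in a query.
enum Clause { SELECT, WHERE, ON, ORDER_BY, GROUP_BY }

class StatefulClauseCheck {
    // Per the proposal, stateful UDFs are allowed only in the SELECT list.
    private static final Set<Clause> ALLOWED = EnumSet.of(Clause.SELECT);

    // Throws if a stateful UDF is used in a clause other than SELECT.
    static void check(String udfName, boolean stateful, Clause clause) {
        if (stateful && !ALLOWED.contains(clause)) {
            throw new IllegalArgumentException(
                "Stateful UDF " + udfName + " is not allowed in " + clause);
        }
    }

    public static void main(String[] args) {
        check("row_sequence", true, Clause.SELECT);  // accepted
        check("upper", false, Clause.WHERE);         // accepted: not stateful
        try {
            check("row_sequence", true, Clause.WHERE);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```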
For the first one, an example of why we need this is AND/OR short-circuiting;
we don't want these optimizations to cause the invocation to be skipped in a
confusing way, so we should just ban it outright (which is what SQL/OLAP does
for window functions).
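The short-circuiting concern can be reproduced in plain Java. In this sketch
(ShortCircuitDemo and its methods are hypothetical names, not Hive code), a
counter-based function sits on the right-hand side of an AND, so it is only
invoked for rows where the left operand is true.

```java
class ShortCircuitDemo {
    private static long counter = 0;

    // Stateful function: every invocation advances the counter.
    static long nextRow() {
        return ++counter;
    }

    // Evaluates "flag AND nextRow() > 0" once per row and reports how many
    // times the stateful function actually ran.
    static int countInvocations(boolean[] leftOperands) {
        counter = 0;
        for (boolean flag : leftOperands) {
            // && skips nextRow() whenever flag is false.
            boolean keep = flag && nextRow() > 0;
        }
        return (int) counter;
    }

    public static void main(String[] args) {
        boolean[] rows = {true, false, true, false, true};
        // Five rows, but the function only ran for the three true flags.
        System.out.println(countInvocations(rows)); // 3
    }
}
```

A user expecting the sequence to advance once per row would see gaps that
depend on evaluation order, which is the confusion the ban is meant to avoid.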
For the second one, I'm not entirely certain about the details since some of it
is lost in the mists of Hive prehistory, but at least if we have the
annotation, we'll be able to preserve backwards compatibility as we start
adding new cost-based optimizations which might otherwise break it. A specific
example would be inserting a materialization step (e.g. for global query
optimization) in between the DISTRIBUTE/CLUSTER/SORT and the outer SELECT
containing the stateful UDF invocation; this could be a problem if the mappers
in the second job subdivide the buckets generated by the first job. So we
wouldn't do anything immediately, but the presence of the annotation will help
us going forward.