[
https://issues.apache.org/jira/browse/HIVE-9755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Naveen Gangam updated HIVE-9755:
--------------------------------
Attachment: HIVE-9755.patch
The merge() method of the ngram UDAF, which runs during the reduce phase, should
be a no-op when a mapper returns an empty set. The value of zero returned in the
list (one and only one item) indicates that the iterate() method was never
called in that map job, so the patch returns from merge() without taking any
action.
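
The guard described above can be sketched roughly as follows. This is a
minimal, standalone illustration and not the code in the attached patch: the
class NGramMergeSketch, its Buffer type, and the assumption that the last
element of the partial list carries 'n' are all hypothetical stand-ins for the
real GenericUDAFnGramEvaluator internals, and IllegalStateException stands in
for HiveException.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; names and list layout are assumptions, not Hive's API.
class NGramMergeSketch {

    // Stand-in for the reducer-side aggregation buffer.
    static final class Buffer {
        int n = 0;                                   // 0 means "not set yet"
        final List<String> items = new ArrayList<>();
    }

    // Merges one mapper's partial result into the buffer.
    // Assumption: the last element of 'partial' encodes the n-gram size 'n'.
    static void merge(Buffer agg, List<String> partial) {
        if (partial == null || partial.isEmpty()) {
            return;
        }
        int n = Integer.parseInt(partial.get(partial.size() - 1));
        if (n == 0) {
            // iterate() was never called in that mapper: nothing to merge.
            return;
        }
        if (agg.n > 0 && agg.n != n) {
            // Stand-in for the HiveException raised on a genuine mismatch.
            throw new IllegalStateException(
                "mismatch in value for 'n': found '" + agg.n + "' and '" + n + "'");
        }
        agg.n = n;
        agg.items.addAll(partial.subList(0, partial.size() - 1));
    }

    public static void main(String[] args) {
        Buffer agg = new Buffer();
        merge(agg, Arrays.asList("a b c", "3"));      // mapper with matches
        merge(agg, Arrays.asList("0", "0", "0", "0")); // mapper with no matches
        System.out.println("n=" + agg.n + " items=" + agg.items);
    }
}
```

In this sketch the all-zero partial, like the one shown in the failing row
below, leaves the buffer unchanged instead of tripping the mismatch check.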
> Hive built-in "ngram" UDAF fails when a mapper has no matches.
> --------------------------------------------------------------
>
> Key: HIVE-9755
> URL: https://issues.apache.org/jira/browse/HIVE-9755
> Project: Hive
> Issue Type: Bug
> Components: UDF
> Affects Versions: 0.14.0
> Reporter: Naveen Gangam
> Assignee: Naveen Gangam
> Priority: Critical
> Attachments: HIVE-9755.patch
>
>
> hive> describe ngramtest;
> OK
> col1 int
> col3 string
> Time taken: 0.192 seconds, Fetched: 2 row(s)
> SELECT explode(ngrams(sentences(lower(t.col3)), 3, 10)) as x FROM (SELECT
> col3 FROM ngramtest WHERE col1=0) t;
> When any result has a null value, the query returns the following error:
> 2015-01-08 09:15:00,262 FATAL ExecReducer:
> org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while
> processing row (tag=0)
> {"key":{},"value":{"_col0":["0","0","0","0"]},"alias":0}
> at org.apache.hadoop.hive.ql.exec.ExecReducer.reduce(ExecReducer.java:258)
> at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:506)
> at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:447)
> at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:396)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
> at org.apache.hadoop.mapred.Child.main(Child.java:262)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: GenericUDAFnGramEvaluator: mismatch in value for 'n', which usually is caused by a non-constant expression. Found '0' and '1'.
> at org.apache.hadoop.hive.ql.udf.generic.GenericUDAFnGrams$GenericUDAFnGramEvaluator.merge(GenericUDAFnGrams.java:242)
> at org.apache.hadoop.hive.ql.udf.generic.GenericUDAFEvaluator.aggregate(GenericUDAFEvaluator.java:142)
> at org.apache.hadoop.hive.ql.exec.GroupByOperator.updateAggregations(GroupByOperator.java:658)
> at org.apache.hadoop.hive.ql.exec.GroupByOperator.processAggr(GroupByOperator.java:911)
> at org.apache.hadoop.hive.ql.exec.GroupByOperator.processKey(GroupByOperator.java:753)
> at org.apache.hadoop.hive.ql.exec.GroupByOperator.processOp(GroupByOperator.java:819)
> at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:474)
> at org.apache.hadoop.hive.ql.exec.ExecReducer.reduce(ExecReducer.java:249)