Ken Ellinwood created SPARK-1591:
------------------------------------
Summary: scala.MatchError executing custom UDTF
Key: SPARK-1591
URL: https://issues.apache.org/jira/browse/SPARK-1591
Project: Spark
Issue Type: Bug
Components: SQL
Affects Versions: 0.9.1
Environment: CentOS 5, Hortonworks 1.3.2, Hadoop 1.2.0, Hive 0.11.0,
Spark 0.9.1, Shark 0.9.1, sharkserver2, beeline
Reporter: Ken Ellinwood
Priority: Minor
My custom UDTF fails to execute in Shark even though it runs fine in Hive.
scala.MatchError: [orange, 1, Black, 419] (of class java.util.ArrayList)
at scala.runtime.ScalaRunTime$.array_clone(ScalaRunTime.scala:118)
at shark.execution.UDTFCollector.collect(UDTFOperator.scala:92)
at org.apache.hadoop.hive.ql.udf.generic.GenericUDTF.forward(GenericUDTF.java:91)
at com.mycompany.warehouse.hive.HiveUdtfColorTreeTable.process(HiveUdtfColorTreeTable.java:98)
at shark.execution.UDTFOperator.explode(UDTFOperator.scala:79)
at shark.execution.LateralViewJoinOperator$$anonfun$processPartition$1.apply(LateralViewJoinOperator.scala:141)
The code at UDTFOperator.scala, line 92 makes two assumptions that do not hold
in my case. First, it assumes the row object needs to be cloned. Second, it
assumes every row object is an array. In my case the row is an ArrayList, and
it does not need to be cloned because my UDTF already creates a new one for
each row. The clone operation fails because my row is not an array.
As a workaround, I changed my implementation to use an array, but we have a
non-trivial number of custom UDFs that all work with Hive, and I think they
should work in Shark without modification.
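To illustrate the behavior I am asking for, here is a minimal sketch of a
collector step that accepts either representation. It is not Shark's actual
code: the class RowNormalizer and the method toRowArray are hypothetical names
invented for this example, and it assumes rows only ever arrive as Object[] or
java.util.List.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper (not part of Shark or Hive): normalizes a row forwarded
// by a UDTF into a fresh Object[], whether it arrived as an array or a List.
final class RowNormalizer {

    static Object[] toRowArray(Object row) {
        if (row instanceof Object[]) {
            // Array rows: shallow copy, in case the UDTF reuses the same array.
            return ((Object[]) row).clone();
        }
        if (row instanceof List<?>) {
            // List rows (e.g. ArrayList): toArray() already returns a new array.
            return ((List<?>) row).toArray();
        }
        String type = (row == null) ? "null" : row.getClass().getName();
        throw new IllegalArgumentException("Unsupported row type: " + type);
    }

    public static void main(String[] args) {
        // Same values as the row in the MatchError above.
        List<Object> listRow =
            new ArrayList<>(Arrays.<Object>asList("orange", 1, "Black", 419));
        Object[] arrayRow = {"orange", 1, "Black", 419};

        System.out.println(Arrays.toString(toRowArray(listRow)));
        System.out.println(Arrays.toString(toRowArray(arrayRow)));
    }
}
```

With something along these lines, a UDTF that forwards an ArrayList and one
that forwards an array would both be handled, and the copy is only a shallow
one in either case.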