Using newest hive release (0.5.0) - Problem with count(1)
In the past I have used Hive 0.3.0 successfully, and now with a new project coming up I decided to give Hive 0.5.0 a run. Everything is working as expected, except when I try to get a simple count of a table. The table is defined as:

    create table log_table (
      col1 string, col2 string, col3 string,
      col4 string, col5 string, col6 string
    )
    row format delimited fields terminated by '\t'
    stored as textfile;

And the query I'm running is:

    select count(1) from log_table;

From the hive command line I get the following errors:

    ...
    In order to set a constant number of reducers:
      set mapred.reduce.tasks=<number>
    Exception during encoding:java.lang.Exception: failed to write expression: GenericUDAFEvaluator$Mode=Class.new(); Continue...
    [the line above repeats four times]
    Starting Job = job_201004010912_0015, Tracking URL = ...

And when looking at the failed Hadoop jobs I see the following exception:

    Caused by: java.lang.ClassCastException: org.apache.hadoop.hive.serde2.objectinspector.primitive.WritableIntObjectInspector incompatible with org.apache.hadoop.hive.serde2.objectinspector.primitive.LongObjectInspector
        at org.apache.hadoop.hive.ql.udf.generic.GenericUDAFCount$GenericUDAFCountEvaluator.merge(GenericUDAFCount.java:93)
        at org.apache.hadoop.hive.ql.udf.generic.GenericUDAFEvaluator.aggregate(GenericUDAFEvaluator.java:113)
    ...

Is this a known issue? Am I missing something? Any guidance would be appreciated. Thanks!

Aaron
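For what it's worth, the ClassCastException points at the merge step of count's two-phase aggregation: partial counts travel between tasks as longs (hence the LongObjectInspector), so a partial that arrives typed as an int cannot be merged. A rough sketch of that contract in plain Java (names are illustrative; this is not Hive's actual evaluator code):

```java
public class CountMergeSketch {
    // Map side: each evaluator accumulates a partial count as a long,
    // the type the reduce-side merge expects (LongObjectInspector).
    static long iterate(long partial, Object row) {
        return row != null ? partial + 1 : partial;
    }

    // Reduce side: merge only accepts another long partial. A partial
    // arriving as an int is the analogue of the exception in the trace.
    static long merge(long partial, long otherPartial) {
        return partial + otherPartial;
    }

    public static void main(String[] args) {
        long a = 0, b = 0;
        a = iterate(a, "row1");
        a = iterate(a, "row2");
        b = iterate(b, "row3");
        System.out.println(merge(a, b)); // prints 3
    }
}
```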
Re: UDAF on AWS Hive
Thanks Zheng, and thanks for your great support to this list. I took your idea and wrote the following code that worked for me. I'm no Java whiz, so it's probably fairly inefficient. I do get to talk to the Amazon folks from time to time, so I'll definitely mention my interest in upgrading the Hive version. Thanks again.

Matt

    package com.company.hadoop.hive.udaf;

    import org.apache.hadoop.hive.ql.exec.UDAF;
    import org.apache.hadoop.hive.ql.exec.UDAFEvaluator;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.io.IntWritable;
    import java.util.Arrays;

    public class UDAFGroupConcat extends UDAF {

        public static class GroupConcatStringEvaluator implements UDAFEvaluator {
            private Text mOutput;
            private boolean mEmpty;

            public GroupConcatStringEvaluator() {
                super();
                init();
            }

            public void init() {
                mOutput = null;
                mEmpty = true;
            }

            public boolean iterate(Text o, IntWritable N) {
                if (o != null) {
                    if (mEmpty) {
                        mOutput = new Text(N + " " + o.toString());
                        mEmpty = false;
                    } else {
                        String temp = mOutput.toString() + "\t" + N + " " + o.toString();
                        String[] split = temp.split("\t");
                        Arrays.sort(split);
                        String sorted = split[0];
                        for (int i = 1; i < split.length; i++) {
                            sorted = sorted + "\t" + split[i];
                        }
                        mOutput.set(sorted);
                    }
                }
                return true;
            }

            public Text terminatePartial() {
                return mEmpty ? null : mOutput;
            }

            public boolean merge(Text o) {
                if (o != null) {
                    if (mEmpty) {
                        mOutput = new Text(o.toString());
                        mEmpty = false;
                    } else {
                        String temp = mOutput.toString() + "\t" + o.toString();
                        String[] split = temp.split("\t");
                        Arrays.sort(split);
                        String sorted = split[0];
                        for (int i = 1; i < split.length; i++) {
                            sorted = sorted + "\t" + split[i];
                        }
                        mOutput.set(sorted);
                    }
                }
                return true;
            }

            public Text terminate() {
                return mEmpty ? null : mOutput;
            }
        }
    }

On Fri, Apr 2, 2010 at 4:11 PM, Matthew Bryan gou...@gmail.com wrote:
> I'm writing a basic group_concat UDAF for the Amazon version of Hive and it's working fine for unordered groupings.
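The index-prefix-and-sort trick used in the evaluator above can be sketched without the Hadoop Text/IntWritable types (the class and method names here are illustrative, not part of the original UDAF). Note it inherits the same caveat as the original: a plain lexicographic sort misorders indexes of 10 or more unless they are zero-padded.

```java
import java.util.Arrays;

public class OrderedConcatSketch {
    // Merge two tab-delimited partials of "index value" entries and keep
    // them sorted, mirroring the split/sort/rejoin loop in the evaluator.
    static String merge(String left, String right) {
        if (left == null || left.isEmpty()) return right;
        if (right == null || right.isEmpty()) return left;
        String[] split = (left + "\t" + right).split("\t");
        Arrays.sort(split); // lexicographic sort on the "index value" prefix
        return String.join("\t", split);
    }

    public static void main(String[] args) {
        String partialA = merge(merge(null, "2 b"), "1 a");
        String partialB = merge(null, "3 c");
        // prints "1 a", "2 b", "3 c" tab-separated, ordered by index
        System.out.println(merge(partialA, partialB));
    }
}
```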
But I can't seem to get an ordered version working (filling an array based on an IntWritable passed alongside). When I move from using a Text return type on terminatePartial() to either Text[] or a State class, I start getting errors:

    FAILED: Error in semantic analysis: org.apache.hadoop.hive.ql.metadata.HiveException: Cannot recognize return type class [Lorg.apache.hadoop.io.Text; from public org.apache.hadoop.io.Text[] com.company.hadoop.hive.udaf.UDAFGroupConcatN$GroupConcatNStringEvaluator.terminatePartial()

or

    FAILED: Error in semantic analysis: org.apache.hadoop.hive.ql.metadata.HiveException: Cannot recognize return type class com.company.hadoop.hive.udaf.UDAFGroupConcatN$UDAFGroupConcatNState from public com.company.hadoop.hive.udaf.UDAFGroupConcatN$UDAFGroupConcatNState com.company.hadoop.hive.udaf.UDAFGroupConcatN$GroupConcatNStringEvaluator.terminatePartial()

What limits are there on the return type of terminatePartial()? Shouldn't it just have to match the argument of merge() and nothing more? Keep in mind this is the Amazon version of Hive (0.4, I think). I put both versions of the UDAF below, ordered and unordered. Thanks for your time.

Matt

# Working Unordered

    /* QUERY: select user, event, group_concat(details) from datatable group by user, event; */

    package com.company.hadoop.hive.udaf;

    import org.apache.hadoop.hive.ql.exec.UDAF;
    import org.apache.hadoop.hive.ql.exec.UDAFEvaluator;
    import org.apache.hadoop.io.Text;

    public class UDAFGroupConcat extends UDAF {

        public static class GroupConcatStringEvaluator implements UDAFEvaluator {
            private Text mOutput;
            private boolean mEmpty;

            public GroupConcatStringEvaluator() {
                super();
                init();
            }

            public void init() {
                mOutput = null;
                mEmpty = true;
            }
Re: Using newest hive release (0.5.0) - Problem with count(1)
I am using 1.6, however it is the IBM JVM (not my choice). If the feature is known to work on the Sun JVM then I will deal with the problem another way. Thanks.

Aaron

On Tue, Apr 6, 2010 at 3:12 PM, Zheng Shao zsh...@gmail.com wrote:
> Are you using Java 1.5? Hive now requires Java 1.6.
>
> On Tue, Apr 6, 2010 at 7:23 AM, Aaron McCurry amccu...@gmail.com wrote:
>> In the past I have used Hive 0.3.0 successfully, and now with a new project coming up I decided to give Hive 0.5.0 a run. Everything is working as expected, except when I try to get a simple count of a table. ...
>
> --
> Yours,
> Zheng
> http://www.linkedin.com/in/zshao
Issue in installing Hive
I'm trying to run the Hive 0.5 release with Hadoop 0.20.2 on a standalone machine. HDFS + Hadoop is working, but I'm not able to get Hive running. When I do SHOW TABLES, I get the following error: http://pastebin.com/XvNR0U86

What am I doing wrong here?

Amandeep

Amandeep Khurana
Computer Science Graduate Student
University of California, Santa Cruz
Re: Using newest hive release (0.5.0) - Problem with count(1)
Yes, we use the Sun JDK 1.6 and it works.

On Tue, Apr 6, 2010 at 12:32 PM, Aaron McCurry amccu...@gmail.com wrote:
> I am using 1.6, however it is the IBM JVM (not my choice). If the feature is known to work on the Sun JVM then I will deal with the problem another way. Thanks.
>
> Aaron

--
Yours,
Zheng
http://www.linkedin.com/in/zshao
Re: Issue in installing Hive
Hi Amandeep,

This problem arises if you grab a copy of the Hadoop tarball and build it yourself. The tarball comes packaged with a copy of core-3.1.1.jar in the lib/ subdirectory, and building the package results in another copy of core-3.1.1.jar located in build/ivy/lib/Hadoop/common/core. bin/hadoop adds both jars to the CLASSPATH, which causes the DataNucleus ORM to complain. The quick fix is to delete the copy of core-3.1.1.jar located in build/ivy/lib/Hadoop/common/core.

Thanks.

Carl

On Tue, Apr 6, 2010 at 12:49 PM, Amandeep Khurana ama...@gmail.com wrote:
> I'm trying to run the Hive 0.5 release with Hadoop 0.20.2 on a standalone machine. HDFS + Hadoop is working, but I'm not able to get Hive running. When I do SHOW TABLES, I get the following error: http://pastebin.com/XvNR0U86
>
> What am I doing wrong here?
>
> Amandeep
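Since the root cause is the same jar name appearing twice on the CLASSPATH, a quick way to spot such duplicates from plain Java (a hypothetical helper, not part of bin/hadoop or Hive) might look like:

```java
import java.io.File;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class DupJarCheck {
    // Return the jar file names that appear more than once in a
    // separator-delimited classpath string (e.g. java.class.path).
    static Set<String> duplicates(String classpath, String sep) {
        Map<String, Integer> seen = new HashMap<>();
        for (String entry : classpath.split(sep)) {
            String name = new File(entry).getName();
            if (name.endsWith(".jar")) seen.merge(name, 1, Integer::sum);
        }
        Set<String> dups = new HashSet<>();
        for (Map.Entry<String, Integer> e : seen.entrySet()) {
            if (e.getValue() > 1) dups.add(e.getKey());
        }
        return dups;
    }

    public static void main(String[] args) {
        String cp = "lib/core-3.1.1.jar:build/ivy/lib/Hadoop/common/core/core-3.1.1.jar:lib/other.jar";
        System.out.println(duplicates(cp, ":")); // prints [core-3.1.1.jar]
    }
}
```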
Truncation error when creating table with column containing struct with many fields
Hello,

I got the following error when creating a table with a column that has an ARRAY of STRUCTs with many fields. It appears that there is a 128-character limit on the column type definition.

    FAILED: Error in metadata: javax.jdo.JDODataStoreException: Add request failed : INSERT INTO COLUMNS (SD_ID,COMMENT,COLUMN_NAME,TYPE_NAME,INTEGER_IDX) VALUES (?,?,?,?,?)
    NestedThrowables:
    java.sql.BatchUpdateException: A truncation error was encountered trying to shrink VARCHAR 'array<struct<id:int,fld1:bigint,fld2:int,fld3' to length 128.
    FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask

I was able to get the table create working after changing 128 to 256 in metastore/src/model/package.jdo. Does anyone know if there are any adverse side-effects of doing so?

Dilip
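To see how quickly an array-of-structs type name hits that 128-character cap, here is a rough sketch that builds the same textual form Hive stores in the metastore's TYPE_NAME column (the field names are invented for illustration):

```java
public class TypeNameLength {
    // Build an array<struct<...>> type name with n int fields, in the
    // textual form Hive serializes into COLUMNS.TYPE_NAME.
    static String arrayOfStruct(int n) {
        StringBuilder sb = new StringBuilder("array<struct<");
        for (int i = 1; i <= n; i++) {
            if (i > 1) sb.append(',');
            sb.append("fld").append(i).append(":int");
        }
        return sb.append(">>").toString();
    }

    public static void main(String[] args) {
        // Just 13 short int fields already push the type name past 128 chars.
        System.out.println(arrayOfStruct(13).length()); // prints 135
    }
}
```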
Re: Truncation error when creating table with column containing struct with many fields
That change should be fine.

Zheng

On Tue, Apr 6, 2010 at 5:16 PM, Dilip Joseph dilip.antony.jos...@gmail.com wrote:
> Hello,
>
> I got the following error when creating a table with a column that has an ARRAY of STRUCTs with many fields. It appears that there is a 128-character limit on the column type definition.
> ...
> I was able to get the table create working after changing 128 to 256 in metastore/src/model/package.jdo. Does anyone know if there are any adverse side-effects of doing so?
>
> Dilip

--
Yours,
Zheng
http://www.linkedin.com/in/zshao