Using newest hive release (0.5.0) - Problem with count(1)

2010-04-06 Thread Aaron McCurry
In the past I have used Hive 0.3.0 successfully. With a new project
coming up I decided to give Hive 0.5.0 a run, and everything is working as
expected, except when I try to get a simple count of the table.

The simple table is defined as:

create table log_table (col1 string, col2 string, col3 string, col4 string,
col5 string, col6 string)
row format delimited
fields terminated by '\t'
stored as textfile;

And the query I'm running is:

select count(1) from log_table;

From the hive command line I get the following errors:

...
In order to set a constant number of reducers:
   set mapred.reduce.tasks=<number>
Exception during encoding:java.lang.Exception: failed to write expression:
GenericUDAFEvaluator$Mode=Class.new();
Continue...
Exception during encoding:java.lang.Exception: failed to write expression:
GenericUDAFEvaluator$Mode=Class.new();
Continue...
Exception during encoding:java.lang.Exception: failed to write expression:
GenericUDAFEvaluator$Mode=Class.new();
Continue...
Exception during encoding:java.lang.Exception: failed to write expression:
GenericUDAFEvaluator$Mode=Class.new();
Continue...
Starting Job = job_201004010912_0015, Tracking URL = .



And when looking at the failed hadoop jobs I see the following exception:

Caused by: java.lang.ClassCastException:
org.apache.hadoop.hive.serde2.objectinspector.primitive.WritableIntObjectInspector
incompatible with
org.apache.hadoop.hive.serde2.objectinspector.primitive.LongObjectInspector
at
org.apache.hadoop.hive.ql.udf.generic.GenericUDAFCount$GenericUDAFCountEvaluator.merge(GenericUDAFCount.java:93)
at
org.apache.hadoop.hive.ql.udf.generic.GenericUDAFEvaluator.aggregate(GenericUDAFEvaluator.java:113)
...


Is this a known issue?  Am I missing something?  Any guidance would be
appreciated.  Thanks!

Aaron


Re: UDAF on AWS Hive

2010-04-06 Thread Matthew Bryan
Thanks Zheng, and thanks for your great support to this list. I took
your idea and wrote the following code that worked for me...I'm no
Java whiz...so it's probably fairly inefficient. I do get to talk to
the Amazon folks from time to time, so I'll definitely mention my
interest in upgrading the Hive version. Thanks again.

Matt

package com.company.hadoop.hive.udaf;

import org.apache.hadoop.hive.ql.exec.UDAF;
import org.apache.hadoop.hive.ql.exec.UDAFEvaluator;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.IntWritable;
import java.util.Arrays;

public class UDAFGroupConcat extends UDAF {

    public static class GroupConcatStringEvaluator implements UDAFEvaluator {
        private Text mOutput;
        private boolean mEmpty;

        public GroupConcatStringEvaluator() {
            super();
            init();
        }

        public void init() {
            mOutput = null;
            mEmpty = true;
        }

        public boolean iterate(Text o, IntWritable N) {
            if (o != null) {
                if (mEmpty) {
                    mOutput = new Text(N + " " + o.toString());
                    mEmpty = false;
                } else {
                    String temp = mOutput.toString() + "\t" + N + " " + o.toString();
                    String[] split = temp.split("\t");
                    Arrays.sort(split);
                    String sorted = split[0];
                    for (int i = 1; i < split.length; i++) {
                        sorted = sorted + "\t" + split[i];
                    }
                    mOutput.set(sorted);
                }
            }
            return true;
        }

        public Text terminatePartial() { return mEmpty ? null : mOutput; }

        public boolean merge(Text o) {
            if (o != null) {
                if (mEmpty) {
                    mOutput = new Text(o.toString());
                    mEmpty = false;
                } else {
                    String temp = mOutput.toString() + "\t" + o.toString();
                    String[] split = temp.split("\t");
                    Arrays.sort(split);
                    String sorted = split[0];
                    for (int i = 1; i < split.length; i++) {
                        sorted = sorted + "\t" + split[i];
                    }
                    mOutput.set(sorted);
                }
            }
            return true;
        }

        public Text terminate() { return mEmpty ? null : mOutput; }
    }
}
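The sort-and-merge trick the evaluator uses can be exercised in isolation. This is a standalone sketch with plain strings instead of Hadoop's Text, and the class and method names here are mine, not part of the UDAF:

```java
import java.util.Arrays;

public class MergeSketch {
    // Mirrors the evaluator's merge step: join two tab-delimited partials,
    // re-split on tab, and sort so the "N value" entries come out in N order.
    // Note the sort is lexical, so N values of 10 and up will order oddly
    // ("10 x" sorts before "2 x").
    static String merge(String left, String right) {
        String[] split = (left + "\t" + right).split("\t");
        Arrays.sort(split);
        return String.join("\t", split);
    }

    public static void main(String[] args) {
        // prints the three entries in 1, 2, 3 order, tab-separated
        System.out.println(merge("2 beta", "1 alpha\t3 gamma"));
    }
}
```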


On Fri, Apr 2, 2010 at 4:11 PM, Matthew Bryan gou...@gmail.com wrote:
 I'm writing a basic group_concat UDAF for the Amazon version of
 Hive...and it's working fine for unordered groupings. But I can't
 seem to get an ordered version working (filling an array based on an
 IntWritable passed alongside). When I move from using Text return type
 on terminatePartial() to either Text[] or a State class I start
 getting errors:

 FAILED: Error in semantic analysis:
 org.apache.hadoop.hive.ql.metadata.HiveException: Cannot recognize
 return type class [Lorg.apache.hadoop.io.Text; from public
 org.apache.hadoop.io.Text[]
 com.company.hadoop.hive.udaf.UDAFGroupConcatN$GroupConcatNStringEvaluator.terminatePartial()

 or

 FAILED: Error in semantic analysis:
 org.apache.hadoop.hive.ql.metadata.HiveException: Cannot recognize
 return type class
 com.company.hadoop.hive.udaf.UDAFGroupConcatN$UDAFGroupConc
 atNState from public
 com.company.hadoop.hive.udaf.UDAFGroupConcatN$UDAFGroupConcatNState
 com.company.hadoop.hive.udaf.UDAFGroupConcatN$GroupConcatNStringEvaluator.terminatePartial
 ()

 What limits are there on the return type of
 terminatePartial()...shouldn't it just have to match the argument of
 merge() and nothing more? Keep in mind this is the Amazon version of
 Hive (0.4, I think).

 I put both versions of the UDAF below, ordered and unordered.

 Thanks for your time.

 Matt


 # Working Unordered 
 /*QUERY: select user, event, group_concat(details) from datatable
 group by user,event;*/

 package com.company.hadoop.hive.udaf;

 import org.apache.hadoop.hive.ql.exec.UDAF;
 import org.apache.hadoop.hive.ql.exec.UDAFEvaluator;
 import org.apache.hadoop.io.Text;

 public class UDAFGroupConcat extends UDAF{

        public static class GroupConcatStringEvaluator implements
 UDAFEvaluator {
                private Text mOutput;
                private boolean mEmpty;

        public GroupConcatStringEvaluator() {
                super();
                init();
        }

        public void init() {
                mOutput = null;
                mEmpty = true;
        }

 

Re: Using newest hive release (0.5.0) - Problem with count(1)

2010-04-06 Thread Aaron McCurry
I am using 1.6, however it is the IBM jvm (not my choice).  If the feature
is known to work on the Sun JVM then I will deal with the problem another
way.  Thanks.

Aaron

On Tue, Apr 6, 2010 at 3:12 PM, Zheng Shao zsh...@gmail.com wrote:

 Are you using Java 1.5? Hive now requires Java 1.6



 --
 Yours,
 Zheng
 http://www.linkedin.com/in/zshao



Issue in installing Hive

2010-04-06 Thread Amandeep Khurana
I'm trying to run Hive 0.5 release with Hadoop 0.20.2 on a standalone
machine. HDFS + Hadoop is working, but I'm not able to get Hive running.
When I do SHOW TABLES, I get the following error:
http://pastebin.com/XvNR0U86

What am I doing wrong here?

Amandeep


Amandeep Khurana
Computer Science Graduate Student
University of California, Santa Cruz


Re: Using newest hive release (0.5.0) - Problem with count(1)

2010-04-06 Thread Zheng Shao
Yes we use sun jdk 1.6 and it works.
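For what it's worth, the "Exception during encoding:" lines in the original report look like java.beans.XMLEncoder, which Hive appears to use to serialize the query plan, failing to persist the GenericUDAFEvaluator$Mode enum on that JVM. A minimal round-trip check you can run on a given JVM is sketched below; the enum here is a stand-in of my own, not Hive's actual class:

```java
import java.beans.ExceptionListener;
import java.beans.XMLDecoder;
import java.beans.XMLEncoder;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;

public class EnumEncodeCheck {
    // Stand-in for an enum like GenericUDAFEvaluator.Mode
    public enum Mode { PARTIAL1, PARTIAL2, FINAL, COMPLETE }

    public static void main(String[] args) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        XMLEncoder enc = new XMLEncoder(bytes);
        // Messages of the form "Exception during encoding:..." are what an
        // ExceptionListener like this one would report.
        enc.setExceptionListener(new ExceptionListener() {
            public void exceptionThrown(Exception e) {
                System.err.println("Exception during encoding:" + e);
            }
        });
        enc.writeObject(Mode.FINAL);
        enc.close();

        // A JVM whose XMLEncoder handles enums will round-trip the value.
        XMLDecoder dec = new XMLDecoder(new ByteArrayInputStream(bytes.toByteArray()));
        System.out.println("decoded: " + dec.readObject());
        dec.close();
    }
}
```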




-- 
Yours,
Zheng
http://www.linkedin.com/in/zshao


Re: Issue in installing Hive

2010-04-06 Thread Carl Steinbach
Hi Amandeep,

This problem arises if you grab a copy of the Hadoop tarball and attempt
to build it. The tarball comes packaged with a copy of core-3.1.1.jar in
the lib/ subdirectory, and building the package results in another copy of
core-3.1.1.jar located in build/ivy/lib/Hadoop/common/core. bin/hadoop adds
both jars to the CLASSPATH, which causes the DataNucleus ORM to complain.
The quick fix is to delete the copy of core-3.1.1.jar located in
build/ivy/lib/Hadoop/common/core.

Thanks.

Carl




Truncation error when creating table with column containing struct with many fields

2010-04-06 Thread Dilip Joseph
Hello,

I got the following error when creating a table with a column that has
an ARRAY of STRUCTS with many fields.  It appears that there is a 128
character limit on the column definition.

FAILED: Error in metadata: javax.jdo.JDODataStoreException: Add
request failed : INSERT INTO COLUMNS
(SD_ID,COMMENT,COLUMN_NAME,TYPE_NAME,INTEGER_IDX) VALUES (?,?,?,?,?)
NestedThrowables:
java.sql.BatchUpdateException: A truncation error was encountered
trying to shrink VARCHAR
'array<struct<id:int,fld1:bigint,fld2:int,fld3' to length 128.
FAILED: Execution Error, return code 1 from
org.apache.hadoop.hive.ql.exec.DDLTask

I was able to get table create working after changing 128 to 256 in
/metastore/src/model/package.jdo.   Does anyone know if there are any
adverse side-effects of doing so?

Dilip


Re: Truncation error when creating table with column containing struct with many fields

2010-04-06 Thread Zheng Shao
That change should be fine.

Zheng
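For anyone else hitting this, the change is to the VARCHAR length on the TYPE_NAME column mapping in metastore/src/model/package.jdo. A sketch of what the edited element looks like follows; the class/field names are recalled from the DataNucleus mapping, so verify them against your copy of the tree:

```xml
<!-- metastore/src/model/package.jdo (sketch; check names in your tree) -->
<field name="type">
  <column name="TYPE_NAME" length="256" jdbc-type="VARCHAR"/>
</field>
```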



-- 
Yours,
Zheng
http://www.linkedin.com/in/zshao