Do you want to try Hive release 0.5.0 or Hive trunk? We should have provided better error messages here: https://issues.apache.org/jira/browse/HIVE-1216
Zheng

On Thu, Mar 4, 2010 at 12:34 PM, Tom Nichols <tmnich...@gmail.com> wrote:
> I am trying out Hive, using Cloudera's EC2 distribution (Hadoop
> 0.18.3, Hive 0.4.1, I believe)
>
> I'm trying to run the following query which causes every map task to
> fail with an NPE before making any progress:
>
> java.lang.NullPointerException
>     at org.apache.hadoop.hive.serde2.lazy.LazyStruct.uncheckedGetField(LazyStruct.java:205)
>     at org.apache.hadoop.hive.serde2.lazy.LazyStruct.getField(LazyStruct.java:182)
>     at org.apache.hadoop.hive.serde2.objectinspector.LazySimpleStructObjectInspector.getStructFieldData(LazySimpleStructObjectInspector.java:141)
>     at org.apache.hadoop.hive.ql.exec.ExprNodeColumnEvaluator.evaluate(ExprNodeColumnEvaluator.java:53)
>     at org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:74)
>     at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:332)
>     at org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:49)
>     at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:332)
>     at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:175)
>     at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:71)
>     at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:47)
>
>
> The query:
> -- Get the node's max price and corresponding year/day/hour/month
> select isone.node_id, isone.day, isone.hour, isone.lmp
> from (select max(lmp) as mlmp, node_id
>       from isone_lmp
>       where isone_lmp.node_id = 400
>       group by node_id) maxlmp
> join isone_lmp isone on ( isone.node_id = maxlmp.node_id
>       and isone.lmp=maxlmp.mlmp );
>
> The table:
> CREATE TABLE isone_lmp (
>   node_id int,
>   day string,
>   hour int,
>   minute int,
>   energy float,
>   congestion float,
>   loss float,
>   lmp float
> )
> ROW FORMAT DELIMITED
> FIELDS TERMINATED BY ','
> STORED AS TEXTFILE;
>
> The data looks like the following:
> 396,20090120,00,00,62.77,0,.78,63.55
> 397,20090120,00,00,62.77,0,.65,63.42
> 398,20090120,00,00,62.77,0,.65,63.42
> 399,20090120,00,00,62.77,0,.65,63.42
> 400,20090120,00,00,62.77,0,.65,63.42
> 401,20090120,00,00,62.77,0,-1.02,61.75
> 405,20090120,00,00,62.77,0,.21,62.98
>
> It's about 15GB of data total; I can do a simple "select count(1) from
> isone_lmp;" which executes as expected. Any thoughts? I've been able
> to execute the same query on a smaller subset of data (2M rows as
> opposed to 500M) on a non-distributed setup locally.
>
> Thanks.
> -Tom
>

--
Yours,
Zheng
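
For readers following the thread, here is a minimal Python sketch of what the quoted query is meant to compute, run against the seven sample rows from the message. It is not Hive code and says nothing about the cause of the NullPointerException; the names `parse_rows` and `max_lmp_rows` and the in-memory parsing are assumptions made purely for illustration.

```python
import csv
import io

# Sample rows copied from the message. Column order follows the CREATE TABLE:
# node_id, day, hour, minute, energy, congestion, loss, lmp
SAMPLE = """\
396,20090120,00,00,62.77,0,.78,63.55
397,20090120,00,00,62.77,0,.65,63.42
398,20090120,00,00,62.77,0,.65,63.42
399,20090120,00,00,62.77,0,.65,63.42
400,20090120,00,00,62.77,0,.65,63.42
401,20090120,00,00,62.77,0,-1.02,61.75
405,20090120,00,00,62.77,0,.21,62.98
"""


def parse_rows(text):
    """Parse comma-delimited text into dicts typed like the table schema."""
    rows = []
    for rec in csv.reader(io.StringIO(text)):
        if len(rec) != 8:
            continue  # skip rows that do not have the eight expected fields
        rows.append({
            "node_id": int(rec[0]),
            "day": rec[1],
            "hour": int(rec[2]),
            "minute": int(rec[3]),
            "energy": float(rec[4]),
            "congestion": float(rec[5]),
            "loss": float(rec[6]),
            "lmp": float(rec[7]),
        })
    return rows


def max_lmp_rows(rows, node_id):
    """Return (node_id, day, hour, lmp) for rows at the node's maximum lmp."""
    node_rows = [r for r in rows if r["node_id"] == node_id]
    if not node_rows:
        return []
    mlmp = max(r["lmp"] for r in node_rows)  # the subquery's max(lmp)
    # The self-join on node_id and lmp = mlmp keeps only the matching rows.
    return [(r["node_id"], r["day"], r["hour"], r["lmp"])
            for r in node_rows if r["lmp"] == mlmp]


result = max_lmp_rows(parse_rows(SAMPLE), 400)
print(result)
```

On the sample data, node 400 has a single row, so that row is also its maximum.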