Just a follow-up here -- when I upgraded to Hive 0.5 everything worked... Thanks again for the help.
On Fri, Mar 5, 2010 at 5:04 AM, Zheng Shao <[email protected]> wrote:
> Do you want to try hive release 0.5.0 or hive trunk?
> We should have provided better error messages here:
> https://issues.apache.org/jira/browse/HIVE-1216
>
> Zheng
>
> On Thu, Mar 4, 2010 at 12:34 PM, Tom Nichols <[email protected]> wrote:
>> I am trying out Hive, using Cloudera's EC2 distribution (Hadoop
>> 0.18.3, Hive 0.4.1, I believe)
>>
>> I'm trying to run the following query which causes every map task to
>> fail with an NPE before making any progress:
>>
>> java.lang.NullPointerException
>>     at org.apache.hadoop.hive.serde2.lazy.LazyStruct.uncheckedGetField(LazyStruct.java:205)
>>     at org.apache.hadoop.hive.serde2.lazy.LazyStruct.getField(LazyStruct.java:182)
>>     at org.apache.hadoop.hive.serde2.objectinspector.LazySimpleStructObjectInspector.getStructFieldData(LazySimpleStructObjectInspector.java:141)
>>     at org.apache.hadoop.hive.ql.exec.ExprNodeColumnEvaluator.evaluate(ExprNodeColumnEvaluator.java:53)
>>     at org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:74)
>>     at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:332)
>>     at org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:49)
>>     at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:332)
>>     at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:175)
>>     at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:71)
>>     at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:47)
>>
>>
>> The query:
>> -- Get the node's max price and corresponding year/day/hour/month
>> select isone.node_id, isone.day, isone.hour, isone.lmp
>> from (select max(lmp) as mlmp, node_id
>>       from isone_lmp
>>       where isone_lmp.node_id = 400
>>       group by node_id) maxlmp
>> join isone_lmp isone on ( isone.node_id = maxlmp.node_id
>>                           and isone.lmp=maxlmp.mlmp );
>>
>> The table:
>> CREATE TABLE isone_lmp (
>>   node_id int,
>>   day string,
>>   hour int,
>>   minute int,
>>   energy float,
>>   congestion float,
>>   loss float,
>>   lmp float
>> )
>> ROW FORMAT DELIMITED
>> FIELDS TERMINATED BY ','
>> STORED AS TEXTFILE;
>>
>> The data looks like the following:
>> 396,20090120,00,00,62.77,0,.78,63.55
>> 397,20090120,00,00,62.77,0,.65,63.42
>> 398,20090120,00,00,62.77,0,.65,63.42
>> 399,20090120,00,00,62.77,0,.65,63.42
>> 400,20090120,00,00,62.77,0,.65,63.42
>> 401,20090120,00,00,62.77,0,-1.02,61.75
>> 405,20090120,00,00,62.77,0,.21,62.98
>>
>> It's about 15GB of data total; I can do a simple "select count(1) from
>> isone_lmp;" which executes as expected. Any thoughts? I've been able
>> to execute the same query on a smaller subset of data (2M rows as
>> opposed to 500M) on a non-distributed setup locally.
>>
>> Thanks.
>> -Tom
>>
>
>
>
> --
> Yours,
> Zheng
>
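
For anyone checking what the quoted query is expected to return, here is a minimal Python sketch of the same logic (max lmp for one node, joined back to the matching rows), run against the sample rows from the message. It only illustrates the intended result; it does not reproduce Hive's execution or the NPE. The names `parse_rows` and `max_lmp_rows` are hypothetical helpers, and silently skipping lines with the wrong field count is an assumption made for this sketch, not a description of how Hive handles them.

```python
import csv
from io import StringIO

# Sample rows copied from the quoted message; column order follows the CREATE TABLE.
SAMPLE = """\
396,20090120,00,00,62.77,0,.78,63.55
397,20090120,00,00,62.77,0,.65,63.42
398,20090120,00,00,62.77,0,.65,63.42
399,20090120,00,00,62.77,0,.65,63.42
400,20090120,00,00,62.77,0,.65,63.42
401,20090120,00,00,62.77,0,-1.02,61.75
405,20090120,00,00,62.77,0,.21,62.98
"""

COLUMNS = ["node_id", "day", "hour", "minute",
           "energy", "congestion", "loss", "lmp"]


def parse_rows(text):
    """Parse comma-delimited text into dicts typed like the table columns."""
    rows = []
    for fields in csv.reader(StringIO(text)):
        if len(fields) != len(COLUMNS):
            # Assumption for this sketch only: skip lines with a wrong field count.
            continue
        rec = dict(zip(COLUMNS, fields))
        rows.append({
            "node_id": int(rec["node_id"]),
            "day": rec["day"],
            "hour": int(rec["hour"]),
            "minute": int(rec["minute"]),
            "energy": float(rec["energy"]),
            "congestion": float(rec["congestion"]),
            "loss": float(rec["loss"]),
            "lmp": float(rec["lmp"]),
        })
    return rows


def max_lmp_rows(rows, node_id):
    """Return (node_id, day, hour, lmp) for rows where lmp equals the node's max."""
    node_rows = [r for r in rows if r["node_id"] == node_id]
    if not node_rows:
        return []
    mlmp = max(r["lmp"] for r in node_rows)  # the subquery: max(lmp) per node
    # The join: keep only rows of that node whose lmp matches the max.
    return [(r["node_id"], r["day"], r["hour"], r["lmp"])
            for r in node_rows if r["lmp"] == mlmp]


if __name__ == "__main__":
    for row in max_lmp_rows(parse_rows(SAMPLE), 400):
        print(row)
```

On real data the inner list may contain several rows, since more than one hour can share the same maximum lmp for a node.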
