Do you want to try Hive release 0.5.0 or Hive trunk?
We should have provided a better error message here:
https://issues.apache.org/jira/browse/HIVE-1216

Zheng

On Thu, Mar 4, 2010 at 12:34 PM, Tom Nichols <tmnich...@gmail.com> wrote:
> I am trying out Hive, using Cloudera's EC2 distribution (Hadoop
> 0.18.3, Hive 0.4.1, I believe)
>
> I'm trying to run the following query which causes every map task to
> fail with an NPE before making any progress:
>
> java.lang.NullPointerException
>        at org.apache.hadoop.hive.serde2.lazy.LazyStruct.uncheckedGetField(LazyStruct.java:205)
>        at org.apache.hadoop.hive.serde2.lazy.LazyStruct.getField(LazyStruct.java:182)
>        at org.apache.hadoop.hive.serde2.objectinspector.LazySimpleStructObjectInspector.getStructFieldData(LazySimpleStructObjectInspector.java:141)
>        at org.apache.hadoop.hive.ql.exec.ExprNodeColumnEvaluator.evaluate(ExprNodeColumnEvaluator.java:53)
>        at org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:74)
>        at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:332)
>        at org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:49)
>        at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:332)
>        at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:175)
>        at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:71)
>        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:47)
>
>
> The query:
> -- Get the node's max price and corresponding year/day/hour/month
> select isone.node_id, isone.day, isone.hour, isone.lmp
> from (select max(lmp) as mlmp, node_id
>    from isone_lmp
>    where isone_lmp.node_id = 400
>    group by node_id) maxlmp
> join isone_lmp isone on ( isone.node_id = maxlmp.node_id
>  and isone.lmp=maxlmp.mlmp );
>
> The table:
> CREATE TABLE isone_lmp (
>  node_id int,
>  day string,
>  hour int,
>  minute int,
>  energy float,
>  congestion float,
>  loss float,
>  lmp float
> )
> ROW FORMAT DELIMITED
> FIELDS TERMINATED BY ','
> STORED AS TEXTFILE;
>
> The data looks like the following:
> 396,20090120,00,00,62.77,0,.78,63.55
> 397,20090120,00,00,62.77,0,.65,63.42
> 398,20090120,00,00,62.77,0,.65,63.42
> 399,20090120,00,00,62.77,0,.65,63.42
> 400,20090120,00,00,62.77,0,.65,63.42
> 401,20090120,00,00,62.77,0,-1.02,61.75
> 405,20090120,00,00,62.77,0,.21,62.98
>
> It's about 15GB of data total; I can do a simple "select count(1) from
> isone_lmp;" which executes as expected.  Any thoughts?  I've been able
> to execute the same query on a smaller subset of data (2M rows as
> opposed to 500M) on a non-distributed setup locally.
>
> Thanks.
> -Tom
>



-- 
Yours,
Zheng
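
For cross-checking the quoted query on a small sample outside Hive, its logic (maximum lmp for one node, joined back to the rows carrying that value) can be sketched in plain Python. This is only an illustration under assumptions: the names SAMPLE, COLUMNS, parse_rows and max_lmp_rows are hypothetical, the rows are the sample lines from the message, and nothing here reproduces or explains the NPE.

```python
import csv
import io

# Sample rows copied from the quoted message (not the full 15GB data set).
SAMPLE = """\
396,20090120,00,00,62.77,0,.78,63.55
397,20090120,00,00,62.77,0,.65,63.42
398,20090120,00,00,62.77,0,.65,63.42
399,20090120,00,00,62.77,0,.65,63.42
400,20090120,00,00,62.77,0,.65,63.42
401,20090120,00,00,62.77,0,-1.02,61.75
405,20090120,00,00,62.77,0,.21,62.98
"""

# Column order follows the CREATE TABLE statement in the message.
COLUMNS = ["node_id", "day", "hour", "minute",
           "energy", "congestion", "loss", "lmp"]


def parse_rows(text):
    """Parse comma-delimited lines, skipping any line without 8 fields."""
    rows = []
    for fields in csv.reader(io.StringIO(text)):
        if len(fields) != len(COLUMNS):
            continue  # hypothetical policy: ignore short or long lines
        rec = dict(zip(COLUMNS, fields))
        rows.append({
            "node_id": int(rec["node_id"]),
            "day": rec["day"],
            "hour": int(rec["hour"]),
            "lmp": float(rec["lmp"]),
        })
    return rows


def max_lmp_rows(rows, node_id):
    """Return (node_id, day, hour, lmp) for rows holding the node's max lmp."""
    candidates = [r for r in rows if r["node_id"] == node_id]
    if not candidates:
        return []
    mlmp = max(r["lmp"] for r in candidates)  # the subquery's max(lmp)
    # The join keeps every row of that node whose lmp equals the maximum.
    return [(r["node_id"], r["day"], r["hour"], r["lmp"])
            for r in candidates if r["lmp"] == mlmp]


if __name__ == "__main__":
    print(max_lmp_rows(parse_rows(SAMPLE), 400))
```

Comparing this output with Hive's result on the same small subset gives a quick sanity check of the query itself, separate from whatever fails on the full table.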
