+1. Even better would be to have this code architecture documented using Forrest (really, it's just writing HTML) and distributed with Hive.
On Tue, Dec 2, 2008 at 11:33 PM, Josh Ferguson <[EMAIL PROTECTED]> wrote:

> I filed this JIRA issue. I was wondering, can someone take the time to
> update the wiki to describe the complete architecture of the system as
> outlined here?
>
> http://wiki.apache.org/hadoop/Hive/DeveloperGuide
>
> I think that outside people really need this to understand where certain
> problems might be occurring. To me it seems almost impenetrable to try to
> open up these source files and figure out how everything is linked
> together, so I can't even begin to write my own patches or consider
> solutions. Since certain issues (like this JIRA) are fully blocking
> further development of some things, it would be nice to distribute the
> knowledge so that everyone can effectively contribute.
>
> Josh Ferguson
>
> Begin forwarded message:
>
> *From: *"Josh Ferguson (JIRA)" <[EMAIL PROTECTED]>
> *Date: *December 2, 2008 8:38:44 PM PST
> *To: *[EMAIL PROTECTED]
> *Subject: *[jira] Created: (HIVE-106) Join operation fails for some queries
>
> Join operation fails for some queries
> -------------------------------------
>
>                  Key: HIVE-106
>                  URL: https://issues.apache.org/jira/browse/HIVE-106
>              Project: Hadoop Hive
>           Issue Type: Bug
>           Components: Query Processor
>     Affects Versions: 0.19.0
>             Reporter: Josh Ferguson
>
> The tables are:
>
> CREATE TABLE activities
> (actor_id STRING, actee_id STRING, properties MAP<STRING, STRING>)
> PARTITIONED BY (account STRING, application STRING, dataset STRING, hour INT)
> CLUSTERED BY (actor_id, actee_id) INTO 32 BUCKETS
> ROW FORMAT DELIMITED
> COLLECTION ITEMS TERMINATED BY '44'
> MAP KEYS TERMINATED BY '58'
> STORED AS TEXTFILE;
>
> Detailed Table Information:
> Table(tableName:activities,dbName:default,owner:Josh,createTime:1228208598,lastAccessTime:0,retention:0,sd:StorageDescriptor(cols:[FieldSchema(name:actor_id,type:string,comment:null),
> FieldSchema(name:actee_id,type:string,comment:null),
> FieldSchema(name:properties,type:map<string,string>,comment:null)],location:/user/hive/warehouse/activities,inputFormat:org.apache.hadoop.mapred.TextInputFormat,outputFormat:org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat,compressed:false,numBuckets:32,serdeInfo:SerDeInfo(name:null,serializationLib:org.apache.hadoop.hive.serde2.dynamic_type.DynamicSerDe,parameters:{colelction.delim=44,mapkey.delim=58,serialization.format=org.apache.hadoop.hive.serde2.thrift.TCTLSeparatedProtocol}),bucketCols:[actor_id,
> actee_id],sortCols:[],parameters:{}),partitionKeys:[FieldSchema(name:account,type:string,comment:null),
> FieldSchema(name:application,type:string,comment:null),
> FieldSchema(name:dataset,type:string,comment:null),
> FieldSchema(name:hour,type:int,comment:null)],parameters:{})
>
> CREATE TABLE users
> (id STRING, properties MAP<STRING, STRING>)
> PARTITIONED BY (account STRING, application STRING, dataset STRING, hour INT)
> CLUSTERED BY (id) INTO 32 BUCKETS
> ROW FORMAT DELIMITED
> COLLECTION ITEMS TERMINATED BY '44'
> MAP KEYS TERMINATED BY '58'
> STORED AS TEXTFILE;
>
> Detailed Table Information:
> Table(tableName:users,dbName:default,owner:Josh,createTime:1228208633,lastAccessTime:0,retention:0,sd:StorageDescriptor(cols:[FieldSchema(name:id,type:string,comment:null),
> FieldSchema(name:properties,type:map<string,string>,comment:null)],location:/user/hive/warehouse/users,inputFormat:org.apache.hadoop.mapred.TextInputFormat,outputFormat:org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat,compressed:false,numBuckets:32,serdeInfo:SerDeInfo(name:null,serializationLib:org.apache.hadoop.hive.serde2.dynamic_type.DynamicSerDe,parameters:{colelction.delim=44,mapkey.delim=58,serialization.format=org.apache.hadoop.hive.serde2.thrift.TCTLSeparatedProtocol}),bucketCols:[id],sortCols:[],parameters:{}),partitionKeys:[FieldSchema(name:account,type:string,comment:null),
> FieldSchema(name:application,type:string,comment:null),
> FieldSchema(name:dataset,type:string,comment:null),
> FieldSchema(name:hour,type:int,comment:null)],parameters:{})
>
> A working query is:
>
> SELECT activities.* FROM activities WHERE activities.dataset='poke' AND
> activities.properties['verb'] = 'Dance';
>
> A non-working query is:
>
> SELECT activities.*, users.* FROM activities LEFT OUTER JOIN users ON
> activities.actor_id = users.id WHERE activities.dataset='poke' AND
> activities.properties['verb'] = 'Dance';
>
> The exception is:
>
> java.lang.RuntimeException: Hive 2 Internal error: cannot evaluate index expression on string
>     at org.apache.hadoop.hive.ql.exec.ExprNodeIndexEvaluator.evaluate(ExprNodeIndexEvaluator.java:64)
>     at org.apache.hadoop.hive.ql.exec.ExprNodeFuncEvaluator.evaluate(ExprNodeFuncEvaluator.java:72)
>     at org.apache.hadoop.hive.ql.exec.ExprNodeFuncEvaluator.evaluate(ExprNodeFuncEvaluator.java:72)
>     at org.apache.hadoop.hive.ql.exec.FilterOperator.process(FilterOperator.java:67)
>     at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:262)
>     at org.apache.hadoop.hive.ql.exec.JoinOperator.createForwardJoinObject(JoinOperator.java:257)
>     at org.apache.hadoop.hive.ql.exec.JoinOperator.genObject(JoinOperator.java:477)
>     at org.apache.hadoop.hive.ql.exec.JoinOperator.genObject(JoinOperator.java:467)
>     at org.apache.hadoop.hive.ql.exec.JoinOperator.genObject(JoinOperator.java:467)
>     at org.apache.hadoop.hive.ql.exec.JoinOperator.checkAndGenObject(JoinOperator.java:507)
>     at org.apache.hadoop.hive.ql.exec.JoinOperator.endGroup(JoinOperator.java:489)
>     at org.apache.hadoop.hive.ql.exec.ExecReducer.reduce(ExecReducer.java:140)
>     at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:430)
>     at org.apache.hadoop.mapred.Child.main(Child.java:155)
>
> This is thrown every time in the first phase of reduction.
>
> --
> This message is automatically generated by JIRA.
> -
> You can reply to this email to add a comment to the issue online.
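For anyone blocked on this in the meantime: the stack trace shows the filter on activities.properties['verb'] being evaluated by the FilterOperator over the join output, where the map column apparently no longer carries its map type. An untested workaround sketch (not a confirmed fix) is to apply the map-index predicate in a subquery before the join, so the index expression is only ever evaluated against the original activities rows; the alias "a" below is introduced purely for illustration:

```sql
-- Untested workaround sketch: evaluate properties['verb'] inside a
-- subquery, before the LEFT OUTER JOIN, instead of filtering the join
-- output. The subquery alias "a" is hypothetical.
SELECT a.*, users.*
FROM (
  SELECT *
  FROM activities
  WHERE activities.dataset = 'poke'
    AND activities.properties['verb'] = 'Dance'
) a
LEFT OUTER JOIN users ON a.actor_id = users.id;
```

If this works, it would also confirm that the bug is in how the join operator forwards the map column, not in the index expression itself.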
