Fair enough. I will get some architecture docs in place.

Ashish

________________________________
From: Jeff Hammerbacher [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, December 03, 2008 12:43 AM
To: [email protected]
Subject: Re: [jira] Created: (HIVE-106) Join operation fails for some queries

+1. Even better would be to have this code architecture documented using 
Forrest (really, it's just writing HTML) and distributed with Hive.

On Tue, Dec 2, 2008 at 11:33 PM, Josh Ferguson <[EMAIL PROTECTED]> wrote:
I filed this JIRA issue.

I was wondering: could someone take the time to update the wiki to describe the 
complete architecture of the system, as outlined here?

http://wiki.apache.org/hadoop/Hive/DeveloperGuide

I think outside people really need this to understand where certain problems 
might be occurring. Trying to open up these source files and work out how 
everything is linked together feels almost impenetrable, so I can't even begin 
to write my own patches or consider solutions. Since certain issues (like this 
JIRA) are fully blocking further development of some features, it would be nice 
to spread that knowledge around so that everyone can contribute effectively.

Josh Ferguson

Begin forwarded message:

From: "Josh Ferguson (JIRA)" <[EMAIL PROTECTED]<mailto:[EMAIL PROTECTED]>>
Date: December 2, 2008 8:38:44 PM PST
To: [EMAIL PROTECTED]
Subject: [jira] Created: (HIVE-106) Join operation fails for some queries

Join operation fails for some queries
-------------------------------------

                Key: HIVE-106
                URL: https://issues.apache.org/jira/browse/HIVE-106
            Project: Hadoop Hive
         Issue Type: Bug
         Components: Query Processor
   Affects Versions: 0.19.0
           Reporter: Josh Ferguson


The tables are:

CREATE TABLE activities
(actor_id STRING, actee_id STRING, properties MAP<STRING, STRING>)
PARTITIONED BY (account STRING, application STRING, dataset STRING, hour INT)
CLUSTERED BY (actor_id, actee_id) INTO 32 BUCKETS
ROW FORMAT DELIMITED
COLLECTION ITEMS TERMINATED BY '44'
MAP KEYS TERMINATED BY '58'
STORED AS TEXTFILE;

Detailed Table Information:
Table(tableName:activities, dbName:default, owner:Josh, createTime:1228208598,
  lastAccessTime:0, retention:0,
  sd:StorageDescriptor(
    cols:[FieldSchema(name:actor_id,type:string,comment:null),
          FieldSchema(name:actee_id,type:string,comment:null),
          FieldSchema(name:properties,type:map<string,string>,comment:null)],
    location:/user/hive/warehouse/activities,
    inputFormat:org.apache.hadoop.mapred.TextInputFormat,
    outputFormat:org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat,
    compressed:false, numBuckets:32,
    serdeInfo:SerDeInfo(name:null,
      serializationLib:org.apache.hadoop.hive.serde2.dynamic_type.DynamicSerDe,
      parameters:{colelction.delim=44, mapkey.delim=58,
        serialization.format=org.apache.hadoop.hive.serde2.thrift.TCTLSeparatedProtocol}),
    bucketCols:[actor_id, actee_id], sortCols:[], parameters:{}),
  partitionKeys:[FieldSchema(name:account,type:string,comment:null),
                 FieldSchema(name:application,type:string,comment:null),
                 FieldSchema(name:dataset,type:string,comment:null),
                 FieldSchema(name:hour,type:int,comment:null)],
  parameters:{})


CREATE TABLE users
(id STRING, properties MAP<STRING, STRING>)
PARTITIONED BY (account STRING, application STRING, dataset STRING, hour INT)
CLUSTERED BY (id) INTO 32 BUCKETS
ROW FORMAT DELIMITED
COLLECTION ITEMS TERMINATED BY '44'
MAP KEYS TERMINATED BY '58'
STORED AS TEXTFILE;

Detailed Table Information:
Table(tableName:users, dbName:default, owner:Josh, createTime:1228208633,
  lastAccessTime:0, retention:0,
  sd:StorageDescriptor(
    cols:[FieldSchema(name:id,type:string,comment:null),
          FieldSchema(name:properties,type:map<string,string>,comment:null)],
    location:/user/hive/warehouse/users,
    inputFormat:org.apache.hadoop.mapred.TextInputFormat,
    outputFormat:org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat,
    compressed:false, numBuckets:32,
    serdeInfo:SerDeInfo(name:null,
      serializationLib:org.apache.hadoop.hive.serde2.dynamic_type.DynamicSerDe,
      parameters:{colelction.delim=44, mapkey.delim=58,
        serialization.format=org.apache.hadoop.hive.serde2.thrift.TCTLSeparatedProtocol}),
    bucketCols:[id], sortCols:[], parameters:{}),
  partitionKeys:[FieldSchema(name:account,type:string,comment:null),
                 FieldSchema(name:application,type:string,comment:null),
                 FieldSchema(name:dataset,type:string,comment:null),
                 FieldSchema(name:hour,type:int,comment:null)],
  parameters:{})
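
For reference, the delimiter values in the DDL are ASCII codes: 44 is ',' for 
collection items and 58 is ':' for map keys, with Hive's default Ctrl-A (\001) 
separating the top-level columns. A hypothetical raw text row for the users 
table might therefore look like the line below (the values are invented purely 
for illustration, and ^A stands in for the Ctrl-A field delimiter):

user123^Averb:Dance,source:web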

A working query is:

SELECT activities.* FROM activities WHERE activities.dataset='poke' AND 
activities.properties['verb'] = 'Dance';

A non-working query is:

SELECT activities.*, users.* FROM activities LEFT OUTER JOIN users ON 
activities.actor_id = users.id WHERE activities.dataset='poke' 
AND activities.properties['verb'] = 'Dance';
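
An untested rewrite that might sidestep this (the subquery aliases a and u are 
mine, and this assumes the failure is specific to evaluating properties['verb'] 
on the post-join rows) would be to apply the map-index filter before the join:

SELECT a.*, u.* FROM (
  SELECT * FROM activities
  WHERE activities.dataset='poke' AND activities.properties['verb'] = 'Dance'
) a LEFT OUTER JOIN users u ON a.actor_id = u.id;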

The exception is:

java.lang.RuntimeException: Hive 2 Internal error: cannot evaluate index expression on string
at org.apache.hadoop.hive.ql.exec.ExprNodeIndexEvaluator.evaluate(ExprNodeIndexEvaluator.java:64)
at org.apache.hadoop.hive.ql.exec.ExprNodeFuncEvaluator.evaluate(ExprNodeFuncEvaluator.java:72)
at org.apache.hadoop.hive.ql.exec.ExprNodeFuncEvaluator.evaluate(ExprNodeFuncEvaluator.java:72)
at org.apache.hadoop.hive.ql.exec.FilterOperator.process(FilterOperator.java:67)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:262)
at org.apache.hadoop.hive.ql.exec.JoinOperator.createForwardJoinObject(JoinOperator.java:257)
at org.apache.hadoop.hive.ql.exec.JoinOperator.genObject(JoinOperator.java:477)
at org.apache.hadoop.hive.ql.exec.JoinOperator.genObject(JoinOperator.java:467)
at org.apache.hadoop.hive.ql.exec.JoinOperator.genObject(JoinOperator.java:467)
at org.apache.hadoop.hive.ql.exec.JoinOperator.checkAndGenObject(JoinOperator.java:507)
at org.apache.hadoop.hive.ql.exec.JoinOperator.endGroup(JoinOperator.java:489)
at org.apache.hadoop.hive.ql.exec.ExecReducer.reduce(ExecReducer.java:140)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:430)
at org.apache.hadoop.mapred.Child.main(Child.java:155)

This exception is thrown every time, during the first reduce phase.
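
Since the trace shows FilterOperator.process being called from inside 
JoinOperator in the reducer, it looks like the properties['verb'] predicate is 
evaluated against the post-join row. Running EXPLAIN on the failing query (just 
a suggestion for narrowing it down; I haven't captured the plan here) should 
show whether that filter ends up in the reduce-side operator tree after the join:

EXPLAIN SELECT activities.*, users.* FROM activities LEFT OUTER JOIN users ON 
activities.actor_id = users.id WHERE activities.dataset='poke' 
AND activities.properties['verb'] = 'Dance';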



