I filed this JIRA issue.
I was wondering, can someone take the time to update the wiki to
describe the complete architecture of the system as outlined here?:
http://wiki.apache.org/hadoop/Hive/DeveloperGuide
I think that outside people really need this to understand where
certain problems might be occurring. To me it seems almost
impenetrable to try and open up these source files and figure out how
everything is linked together so I can't even begin to write my own
patches or consider solutions. Since certain issues (like this JIRA)
are fully blocking further development of some things, it would be
nice to distribute the knowledge so that everyone can effectively
contribute.
Josh Ferguson
Begin forwarded message:
From: "Josh Ferguson (JIRA)" <[EMAIL PROTECTED]>
Date: December 2, 2008 8:38:44 PM PST
To: [EMAIL PROTECTED]
Subject: [jira] Created: (HIVE-106) Join operation fails for some
queries
Join operation fails for some queries
-------------------------------------
Key: HIVE-106
URL: https://issues.apache.org/jira/browse/HIVE-106
Project: Hadoop Hive
Issue Type: Bug
Components: Query Processor
Affects Versions: 0.19.0
Reporter: Josh Ferguson
The Tables Are
CREATE TABLE activities
(actor_id STRING, actee_id STRING, properties MAP<STRING, STRING>)
PARTITIONED BY (account STRING, application STRING, dataset STRING,
hour INT)
CLUSTERED BY (actor_id, actee_id) INTO 32 BUCKETS
ROW FORMAT DELIMITED
COLLECTION ITEMS TERMINATED BY '44'
MAP KEYS TERMINATED BY '58'
STORED AS TEXTFILE;
Detailed Table Information:
Table(tableName:activities,dbName:default,owner:Josh,createTime:
1228208598,lastAccessTime:0,retention:0,sd:StorageDescriptor(cols:
[FieldSchema(name:actor_id,type:string,comment:null),
FieldSchema(name:actee_id,type:string,comment:null),
FieldSchema
(name:properties,type:map<string,string>,comment:null)],location:/
user/hive/warehouse/
activities
,inputFormat:org
.apache
.hadoop
.mapred
.TextInputFormat
,outputFormat:org
.apache
.hadoop
.hive.ql.io.IgnoreKeyTextOutputFormat,compressed:false,numBuckets:
32
,serdeInfo:SerDeInfo
(name:null
,serializationLib:org
.apache.hadoop.hive.serde2.dynamic_type.DynamicSerDe,parameters:
{colelction
.delim
=
44
,mapkey
.delim
=
58
,serialization
.format
=
org
.apache.hadoop.hive.serde2.thrift.TCTLSeparatedProtocol}),bucketCols:
[actor_id, actee_id],sortCols:[],parameters:{}),partitionKeys:
[FieldSchema(name:account,type:string,comment:null),
FieldSchema(name:application,type:string,comment:null),
FieldSchema(name:dataset,type:string,comment:null),
FieldSchema(name:hour,type:int,comment:null)],parameters:{})
CREATE TABLE users
(id STRING, properties MAP<STRING, STRING>)
PARTITIONED BY (account STRING, application STRING, dataset STRING,
hour INT)
CLUSTERED BY (id) INTO 32 BUCKETS
ROW FORMAT DELIMITED
COLLECTION ITEMS TERMINATED BY '44'
MAP KEYS TERMINATED BY '58'
STORED AS TEXTFILE;
Detailed Table Information:
Table(tableName:users,dbName:default,owner:Josh,createTime:
1228208633,lastAccessTime:0,retention:0,sd:StorageDescriptor(cols:
[FieldSchema(name:id,type:string,comment:null),
FieldSchema
(name:properties,type:map<string,string>,comment:null)],location:/
user/hive/warehouse/
users
,inputFormat:org
.apache
.hadoop
.mapred
.TextInputFormat
,outputFormat:org
.apache
.hadoop
.hive.ql.io.IgnoreKeyTextOutputFormat,compressed:false,numBuckets:
32
,serdeInfo:SerDeInfo
(name:null
,serializationLib:org
.apache.hadoop.hive.serde2.dynamic_type.DynamicSerDe,parameters:
{colelction
.delim
=
44
,mapkey
.delim
=
58
,serialization
.format
=
org
.apache.hadoop.hive.serde2.thrift.TCTLSeparatedProtocol}),bucketCols:
[id],sortCols:[],parameters:{}),partitionKeys:
[FieldSchema(name:account,type:string,comment:null),
FieldSchema(name:application,type:string,comment:null),
FieldSchema(name:dataset,type:string,comment:null),
FieldSchema(name:hour,type:int,comment:null)],parameters:{})
A working query is
SELECT activities.* FROM activities WHERE activities.dataset='poke'
AND activities.properties['verb'] = 'Dance';
A non working query is
SELECT activities.*, users.* FROM activities LEFT OUTER JOIN users
ON activities.actor_id = users.id WHERE activities.dataset='poke'
AND activities.properties['verb'] = 'Dance';
The Exception Is
java.lang.RuntimeException: Hive 2 Internal error: cannot evaluate
index expression on string
at
org
.apache
.hadoop
.hive
.ql.exec.ExprNodeIndexEvaluator.evaluate(ExprNodeIndexEvaluator.java:
64)
at
org
.apache
.hadoop
.hive
.ql.exec.ExprNodeFuncEvaluator.evaluate(ExprNodeFuncEvaluator.java:72)
at
org
.apache
.hadoop
.hive
.ql.exec.ExprNodeFuncEvaluator.evaluate(ExprNodeFuncEvaluator.java:72)
at
org
.apache
.hadoop.hive.ql.exec.FilterOperator.process(FilterOperator.java:67)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:262)
at
org
.apache
.hadoop
.hive.ql.exec.JoinOperator.createForwardJoinObject(JoinOperator.java:
257)
at
org
.apache.hadoop.hive.ql.exec.JoinOperator.genObject(JoinOperator.java:
477)
at
org
.apache.hadoop.hive.ql.exec.JoinOperator.genObject(JoinOperator.java:
467)
at
org
.apache.hadoop.hive.ql.exec.JoinOperator.genObject(JoinOperator.java:
467)
at
org
.apache
.hadoop
.hive.ql.exec.JoinOperator.checkAndGenObject(JoinOperator.java:507)
at
org
.apache.hadoop.hive.ql.exec.JoinOperator.endGroup(JoinOperator.java:
489)
at
org.apache.hadoop.hive.ql.exec.ExecReducer.reduce(ExecReducer.java:
140)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:430)
at org.apache.hadoop.mapred.Child.main(Child.java:155)
This is thrown every time in the first phase of reduction.
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.