These tables were already defined with those
CREATE TABLE activities
(actor_id STRING, actee_id STRING, properties MAP<STRING, STRING>)
PARTITIONED BY (account STRING, application STRING, dataset STRING,
hour INT)
CLUSTERED BY (actor_id, actee_id) INTO 32 BUCKETS
ROW FORMAT DELIMITED
COLLECTION ITEMS TERMINATED BY '44'
MAP KEYS TERMINATED BY '58'
STORED AS TEXTFILE;
Detailed Table Information:
Table(tableName:activities,dbName:default,owner:Josh,createTime:
1228208598,lastAccessTime:0,retention:0,sd:StorageDescriptor(cols:
[FieldSchema(name:actor_id,type:string,comment:null),
FieldSchema(name:actee_id,type:string,comment:null),
FieldSchema
(name:properties,type:map<string,string>,comment:null)],location:/user/
hive/warehouse/
activities
,inputFormat:org
.apache
.hadoop
.mapred
.TextInputFormat
,outputFormat:org
.apache
.hadoop
.hive.ql.io.IgnoreKeyTextOutputFormat,compressed:false,numBuckets:
32
,serdeInfo:SerDeInfo
(name:null
,serializationLib:org
.apache.hadoop.hive.serde2.dynamic_type.DynamicSerDe,parameters:
{colelction
.delim
=
44
,mapkey
.delim
=
58
,serialization
.format
=
org
.apache.hadoop.hive.serde2.thrift.TCTLSeparatedProtocol}),bucketCols:
[actor_id, actee_id],sortCols:[],parameters:{}),partitionKeys:
[FieldSchema(name:account,type:string,comment:null),
FieldSchema(name:application,type:string,comment:null),
FieldSchema(name:dataset,type:string,comment:null),
FieldSchema(name:hour,type:int,comment:null)],parameters:{})
CREATE TABLE users
(id STRING, properties MAP<STRING, STRING>)
PARTITIONED BY (account STRING, application STRING, dataset STRING,
hour INT)
CLUSTERED BY (id) INTO 32 BUCKETS
ROW FORMAT DELIMITED
COLLECTION ITEMS TERMINATED BY '44'
MAP KEYS TERMINATED BY '58'
STORED AS TEXTFILE;
Detailed Table Information:
Table(tableName:users,dbName:default,owner:Josh,createTime:
1228208633,lastAccessTime:0,retention:0,sd:StorageDescriptor(cols:
[FieldSchema(name:id,type:string,comment:null),
FieldSchema
(name:properties,type:map<string,string>,comment:null)],location:/user/
hive/warehouse/
users
,inputFormat:org
.apache
.hadoop
.mapred
.TextInputFormat
,outputFormat:org
.apache
.hadoop
.hive.ql.io.IgnoreKeyTextOutputFormat,compressed:false,numBuckets:
32
,serdeInfo:SerDeInfo
(name:null
,serializationLib:org
.apache.hadoop.hive.serde2.dynamic_type.DynamicSerDe,parameters:
{colelction
.delim
=
44
,mapkey
.delim
=
58
,serialization
.format
=
org
.apache.hadoop.hive.serde2.thrift.TCTLSeparatedProtocol}),bucketCols:
[id],sortCols:[],parameters:{}),partitionKeys:
[FieldSchema(name:account,type:string,comment:null),
FieldSchema(name:application,type:string,comment:null),
FieldSchema(name:dataset,type:string,comment:null),
FieldSchema(name:hour,type:int,comment:null)],parameters:{})
The Query Was:
SELECT activities.*, users.* FROM activities LEFT OUTER JOIN users
ON activities.actor_id = users.id WHERE activities.dataset='poke'
AND activities.properties['verb'] = 'Dance';
Josh Ferguson
On Dec 2, 2008, at 1:44 AM, Zheng Shao wrote:
Hi Josh,
This is a known problem.
If any of the columns are map<xxx,xxx> or list<xxx,xxx>, you have to
specify the ITEM terminator and KEY terminator. Please see the
complete CREATE table command syntax.
Let me know if it works or not.
Zheng
From: Josh Ferguson [mailto:[EMAIL PROTECTED]
Sent: Tuesday, December 02, 2008 1:36 AM
To: [email protected]
Subject: Index Expressions
I'm getting a bunch of errors during my reduce step while doing a
join between two string fields for some reason.
java.lang.RuntimeException: Hive 2 Internal error: cannot evaluate
index expression on string
at
org
.apache
.hadoop
.hive
.ql.exec.ExprNodeIndexEvaluator.evaluate(ExprNodeIndexEvaluator.java:
64)
at
org
.apache
.hadoop
.hive
.ql.exec.ExprNodeFuncEvaluator.evaluate(ExprNodeFuncEvaluator.java:72)
at
org
.apache
.hadoop
.hive
.ql.exec.ExprNodeFuncEvaluator.evaluate(ExprNodeFuncEvaluator.java:72)
at
org
.apache
.hadoop.hive.ql.exec.FilterOperator.process(FilterOperator.java:67)
at
org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:262)
at
org
.apache
.hadoop
.hive.ql.exec.JoinOperator.createForwardJoinObject(JoinOperator.java:
257)
at
org
.apache.hadoop.hive.ql.exec.JoinOperator.genObject(JoinOperator.java:
477)
at
org
.apache.hadoop.hive.ql.exec.JoinOperator.genObject(JoinOperator.java:
467)
at
org
.apache.hadoop.hive.ql.exec.JoinOperator.genObject(JoinOperator.java:
467)
at
org
.apache
.hadoop
.hive.ql.exec.JoinOperator.checkAndGenObject(JoinOperator.java:507)
at
org
.apache.hadoop.hive.ql.exec.JoinOperator.endGroup(JoinOperator.java:
489)
at
org.apache.hadoop.hive.ql.exec.ExecReducer.reduce(ExecReducer.java:
140)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:
430)
at org.apache.hadoop.mapred.Child.main(Child.java:155)
The Query is
SELECT activities.*, users.* FROM activities LEFT OUTER JOIN users
ON activities.actor_id = users.id WHERE activities.dataset='poke'
AND activities.properties['verb'] = 'Dance';
hive> DESCRIBE activities;
OK
actor_id string
actee_id string
properties map<string,string>
account string
application string
dataset string
hour int
hive> DESCRIBE users;
OK
id string
properties map<string,string>
account string
application string
dataset string
hour int
Thanks
Josh