RE: equijoin with multiple columns?

Ashish Thusoo Fri, 23 Jan 2009 03:49:58 -0800

Aaron

can you send the plan for this query..


explain extended <your query>

This is very unexpected..

Thanks,
Ashish
________________________________________
From: Aaron Kimball [aa...@cloudera.com]
Sent: Friday, January 23, 2009 1:24 AM
To: hive-user@hadoop.apache.org
Subject: Re: equijoin with multiple columns?

Not only can I verify that the rows are present, I broke down and coded the 
join in Java MapReduce by hand. Works.
- Aaron

On Thu, Jan 22, 2009 at 11:42 PM, Zheng Shao 
<zsh...@gmail.com<mailto:zsh...@gmail.com>> wrote:
This is unexpected. We did try similar cases before (with multiple join keys).

Can you verify by finding out the actual rows in each table that will produce a 
join result row?

Zheng


On Thu, Jan 22, 2009 at 6:09 PM, Aaron Kimball 
<aa...@cloudera.com<mailto:aa...@cloudera.com>> wrote:
Hi Zheng,

I was eliding some information to try to boil things down a bit -- but here 
goes.


To expand the white lie into the fuller debug dump:
table A ~ ak_filtered_ips
table B ~ ip_locations

It's actually three fields ('a', 'b', and 'c') that I'm joining on, not two.

hive> describe extended ak_filtered_ips;
OK
a    int
b    int
c    int
Detailed Table Information:
Table(tableName:ak_filtered_
ips,dbName:default,owner:aaron,createTime:1232661581,lastAccessTime:0,retention:0,sd:StorageDescriptor(cols:[FieldSchema(name:a,type:int,comment:null),
 FieldSchema(name:b,type:int,comment:null), 
FieldSchema(name:c,type:int,comment:null)],location:hdfs://(server name 
redacted):9000/user/hive/warehouse/ak_filtered_ips,inputFormat:org.apache.hadoop.mapred.TextInputFormat,outputFormat:org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat,compressed:false,numBuckets:-1,serdeInfo:SerDeInfo(name:null,serializationLib:org.apache.hadoop.hive.serde2.MetadataTypedColumnsetSerDe,parameters:{serialization.format=1}),bucketCols:[],sortCols:[],parameters:{}),partitionKeys:[],parameters:{})

hive> describe extended ip_locations;
OK
a    int
b    int
c    int
cityid    int
countryid    int
Detailed Table Information:
Table(tableName:ip_locations,dbName:default,owner:aaron,createTime:1232596116,lastAccessTime:0,retention:0,sd:StorageDescriptor(cols:[FieldSchema(name:a,type:int,comment:null),
 FieldSchema(name:b,type:int,comment:null), 
FieldSchema(name:c,type:int,comment:null), 
FieldSchema(name:cityid,type:int,comment:null), 
FieldSchema(name:countryid,type:int,comment:null)],location:hdfs://(server name 
redacted):9000/user/hive/warehouse/ip_locations,inputFormat:org.apache.hadoop.mapred.TextInputFormat,outputFormat:org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat,compressed:false,numBuckets:-1,serdeInfo:SerDeInfo(name:null,serializationLib:org.apache.hadoop.hive.serde2.MetadataTypedColumnsetSerDe,parameters:{serialization.format=1}),bucketCols:[],sortCols:[],parameters:{}),partitionKeys:[],parameters:{})

I actually just loaded in a super-small test case involving A, B, x, y, and z 
as I initially described. And that statement *does* work.

The actual statement I was running into problems with is:

SELECT ip_locations.cityid, ip_locations.countryid FROM
ak_filtered_ips JOIN ip_locations ON (ak_filtered_ips.a = ip_locations.a AND 
ak_filtered_ips.b = ip_locations.b AND ak_filtered_ips.c = ip_locations.c)

I would eventually like to add a "count(1)" to the field list, and a "GROUP BY 
ip_locations.cityid, ip_locations.countryid" at the end.

This SELECT statement returns no rows.

Thanks,
- Aaron


On Thu, Jan 22, 2009 at 4:27 PM, Zheng Shao 
<zsh...@gmail.com<mailto:zsh...@gmail.com>> wrote:
Can you do "describe extended a;" and "describe extended b;" and pste
the result here?

Zheng


On 1/22/09, Aaron Kimball <aa...@cloudera.com<mailto:aa...@cloudera.com>> wrote:
> Hi all,
>
> I have a perplexing problem with joins in Hive. I have two tables "A" and
> "B".  A contains two columns x and y. B also contains x and y, and also z. I
> want to select the values of z from B, for every (x, y) I see in table A.
>
> So I wrote this statement:
>
> SELECT b.z FROM a JOIN b ON (a.x = b.x AND a.y = b.y);
>
> This crunches away and then comes back with an empty resultset. I have
> exported the same data into MySQL and ran the same query; it works just fine
> there. Does Hive support multi-column join criteria?
>
> Thanks,
> - Aaron Kimball
>

--
Sent from Gmail for mobile | mobile.google.com<http://mobile.google.com>

Yours,
Zheng




--
Yours,
Zheng

RE: equijoin with multiple columns?

Reply via email to