Hi Zheng, I was eliding some information to try to boil things down a bit -- but here goes.
To expand the white lie into the fuller debug dump: table A ~ ak_filtered_ips table B ~ ip_locations It's actually three fields ('a', 'b', and 'c') that I'm joining on, not two. hive> describe extended ak_filtered_ips; OK a int b int c int Detailed Table Information: Table(tableName:ak_filtered_ips,dbName:default,owner:aaron,createTime:1232661581,lastAccessTime:0,retention:0,sd:StorageDescriptor(cols:[FieldSchema(name:a,type:int,comment:null), FieldSchema(name:b,type:int,comment:null), FieldSchema(name:c,type:int,comment:null)],location:hdfs://(server name redacted):9000/user/hive/warehouse/ak_filtered_ips,inputFormat:org.apache.hadoop.mapred.TextInputFormat,outputFormat:org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat,compressed:false,numBuckets:-1,serdeInfo:SerDeInfo(name:null,serializationLib:org.apache.hadoop.hive.serde2.MetadataTypedColumnsetSerDe,parameters:{serialization.format=1}),bucketCols:[],sortCols:[],parameters:{}),partitionKeys:[],parameters:{}) hive> describe extended ip_locations; OK a int b int c int cityid int countryid int Detailed Table Information: Table(tableName:ip_locations,dbName:default,owner:aaron,createTime:1232596116,lastAccessTime:0,retention:0,sd:StorageDescriptor(cols:[FieldSchema(name:a,type:int,comment:null), FieldSchema(name:b,type:int,comment:null), FieldSchema(name:c,type:int,comment:null), FieldSchema(name:cityid,type:int,comment:null), FieldSchema(name:countryid,type:int,comment:null)],location:hdfs://(server name redacted):9000/user/hive/warehouse/ip_locations,inputFormat:org.apache.hadoop.mapred.TextInputFormat,outputFormat:org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat,compressed:false,numBuckets:-1,serdeInfo:SerDeInfo(name:null,serializationLib:org.apache.hadoop.hive.serde2.MetadataTypedColumnsetSerDe,parameters:{serialization.format=1}),bucketCols:[],sortCols:[],parameters:{}),partitionKeys:[],parameters:{}) I actually just loaded in a super-small test case involving A, B, x, y, and z as I initially described. And that statement *does* work. The actual statement I was running into problems with is: SELECT ip_locations.cityid, ip_locations.countryid FROM ak_filtered_ips JOIN ip_locations ON (ak_filtered_ips.a = ip_locations.a AND ak_filtered_ips.b = ip_locations.b AND ak_filtered_ips.c = ip_locations.c) I would eventually like to add a "count(1)" to the field list, and a "GROUP BY ip_locations.cityid, ip_locations.countryid" at the end. This SELECT statement returns no rows. Thanks, - Aaron On Thu, Jan 22, 2009 at 4:27 PM, Zheng Shao <zsh...@gmail.com> wrote: > Can you do "describe extended a;" and "describe extended b;" and pste > the result here? > > Zheng > > > On 1/22/09, Aaron Kimball <aa...@cloudera.com> wrote: > > Hi all, > > > > I have a perplexing problem with joins in Hive. I have two tables "A" and > > "B". A contains two columns x and y. B also contains x and y, and also > z. I > > want to select the values of z from B, for every (x, y) I see in table A. > > > > So I wrote this statement: > > > > SELECT b.z FROM a JOIN b ON (a.x = b.x AND a.y = b.y); > > > > This crunches away and then comes back with an empty resultset. I have > > exported the same data into MySQL and ran the same query; it works just > fine > > there. Does Hive support multi-column join criteria? > > > > Thanks, > > - Aaron Kimball > > > > -- > Sent from Gmail for mobile | mobile.google.com > > Yours, > Zheng >