Not only can I verify that the rows are present, I broke down and coded the
join in Java MapReduce by hand. Works.
- Aaron

On Thu, Jan 22, 2009 at 11:42 PM, Zheng Shao <zsh...@gmail.com> wrote:

> This is unexpected. We did try similar cases before (with multiple join
> keys).
>
> Can you verify by finding out the actual rows in each table that will
> produce a join result row?
>
> Zheng
>
>
> On Thu, Jan 22, 2009 at 6:09 PM, Aaron Kimball <aa...@cloudera.com> wrote:
>
>> Hi Zheng,
>>
>> I was eliding some information to try to boil things down a bit -- but
>> here goes.
>>
>>
>> To expand the white lie into the fuller debug dump:
>> table A ~ ak_filtered_ips
>> table B ~ ip_locations
>>
>> It's actually three fields ('a', 'b', and 'c') that I'm joining on, not
>> two.
>>
>> hive> describe extended ak_filtered_ips;
>> OK
>> a    int
>> b    int
>> c    int
>> Detailed Table Information:
>> Table(tableName:ak_filtered_ips,dbName:default,owner:aaron,createTime:1232661581,lastAccessTime:0,retention:0,sd:StorageDescriptor(cols:[FieldSchema(name:a,type:int,comment:null),
>> FieldSchema(name:b,type:int,comment:null),
>> FieldSchema(name:c,type:int,comment:null)],location:hdfs://(server name
>> redacted):9000/user/hive/warehouse/ak_filtered_ips,inputFormat:org.apache.hadoop.mapred.TextInputFormat,outputFormat:org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat,compressed:false,numBuckets:-1,serdeInfo:SerDeInfo(name:null,serializationLib:org.apache.hadoop.hive.serde2.MetadataTypedColumnsetSerDe,parameters:{serialization.format=1}),bucketCols:[],sortCols:[],parameters:{}),partitionKeys:[],parameters:{})
>>
>> hive> describe extended ip_locations;
>> OK
>> a    int
>> b    int
>> c    int
>> cityid    int
>> countryid    int
>> Detailed Table Information:
>> Table(tableName:ip_locations,dbName:default,owner:aaron,createTime:1232596116,lastAccessTime:0,retention:0,sd:StorageDescriptor(cols:[FieldSchema(name:a,type:int,comment:null),
>> FieldSchema(name:b,type:int,comment:null),
>> FieldSchema(name:c,type:int,comment:null),
>> FieldSchema(name:cityid,type:int,comment:null),
>> FieldSchema(name:countryid,type:int,comment:null)],location:hdfs://(server
>> name
>> redacted):9000/user/hive/warehouse/ip_locations,inputFormat:org.apache.hadoop.mapred.TextInputFormat,outputFormat:org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat,compressed:false,numBuckets:-1,serdeInfo:SerDeInfo(name:null,serializationLib:org.apache.hadoop.hive.serde2.MetadataTypedColumnsetSerDe,parameters:{serialization.format=1}),bucketCols:[],sortCols:[],parameters:{}),partitionKeys:[],parameters:{})
>>
>> I actually just loaded in a super-small test case involving A, B, x, y,
>> and z as I initially described. And that statement *does* work.
>>
>> The actual statement I was running into problems with is:
>>
>> SELECT ip_locations.cityid, ip_locations.countryid FROM
>> ak_filtered_ips JOIN ip_locations ON (ak_filtered_ips.a = ip_locations.a
>> AND ak_filtered_ips.b = ip_locations.b AND ak_filtered_ips.c =
>> ip_locations.c)
>>
>> I would eventually like to add a "count(1)" to the field list, and a
>> "GROUP BY ip_locations.cityid, ip_locations.countryid" at the end.
>>
>> This SELECT statement returns no rows.
>>
>> Thanks,
>> - Aaron
>>
>>
>> On Thu, Jan 22, 2009 at 4:27 PM, Zheng Shao <zsh...@gmail.com> wrote:
>>
>>> Can you do "describe extended a;" and "describe extended b;" and pste
>>> the result here?
>>>
>>> Zheng
>>>
>>>
>>> On 1/22/09, Aaron Kimball <aa...@cloudera.com> wrote:
>>> > Hi all,
>>> >
>>> > I have a perplexing problem with joins in Hive. I have two tables "A"
>>> and
>>> > "B".  A contains two columns x and y. B also contains x and y, and also
>>> z. I
>>> > want to select the values of z from B, for every (x, y) I see in table
>>> A.
>>> >
>>> > So I wrote this statement:
>>> >
>>> > SELECT b.z FROM a JOIN b ON (a.x = b.x AND a.y = b.y);
>>> >
>>> > This crunches away and then comes back with an empty resultset. I have
>>> > exported the same data into MySQL and ran the same query; it works just
>>> fine
>>> > there. Does Hive support multi-column join criteria?
>>> >
>>> > Thanks,
>>> > - Aaron Kimball
>>> >
>>>
>>> --
>>> Sent from Gmail for mobile | mobile.google.com
>>>
>>> Yours,
>>> Zheng
>>>
>>
>>
>
>
> --
> Yours,
> Zheng
>

Reply via email to