On Thu, Feb 4, 2010 at 12:41 PM, Zheng Shao <[email protected]> wrote:
> Can you post the results of "explain" for all 3 queries?
>
>
> Zheng
>
> On Thu, Feb 4, 2010 at 8:41 AM, Edward Capriolo <[email protected]> wrote:
>> OK
>> 55504011
>> Time taken: 290.216 seconds
>> hive> select count(1) from pageviews;
>>
>> select count(1) from files f;
>> Ended Job = job_200909171715_20347
>> OK
>> 10164516
>> Time taken: 29.946 seconds
>>
>> select count(1) from files f join pageviews p on f.id = p.file_id
>>
>> OK
>> 89375203
>> Time taken: 164.767 seconds
>>
>> Any hint on what is going wrong here? From our dataset, each pageview
>> should be related to 1 or 0 files.
>>
>> Thanks,
>> Edward
>>
>
>
>
> --
> Yours,
> Zheng
>

Zheng,

My mistake. I made some incorrect assumptions about my source data. We
should add referential integrity to prevent me from making this
mistake again. NOT!
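
For anyone who finds this thread later: an inner join can only return
more rows than pageviews has (89375203 vs. 55504011) if some file_id
matches more than one row in files, so files.id cannot be unique. A
rough sketch of a check for that, assuming the table and column names
from the queries above (the alias t and the column cnt are made up for
illustration):

```sql
-- Count how many rows in files share each id, then keep only the ids
-- that occur more than once. Every such id multiplies the matching
-- pageviews rows in the join.
SELECT t.id, t.cnt
FROM (
  SELECT id, count(1) AS cnt
  FROM files
  GROUP BY id
) t
WHERE t.cnt > 1
LIMIT 20;
```

If this returns any rows, the join fans out on those ids. The filter is
written as a subquery with WHERE rather than HAVING on the assumption
that the Hive version in use may not support HAVING.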

Thanks again,
Edward
