On Thu, Feb 4, 2010 at 12:41 PM, Zheng Shao <[email protected]> wrote:
> Can you post the results of "explain" for all 3 queries?
>
> Zheng
>
> On Thu, Feb 4, 2010 at 8:41 AM, Edward Capriolo <[email protected]> wrote:
>> OK
>> 55504011
>> Time taken: 290.216 seconds
>> hive> select count(1) from pageviews;
>>
>> select count(1) from files f;
>> Ended Job = job_200909171715_20347
>> OK
>> 10164516
>> Time taken: 29.946 seconds
>>
>> select count(1) from files f join pageviews p on f.id = p.file_id
>>
>> OK
>> 89375203
>> Time taken: 164.767 seconds
>>
>> Any hint on what is going wrong here? from our dataset each pageview
>> should be related to 1 or 0 files?
>>
>> Thanks,
>> Edward
>>
>
> --
> Yours,
> Zheng
>
Zheng,

My mistake. I made some incorrect assumptions about my source data. We should add referential integrity to prevent me from making this mistake again. NOT!

Thanks again,
Edward
