Similar queries seem to be working for me.

1.       Which version of hive are using ?

2.       Would it be possible for you to ship the data to us ?


Thanks,
-namit

From: Rakesh Setty [mailto:[email protected]]
Sent: Monday, July 06, 2009 1:36 PM
To: [email protected]
Subject: RE: distinct with union all

Following are the counts

select count(distinct user) from Tb1;
976272

select count(1) from Tb1;
1642777

select count(distinct user) from Tb1Debug;
653824

select count(1) from Tb1Debug;
1095933

Thanks,
Rakesh

________________________________
From: Namit Jain [mailto:[email protected]]
Sent: Monday, July 06, 2009 1:29 PM
To: [email protected]
Subject: RE: distinct with union all

Can you send the following counts:

select count(distinct user) from Tb1;
select count(1) from Tb1;
select count(distinct user) from Tb1Debug;
select count(1) from Tb1Debug;



From: Rakesh Setty [mailto:[email protected]]
Sent: Monday, July 06, 2009 1:20 PM
To: [email protected]
Subject: RE: distinct with union all

Hi,

>From the counts, I could see that grouping was happening. The output is too 
>big, so I am attaching just partial output here

AAAGd`RH_HaA`AZNgA_WDeq 2
AAAZTMIxhpSjx[Oyb       1
AABIxQR_Un]DLGXNwla\`o  3
AAB[_pudLZeuOe  7
AAClxPCotyLn\qC 2
AAEdZeHVNj\[XECl^Amin   3
AAEkewjwSCDfCgmPbxkQEG` 2
AAFQxFnlnImcPKpvQMM     1
AAGbQojol_EXcPk 1
AAHPZBe^Q`UeMBmlhhcAVV  3
AAHs]A_gvZxfKKCI^\      2
AAIAGwnFPsrGFxaHYgOAiCHe        2
AAIjiO]r`rkrobp_xRn     1
AAIwuSGOmu_L`YGCoHrmvub`        1
AAJG[RPx\RvEYljsBgDdR   3
AAJ^BFCNpTXt]wOEir[     1
AAJvlnBfYXNnXulGOHlAeW  2
AAKwheCincrxm_jkbm      2

The number of rows written is varying between the queries

insert overwrite table Tbl1 select distinct username from (select user as 
username from page_views pv union all select name as username from users u) ur;

and

insert overwrite table Tbl1Debug select  username, count(1) from (select user 
as username from page_views pv union all select name as username from users u) 
ur group by username;

The query

select count(distinct user) from Tb1;

returns different value compared to the number of rows created by the first 
query. It is also different from the query

select count(distinct user) from Tb1Debug;

@Ashish - I had attached the explain plan output in an earlier reply which 
Namit said looks fine.

Please let me know what could be the problem.

Thanks,
Rakesh


________________________________
From: Namit Jain [mailto:[email protected]]
Sent: Monday, July 06, 2009 10:03 AM
To: [email protected]
Subject: RE: distinct with union all

Nothing special for these characters.

select  username, count(1) from (select user as username from page_views pv 
union all select name as username from users u) ur group by username;

should return the counts.

When you see the duplicate usernames, can you send them -


From: Rakesh Setty [mailto:[email protected]]
Sent: Monday, July 06, 2009 9:45 AM
To: [email protected]
Subject: RE: distinct with union all

Hi Amr,

            The group by clause that you suggest shows that grouping is 
happening as the count column suggests. Perhaps it is not happening across the 
tables?
            I see characters like \, [, _, ^, etc in username. Are these 
considered as special characters? If so, is there any workaround to deal with 
this?

Thanks,
Rakesh

________________________________
From: Amr Awadallah [mailto:[email protected]]
Sent: Thursday, July 02, 2009 5:53 PM
To: [email protected]
Subject: Re: distinct with union all


make sure you don't have any leading or trailing spaces (or special characters) 
for the usernames being extracted.

also to debug, try to do a select username, count(1) then group by username

-- amr

Rakesh Setty wrote:
Yes, I am getting duplicate usernames.


________________________________
From: Namit Jain [mailto:[email protected]]
Sent: Thursday, July 02, 2009 1:25 PM
To: [email protected]<mailto:[email protected]>
Subject: RE: distinct with union all

Are you getting duplicate usernames ?


From: Rakesh Setty [mailto:[email protected]]
Sent: Thursday, July 02, 2009 12:37 PM
To: [email protected]<mailto:[email protected]>
Subject: distinct with union all

Hi,

I have a query like

select distinct username from (select user as username from page_views pv union 
all select name as username from users u) ur;

But I see that result is not actually distinct. Am I missing something here?

Thanks,
Rakesh

Reply via email to