Can you tell the exact command to generate the data ?
Detailed instructions.
Do I need to install pig, or is the standalone perl file good enough ?
Thanks,
-namit
[nj...@dev029 ~/pigmix]$ perl generate_data.pl
Usage: generate_data.pl filetype numrows tablename [nosql]
Valid filetypes [studenttab, studentcolon, studentusrdef,
votertab, reg1459894, textdoc, unicode]
Died at generate_data.pl line 145.
[nj...@dev029 ~/pigmix]$
From: Rakesh Setty [mailto:[email protected]]
Sent: Tuesday, July 07, 2009 1:23 PM
To: Namit Jain
Subject: RE: distinct with union all
Hi Namit,
The query that I am working on is test L11 in PigMix converted to
Hive. To generate the data, you can use the patch attached to
https://issues.apache.org/jira/browse/PIG-200.
The exact Hive query I am using is
insert overwrite table L11out
select distinct username from
(select field1 as username from widerow w
union all select user as username from page_views pv) ur;
The output says 1642777 rows are loaded. If I do
select count(distinct user) from L11out
I get 652384 as the output.
Please let me know if you have any other questions.
Thanks,
Rakesh
________________________________
From: Namit Jain [mailto:[email protected]]
Sent: Tuesday, July 07, 2009 12:01 PM
To: [email protected]
Subject: RE: distinct with union all
Rakesh,
Did the queries work for you ?
Thanks,
-namit
From: Namit Jain [mailto:[email protected]]
Sent: Monday, July 06, 2009 2:22 PM
To: [email protected]
Subject: RE: distinct with union all
Similar queries seem to be working for me.
1. Which version of hive are using ?
2. Would it be possible for you to ship the data to us ?
Thanks,
-namit
From: Rakesh Setty [mailto:[email protected]]
Sent: Monday, July 06, 2009 1:36 PM
To: [email protected]
Subject: RE: distinct with union all
Following are the counts
select count(distinct user) from Tb1;
976272
select count(1) from Tb1;
1642777
select count(distinct user) from Tb1Debug;
653824
select count(1) from Tb1Debug;
1095933
Thanks,
Rakesh
________________________________
From: Namit Jain [mailto:[email protected]]
Sent: Monday, July 06, 2009 1:29 PM
To: [email protected]
Subject: RE: distinct with union all
Can you send the following counts:
select count(distinct user) from Tb1;
select count(1) from Tb1;
select count(distinct user) from Tb1Debug;
select count(1) from Tb1Debug;
From: Rakesh Setty [mailto:[email protected]]
Sent: Monday, July 06, 2009 1:20 PM
To: [email protected]
Subject: RE: distinct with union all
Hi,
>From the counts, I could see that grouping was happening. The output is too
>big, so I am attaching just partial output here
AAAGd`RH_HaA`AZNgA_WDeq 2
AAAZTMIxhpSjx[Oyb 1
AABIxQR_Un]DLGXNwla\`o 3
AAB[_pudLZeuOe 7
AAClxPCotyLn\qC 2
AAEdZeHVNj\[XECl^Amin 3
AAEkewjwSCDfCgmPbxkQEG` 2
AAFQxFnlnImcPKpvQMM 1
AAGbQojol_EXcPk 1
AAHPZBe^Q`UeMBmlhhcAVV 3
AAHs]A_gvZxfKKCI^\ 2
AAIAGwnFPsrGFxaHYgOAiCHe 2
AAIjiO]r`rkrobp_xRn 1
AAIwuSGOmu_L`YGCoHrmvub` 1
AAJG[RPx\RvEYljsBgDdR 3
AAJ^BFCNpTXt]wOEir[ 1
AAJvlnBfYXNnXulGOHlAeW 2
AAKwheCincrxm_jkbm 2
The number of rows written is varying between the queries
insert overwrite table Tbl1 select distinct username from (select user as
username from page_views pv union all select name as username from users u) ur;
and
insert overwrite table Tbl1Debug select username, count(1) from (select user
as username from page_views pv union all select name as username from users u)
ur group by username;
The query
select count(distinct user) from Tb1;
returns different value compared to the number of rows created by the first
query. It is also different from the query
select count(distinct user) from Tb1Debug;
@Ashish - I had attached the explain plan output in an earlier reply which
Namit said looks fine.
Please let me know what could be the problem.
Thanks,
Rakesh
________________________________
From: Namit Jain [mailto:[email protected]]
Sent: Monday, July 06, 2009 10:03 AM
To: [email protected]
Subject: RE: distinct with union all
Nothing special for these characters.
select username, count(1) from (select user as username from page_views pv
union all select name as username from users u) ur group by username;
should return the counts.
When you see the duplicate usernames, can you send them -
From: Rakesh Setty [mailto:[email protected]]
Sent: Monday, July 06, 2009 9:45 AM
To: [email protected]
Subject: RE: distinct with union all
Hi Amr,
The group by clause that you suggest shows that grouping is
happening as the count column suggests. Perhaps it is not happening across the
tables?
I see characters like \, [, _, ^, etc in username. Are these
considered as special characters? If so, is there any workaround to deal with
this?
Thanks,
Rakesh
________________________________
From: Amr Awadallah [mailto:[email protected]]
Sent: Thursday, July 02, 2009 5:53 PM
To: [email protected]
Subject: Re: distinct with union all
make sure you don't have any leading or trailing spaces (or special characters)
for the usernames being extracted.
also to debug, try to do a select username, count(1) then group by username
-- amr
Rakesh Setty wrote:
Yes, I am getting duplicate usernames.
________________________________
From: Namit Jain [mailto:[email protected]]
Sent: Thursday, July 02, 2009 1:25 PM
To: [email protected]<mailto:[email protected]>
Subject: RE: distinct with union all
Are you getting duplicate usernames ?
From: Rakesh Setty [mailto:[email protected]]
Sent: Thursday, July 02, 2009 12:37 PM
To: [email protected]<mailto:[email protected]>
Subject: distinct with union all
Hi,
I have a query like
select distinct username from (select user as username from page_views pv union
all select name as username from users u) ur;
But I see that result is not actually distinct. Am I missing something here?
Thanks,
Rakesh