In Pig Latin data flows are more linear than SQL, so SQL subqueries tend to come first in Pig Latin scripts. Given that, your first SQL query would look roughly like:

-- Find the records from s that will form the 'in' clause
A = load 'supplier' as (<whatever it's schema is>); --
B = filter A by s_comment matches '.*Customer.*Complaints.*';
C = foreach B generate s_suppkey; -- project out everything but the field you're interested in

D = load 'partsupp' as (<schema>);
E = load 'part' as (<schema>);
F = join D by ps_partkey, E by p_partkey; -- do the join
G = filter F by p_brand != '[BRAND]'and p_type not matches '[TYPE].*'and p_size == '[SIZE1]' or p_size == '[SIZE2]' ... -- do the big harry filter, pretty much the same except Pig doesn't support the IN syntax

-- Rewrite the subquery as an anti-join
H = cogroup G by ps_suppkey, C by s_suppkey);
I = filter H by COUNT(C) == 0;
J = foreach I generate flatten(G);

-- Now do the group by
K = group J by _brand,p_type,p_sizeorder bysupplier_cnt desc,p_brand,p_type,p_size;
L = foreach K {
        L1 = J.ps_suppkey;
        L2 = distinct L1;
        generate group, COUNT(L2);
}

I think that will more or less do it.

Alan.

On Mar 27, 2011, at 4:48 AM, hiba houimli wrote:


Hi,
I like to know how I can transform a query sql to a script pig especially when I have a query like that : selectp_brand,p_type,p_size,count(distinct ps_suppkey) as supplier_cntfrompartsupp,partwherep_partkey = ps_partkeyand p_brand <> '[BRAND]'and p_type not like '[TYPE]%'and p_size in ([SIZE1], [SIZE2], [SIZE3], [SIZE4], [SIZE5], [SIZE6], [SIZE7], [SIZE8])and ps_suppkey not in (selects_suppkeyfromsupplierwheres_comment like '%Customer%Complaints%')group byp_brand,p_type,p_sizeorder bysupplier_cnt desc,p_brand,p_type,p_size;
or like that:
selectc_count, count(*) as custdistfrom (selectc_custkey,count(o_orderkey)fromcustomer left outer join orders onc_custkey = o_custkeyand o_comment not like ‘%[WORD1]% [WORD2]%’group byc_custkey)as c_orders (c_custkey, c_count)group byc_countorder bycustdist desc,c_count desc;

and I like to know how I transform the clause exists, in or not in to a pig operator and thank you ?

                                        

Reply via email to