Hi,
I think the query time of the multiple join is not related to the number 
given to the LIMIT operator (4 in your case). When the query is executed, 
limit_data runs after Bad_OrderRes: the join (Bad_OrderRes) finishes first, 
and only then does the limit (limit_data) start.
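If the goal is to time the joins themselves, one option is to STORE the full join output instead of dumping a limited subset. The sketch below assumes that inventory, catalog_sales and catalog_returns are already loaded as in your script. The output path 'bad_order_res_out' is only a placeholder. You can also run EXPLAIN limit_data; to print the logical, physical and execution plans and see where the LIMIT sits relative to the joins.

```pig
-- Sketch: force both joins to run to completion by storing every output row.
-- Assumes inventory, catalog_sales and catalog_returns were LOADed earlier.
Bad_OrderIn  = JOIN inventory BY inv_item_sk, catalog_sales BY cs_item_sk;
Bad_OrderRes = JOIN Bad_OrderIn BY (cs_item_sk, cs_order_number),
                    catalog_returns BY (cr_item_sk, cr_order_number);
-- 'bad_order_res_out' is a hypothetical output directory.
STORE Bad_OrderRes INTO 'bad_order_res_out';
```

Note that the measured time then also includes the cost of writing the full result, which should be similar across join orders that produce the same output.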
If I have missed something, please tell me.


Best Regards
Kelly Zhang/Zhang,Liyun



-----Original Message-----
From: mingda li [mailto:[email protected]] 
Sent: Wednesday, December 7, 2016 8:18 AM
To: [email protected]; [email protected]
Subject: How to test the efficiency of multiple join

Dear all,

I want to test the efficiency of different multiple-join orders. However, since 
Pig queries are executed lazily, I need to use DUMP or STORE to make the query 
actually execute.

Currently, I use the following query to test the efficiency:

    Bad_OrderIn = JOIN inventory BY inv_item_sk, catalog_sales BY cs_item_sk;
    Bad_OrderRes = JOIN Bad_OrderIn BY (cs_item_sk, cs_order_number),
                   catalog_returns BY (cr_item_sk, cr_order_number);
    limit_data = LIMIT Bad_OrderRes 4;
    DUMP limit_data;

Do you think it is OK to show just 4 of the results? Could this query's execution 
time represent the efficiency of the multiple join? I am not sure whether it will 
just fetch 4 items and stop without processing the remaining ones.

Bests,
Mingda
