Hi,

I think the query time of the multiple-join part does not depend on the number given to the LIMIT operator (4 in your case). When the query is executed, limit_data runs after Bad_OrderRes: the LIMIT (limit_data) starts only after the JOIN (Bad_OrderRes) has finished. If I have missed something, please tell me.
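
If you want to check this on your side, a minimal sketch follows. It assumes inventory, catalog_sales and catalog_returns are already loaded as in your script, and the output path 'bad_order_out' is only a placeholder I made up. EXPLAIN prints the execution plans without running anything, so you can see where the LIMIT sits relative to the joins; STORE on the join result itself is an alternative way to time the joins with no LIMIT involved.

```pig
-- Sketch only: relation and field names are copied from the quoted script.
-- The LOAD statements for inventory, catalog_sales and catalog_returns are
-- assumed to exist earlier in the script.
Bad_OrderIn = JOIN inventory BY inv_item_sk, catalog_sales BY cs_item_sk;
Bad_OrderRes = JOIN Bad_OrderIn BY (cs_item_sk, cs_order_number), catalog_returns BY (cr_item_sk, cr_order_number);
limit_data = LIMIT Bad_OrderRes 4;

-- Print the execution plans for limit_data without running the query.
EXPLAIN limit_data;

-- Alternative: write out the full join result, so the measured time
-- covers the joins only. 'bad_order_out' is a hypothetical output path.
STORE Bad_OrderRes INTO 'bad_order_out';
```

Note that STORE also adds the cost of writing the whole result, so the numbers are best compared only between join orders measured the same way.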

Best Regards
Kelly Zhang/Zhang,Liyun

-----Original Message-----
From: mingda li [mailto:[email protected]]
Sent: Wednesday, December 7, 2016 8:18 AM
To: [email protected]; [email protected]
Subject: How to test the efficiency of multiple join

Dear all,

I want to test the efficiency of different multiple-join orders. However, since Pig queries are executed lazily, I need to use DUMP or STORE to make the query run. For now, I use the following query to test the efficiency.

Bad_OrderIn = JOIN inventory BY inv_item_sk, catalog_sales BY cs_item_sk;
Bad_OrderRes = JOIN Bad_OrderIn BY (cs_item_sk, cs_order_number), catalog_returns BY (cr_item_sk, cr_order_number);
limit_data = LIMIT Bad_OrderRes 4;
Dump limit_data;

Do you think it is OK to show just 4 of the results? Could this query's execution time represent the efficiency of the multiple join? I am not sure whether it will just get 4 items and stop without processing the rest.

Bests,
Mingda
