[ https://issues.apache.org/jira/browse/PIG-2397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13169803#comment-13169803 ]
Dmitriy V. Ryaboy commented on PIG-2397:
----------------------------------------

Jie, something just occurred to me about Q1 -- are you sure Hive is doing the right thing here? If it's using more than 1 reducer for the ORDER operation, it's not:

{code}
Syntax of Order By

The ORDER BY syntax in Hive QL is similar to the syntax of ORDER BY in SQL language.

colOrder: ( ASC | DESC )
orderBy: ORDER BY colName colOrder? (',' colName colOrder?)*
query: SELECT expression (',' expression)* FROM src orderBy

There are some limitations in the "order by" clause. In the strict mode (i.e., hive.mapred.mode=strict), the order by clause has to be followed by a "limit" clause. The limit clause is not necessary if you set hive.mapred.mode to nonstrict. The reason is that in order to impose total order of all results, there has to be one reducer to sort the final output. If the number of rows in the output is too large, the single reducer could take a very long time to finish.
{code}

> Running TPC-H on Pig
> --------------------
>
>                 Key: PIG-2397
>                 URL: https://issues.apache.org/jira/browse/PIG-2397
>             Project: Pig
>          Issue Type: Task
>            Reporter: Jie Li
>         Attachments: TPC-H_on_Pig.tgz, pig_tpch.ppt
>
>
> For a class project we developed a whole set of Pig scripts for TPC-H. Our goals are:
> 1) identifying the bottlenecks of Pig's performance especially of its relational operators,
> 2) studying how to write efficient scripts by making full use of Pig Latin's features,
> 3) comparing with Hive's TPC-H results for verifying both 1) and 2).
> We will update the JIRA with our scripts, results and analysis soon.
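
To make the concern in the comment concrete, here is a toy Python simulation of why sorting independently on several reducers does not give a total order. It is only a sketch under stated assumptions: the function names (hash_partition, reduce_sort, is_totally_ordered) are hypothetical, rows are plain integers, and a modulo on the key stands in for hash partitioning. It is not how Hive or Pig actually implement ORDER BY.

```python
# Toy model only: integers stand in for rows, and "key % num_reducers"
# stands in for a hash partitioner. Not Hive's or Pig's real code path.

def hash_partition(rows, num_reducers):
    """Assign each row to a reducer bucket by key modulo num_reducers."""
    buckets = [[] for _ in range(num_reducers)]
    for row in rows:
        buckets[row % num_reducers].append(row)
    return buckets


def reduce_sort(rows, num_reducers):
    """Sort each bucket independently, then concatenate in reducer order."""
    output = []
    for bucket in hash_partition(rows, num_reducers):
        output.extend(sorted(bucket))
    return output


def is_totally_ordered(rows):
    """True if every row is <= the row that follows it."""
    return all(a <= b for a, b in zip(rows, rows[1:]))


rows = [7, 2, 9, 4, 1, 8, 3, 6, 5, 0]

one_reducer = reduce_sort(rows, 1)     # a single reducer sees every row
three_reducers = reduce_sort(rows, 3)  # each reducer sorts only its own bucket

print(one_reducer, is_totally_ordered(one_reducer))
print(three_reducers, is_totally_ordered(three_reducers))
```

With one reducer the output is globally sorted. With three, each reducer's output is sorted on its own, but the concatenation is not, which is the case the comment asks about for Q1.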