[ https://issues.apache.org/jira/browse/HIVE-2453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13107018#comment-13107018 ]
jirapos...@reviews.apache.org commented on HIVE-2453:
-----------------------------------------------------

bq. On 2011-09-16 21:27:59, Ning Zhang wrote:
bq. > trunk/ql/src/java/org/apache/hadoop/hive/ql/QueryProperties.java, line 42
bq. > <https://reviews.apache.org/r/1933/diff/1/?file=41497#file41497line42>
bq. >
bq. > Can you split it into two parts: useScriptInMapper and useScriptInReducer?
bq.
bq. Kevin Wilfong wrote:
bq.     Determining whether a script is used in the mapper or the reducer would require going through the operator tree added to each MapReduce job to check whether a Transform operator is present, and then setting the appropriate flag. That is more work than I'd like to do here, considering this feature will probably not be used by most users. I would like to keep the single flag here, so that it can be decided later whether that work needs to be performed somewhere else.

OK. My original reason for splitting this into mapper and reducer flags is that we could analyze the cost of the script operator based on its input size (mappers and reducers have different input size metrics). Let's see whether they are needed in the future and file a follow-up JIRA then.

- Ning

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1933/#review1946
-----------------------------------------------------------

On 2011-09-17 00:14:50, Kevin Wilfong wrote:
bq.
bq. -----------------------------------------------------------
bq. This is an automatically generated e-mail. To reply, visit:
bq. https://reviews.apache.org/r/1933/
bq. -----------------------------------------------------------
bq.
bq. (Updated 2011-09-17 00:14:50)
bq.
bq. Review request for hive and Ning Zhang.
bq.
bq. Summary
bq. -------
bq.
bq. The information that would be useful for categorizing queries is clearest in the Semantic Analyzer, when the data from the Parser is interpreted.
bq. I added a new class, designed to collect that data here and ultimately place it in the QueryPlan, where it will be available to hooks.
bq.
bq. The information I collect is whether or not the query has each of the following clauses:
bq.   Join
bq.   Group By
bq.   Order By
bq.   Sort By
bq.   Group By after a Join clause
bq.
bq. I also store whether or not a script is used for mapping or reducing.
bq.
bq. This addresses bug HIVE-2453.
bq.     https://issues.apache.org/jira/browse/HIVE-2453
bq.
bq. Diffs
bq. -----
bq.
bq.   trunk/ql/src/java/org/apache/hadoop/hive/ql/QueryPlan.java 1170719
bq.   trunk/ql/src/java/org/apache/hadoop/hive/ql/QueryProperties.java PRE-CREATION
bq.   trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/BaseSemanticAnalyzer.java 1170719
bq.   trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 1170719
bq.   trunk/ql/src/test/org/apache/hadoop/hive/ql/hooks/CheckQueryPropertiesHook.java PRE-CREATION
bq.   trunk/ql/src/test/queries/clientpositive/query_properties.q PRE-CREATION
bq.   trunk/ql/src/test/results/clientpositive/query_properties.q.out PRE-CREATION
bq.
bq. Diff: https://reviews.apache.org/r/1933/diff
bq.
bq. Testing
bq. -------
bq.
bq. I added a new test that runs a variety of queries, such that each of the flags in QueryProperties is set by at least one query, and some are set in combination.
bq. I also added a hook which prints the contents of QueryProperties to standard error on the console.
bq.
bq. I checked the output in the results file and verified that it matched what I expected.
bq.
bq. Thanks,
bq.
bq. Kevin
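The summary describes a class that records which clauses a query contains, so that hooks can read it from the QueryPlan. As an illustration only, here is a minimal standalone Java sketch of such a flag holder plus a hook that prints it to standard error, in the spirit of the test hook described above. The names `QueryPropertiesSketch`, `QueryCategoryHook`, `PrintQueryPropertiesHook`, and `categoryLabel()` are hypothetical; they are not the API added by HIVE-2453.1.patch.txt, and the sketch does not use Hive's real hook interfaces.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical holder for the per-query flags listed in the review summary.
// Names are illustrative and not taken from the actual patch.
class QueryPropertiesSketch {
    private boolean hasJoin;
    private boolean hasGroupBy;
    private boolean hasOrderBy;
    private boolean hasSortBy;
    private boolean hasJoinFollowedByGroupBy;
    private boolean usesScript; // one flag: a script is used for mapping or reducing

    void setHasJoin(boolean v) { hasJoin = v; }
    void setHasGroupBy(boolean v) { hasGroupBy = v; }
    void setHasOrderBy(boolean v) { hasOrderBy = v; }
    void setHasSortBy(boolean v) { hasSortBy = v; }
    void setHasJoinFollowedByGroupBy(boolean v) { hasJoinFollowedByGroupBy = v; }
    void setUsesScript(boolean v) { usesScript = v; }

    // Builds a compact, stable label that a logging hook could record per query.
    String categoryLabel() {
        List<String> parts = new ArrayList<>();
        if (hasJoin) parts.add("join");
        if (hasGroupBy) parts.add("groupby");
        if (hasOrderBy) parts.add("orderby");
        if (hasSortBy) parts.add("sortby");
        if (hasJoinFollowedByGroupBy) parts.add("join_then_groupby");
        if (usesScript) parts.add("script");
        return parts.isEmpty() ? "simple" : String.join(",", parts);
    }
}

// Hypothetical hook contract: receives the flags after semantic analysis.
interface QueryCategoryHook {
    void run(QueryPropertiesSketch props);
}

// Prints the category to standard error, like the test hook in the review.
class PrintQueryPropertiesHook implements QueryCategoryHook {
    @Override
    public void run(QueryPropertiesSketch props) {
        System.err.println("Query category: " + props.categoryLabel());
    }
}

class QueryPropertiesDemo {
    public static void main(String[] args) {
        QueryPropertiesSketch props = new QueryPropertiesSketch();
        // Flags an analyzer might record for a query that joins, then groups.
        props.setHasJoin(true);
        props.setHasGroupBy(true);
        props.setHasJoinFollowedByGroupBy(true);
        new PrintQueryPropertiesHook().run(props);
        // stderr: Query category: join,groupby,join_then_groupby
    }
}
```

Collecting the flags while the parse tree is interpreted means a hook only reads booleans; it never has to infer the query shape from whichever operators the compiler happened to generate.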
> Need a way to categorize queries in hooks for improved logging
> --------------------------------------------------------------
>
>                 Key: HIVE-2453
>                 URL: https://issues.apache.org/jira/browse/HIVE-2453
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Kevin Wilfong
>            Assignee: Kevin Wilfong
>         Attachments: HIVE-2453.1.patch.txt
>
>
> We need a way to categorize queries in the hooks, for example by whether or not they include a join clause, a group by clause, etc. This will allow for better performance logging.
> Currently the only way I can find is to go through the operators in the tasks, but which operators are used for the different types of queries may change over time.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
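The issue notes that the fallback is to inspect the operators in each task, and the review thread notes that splitting the script flag into mapper and reducer variants would require walking each MapReduce job's operator tree looking for a Transform operator. A minimal sketch of that walk, assuming a simplified operator model: `OpKind`, `OpNode`, `MapRedJobSketch`, and `ScriptLocator` are hypothetical names for illustration, not Hive classes.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical operator kinds; real Hive operator classes are not modeled here.
enum OpKind { TABLE_SCAN, FILTER, SELECT, TRANSFORM, REDUCE_SINK, GROUP_BY, JOIN, FILE_SINK }

// Hypothetical operator node: a kind plus its child operators.
class OpNode {
    final OpKind kind;
    final List<OpNode> children = new ArrayList<>();
    OpNode(OpKind kind) { this.kind = kind; }
    OpNode add(OpNode child) { children.add(child); return this; }
}

// Hypothetical MapReduce job: map-side operator roots and an optional reduce-side root.
class MapRedJobSketch {
    final List<OpNode> mapRoots = new ArrayList<>();
    OpNode reduceRoot; // null for map-only jobs
}

class ScriptLocator {
    // Iterative DFS from the given roots; the visited set guards against
    // operators reachable along more than one path.
    static boolean containsKind(List<OpNode> roots, OpKind target) {
        Deque<OpNode> stack = new ArrayDeque<>(roots);
        Set<OpNode> visited = new HashSet<>();
        while (!stack.isEmpty()) {
            OpNode op = stack.pop();
            if (!visited.add(op)) continue;
            if (op.kind == target) return true;
            for (OpNode c : op.children) stack.push(c);
        }
        return false;
    }

    static boolean usesScriptInMapper(List<MapRedJobSketch> jobs) {
        for (MapRedJobSketch job : jobs) {
            if (containsKind(job.mapRoots, OpKind.TRANSFORM)) return true;
        }
        return false;
    }

    static boolean usesScriptInReducer(List<MapRedJobSketch> jobs) {
        for (MapRedJobSketch job : jobs) {
            if (job.reduceRoot != null
                    && containsKind(Collections.singletonList(job.reduceRoot), OpKind.TRANSFORM)) {
                return true;
            }
        }
        return false;
    }
}
```

Hard-coding `OpKind.TRANSFORM` as the marker for script usage is exactly the coupling the issue warns about: if the operators generated for a query type change, the walk silently gives different answers. That is one argument for recording the flags during semantic analysis instead.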