[ https://issues.apache.org/jira/browse/HIVE-4318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13627063#comment-13627063 ]
Gunther Hagleitner commented on HIVE-4318:
------------------------------------------

Nice find! Thanks! I can see how that will be problematic in the inner loop. Just glancing at it:

- the only place where we use operator-hooks seems to be the home-grown profiler. Neat to have, but not a good reason to slow down performance.
- this seems to add the following overhead to the inner loop:
  - allocation of context per row
  - iteration through empty list of hooks (utils always sets empty list)
  - two additional virtual calls

Is anyone really using/planning on using this? The easiest fix would be to just remove it, which seems the right thing to do. It doesn't seem useful to have a feature that adds a lot of overhead in the inner loop. There are ways to cut down on the overhead for when there are no hooks, but I'd like to know if there's opposition to removing it first.

> OperatorHooks hit performance even when not used
> ------------------------------------------------
>
>                 Key: HIVE-4318
>                 URL: https://issues.apache.org/jira/browse/HIVE-4318
>             Project: Hive
>          Issue Type: Bug
>          Components: Query Processor
>         Environment: Ubuntu LXC (64 bit)
>            Reporter: Gopal V
>            Assignee: Gunther Hagleitner
>
> Operator Hooks inserted into Operator.java cause a performance hit even when
> it is not being used.
> For a count(1) query tested with & without the operator hook calls.
> {code:title=with}
> 2013-04-09 07:33:58,920 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 84.07 sec
> Total MapReduce CPU Time Spent: 1 minutes 24 seconds 70 msec
> OK
> 28800991
> Time taken: 40.407 seconds, Fetched: 1 row(s)
> {code}
> {code:title=without}
> 2013-04-09 07:33:02,355 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 68.48 sec
> ...
> Total MapReduce CPU Time Spent: 1 minutes 8 seconds 480 msec
> OK
> 28800991
> Time taken: 35.907 seconds, Fetched: 1 row(s)
> {code}
> The effect is multiplied by the number of operators in the pipeline that has
> to forward the row - the more operators there are, the slower the query.
> The modification made to test this was
> {code:title=Operator.java}
> --- ql/src/java/org/apache/hadoop/hive/ql/exec/Operator.java
> +++ ql/src/java/org/apache/hadoop/hive/ql/exec/Operator.java
> @@ -526,16 +526,16 @@ public void process(Object row, int tag) throws HiveException {
>        return;
>      }
>      OperatorHookContext opHookContext = new OperatorHookContext(this, row, tag);
> -    preProcessCounter();
> -    enterOperatorHooks(opHookContext);
> +    //preProcessCounter();
> +    //enterOperatorHooks(opHookContext);
>      processOp(row, tag);
> -    exitOperatorHooks(opHookContext);
> -    postProcessCounter();
> +    //exitOperatorHooks(opHookContext);
> +    //postProcessCounter();
>    }
> {code}
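
For reference, the comment's remark that there are ways to cut down on the overhead when there are no hooks could look roughly like the sketch below. It is not Hive code: RowHook, HookContext and GuardedOperator are hypothetical stand-ins, and the two counters exist only so the behaviour can be observed. The assumption is that hooks are registered before rows start flowing, so a cached boolean is enough to skip the per-row context allocation, the iteration over an empty list, and the extra calls.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for illustration only; these are NOT Hive's classes.
interface RowHook {
  void enter(HookContext ctx);
  void exit(HookContext ctx);
}

final class HookContext {
  final Object row;
  final int tag;

  HookContext(Object row, int tag) {
    this.row = row;
    this.tag = tag;
  }
}

class GuardedOperator {
  private final List<RowHook> hooks = new ArrayList<>();
  // Cached at registration time so the per-row path only tests a boolean.
  private boolean hasHooks = false;

  // Counters used only to observe the behaviour of this sketch.
  long rowsProcessed = 0;
  long contextsAllocated = 0;

  void addHook(RowHook hook) {
    hooks.add(hook);
    hasHooks = true;
  }

  void process(Object row, int tag) {
    if (!hasHooks) {
      // Fast path: no context allocation, no list iteration, no hook calls.
      processOp(row, tag);
      return;
    }
    // Slow path: taken only when at least one hook is registered.
    HookContext ctx = new HookContext(row, tag);
    contextsAllocated++;
    for (RowHook hook : hooks) {
      hook.enter(ctx);
    }
    processOp(row, tag);
    for (RowHook hook : hooks) {
      hook.exit(ctx);
    }
  }

  // Placeholder for the operator's real per-row work.
  void processOp(Object row, int tag) {
    rowsProcessed++;
  }
}
```

With no hooks registered, process() does nothing beyond the boolean test and the call to processOp(); the context is allocated only once a hook has been added. Whether this is preferable to removing the feature outright is the question raised in the comment.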