[ https://issues.apache.org/jira/browse/HIVE-279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12677548#action_12677548 ]
Namit Jain commented on HIVE-279:
---------------------------------

Some high-level comments:

1. Add more comments everywhere, especially in joinPPD (OpProcFactory).
2. Remove the operator-specific code in ExprWalkerProcFactory: ColumnExprProcessor: process.
3. Use specific data structures wherever possible instead of more generic ones.
   ExprWalkerInfo:
     private Map<String, List<Node>> pushdownPreds;
     private Map<Node, ExprInfo> exprInfoMap;
   In both of them, Node means exprNodeDesc, so why don't we use that instead?
   Similarly, in OpWalkerInfo:
     private Map<Node, ExprWalkerInfo> opToPrunedPredsMap;
     private Map<Operator<? extends Serializable>, OpParseContext> opToParseCtxMap;
   use Operator instead of Node in opToPrunedPredsMap.
4. Can you move OpWalker and ExprWalker into different directories?
5. Why are filters only pushed on top of TableScan? Can't it be done anywhere? If you want to do that in a follow-up, can you file a JIRA for it?
6. Many files (in the ppd directory) have no Apache header.

SemanticAnalyzer.java:
A comment explaining why colInfoMap exists would help. Give an example: a group by where the table column order is different from the grouped column order. The same goes for posAliasMap and nameToInputColumnInfoMap for JOIN.

genJoinOperatorChildren:
  if(aliases == null) {
    aliases = new HashSet<String>();
    posToAliasMap.put(pos, aliases);
  }
Isn't the IF redundant?

> Implement predicate push down for hive queries
> ----------------------------------------------
>
>                 Key: HIVE-279
>                 URL: https://issues.apache.org/jira/browse/HIVE-279
>             Project: Hadoop Hive
>          Issue Type: New Feature
>    Affects Versions: 0.2.0
>            Reporter: Prasad Chakka
>            Assignee: Prasad Chakka
>         Attachments: hive-279.2.patch, hive-279.patch
>
>
> Push predicates that are expressed in outer queries into inner queries where
> possible so that rows are filtered out sooner.
> e.g.
> select a.*, b.* from a join b on (a.uid = b.uid) where a.age = 20 and
> a.gender = 'm'
> The current compiler generates the filter predicate in the reducer, after the
> join, so all the rows have to be passed from the mappers to the reducer. By
> pushing the filter predicate down to the mappers, query performance should improve.
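The filter-placement argument in the issue description can be illustrated with a small in-memory simulation. This is a minimal sketch and not Hive code: the row classes RowA and RowB, the helper methods, and the sample data are all hypothetical, and a nested loop stands in for the reduce-side join. The sketch compares two plans for the `a.age = 20 and a.gender = 'm'` filter, one that runs it after the join and one that runs it before. For each plan it reports the rows produced and the number of input rows "shuffled" to the join.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

class PushdownSketch {
    // Hypothetical row of table a: (uid, age, gender).
    static final class RowA {
        final int uid, age;
        final String gender;
        RowA(int uid, int age, String gender) { this.uid = uid; this.age = age; this.gender = gender; }
    }

    // Hypothetical row of table b: (uid, name).
    static final class RowB {
        final int uid;
        final String name;
        RowB(int uid, String name) { this.uid = uid; this.name = name; }
    }

    // Joined output plus the number of input rows sent from the "mappers" to the "reducer".
    static final class PlanResult {
        final List<String> rows;
        final int shuffledRows;
        PlanResult(List<String> rows, int shuffledRows) { this.rows = rows; this.shuffledRows = shuffledRows; }
    }

    // where a.age = 20 and a.gender = 'm': it references only columns of a,
    // so for an inner join it can be evaluated before the join.
    static final Predicate<RowA> WHERE_A = r -> r.age == 20 && "m".equals(r.gender);

    // Equi-join on uid (nested loop for brevity), applying `filter` to each a-row.
    static List<String> joinOnUid(List<RowA> as, List<RowB> bs, Predicate<RowA> filter) {
        List<String> out = new ArrayList<>();
        for (RowA a : as) {
            for (RowB b : bs) {
                if (a.uid == b.uid && filter.test(a)) {
                    out.add(a.uid + ":" + b.name);
                }
            }
        }
        return out;
    }

    // Current plan: every row is shuffled and the filter runs after the join.
    static PlanResult filterAfterJoin(List<RowA> as, List<RowB> bs) {
        return new PlanResult(joinOnUid(as, bs, WHERE_A), as.size() + bs.size());
    }

    // Pushed-down plan: rows of a are filtered map-side, before the shuffle.
    static PlanResult filterBeforeJoin(List<RowA> as, List<RowB> bs) {
        List<RowA> kept = new ArrayList<>();
        for (RowA a : as) {
            if (WHERE_A.test(a)) kept.add(a);
        }
        return new PlanResult(joinOnUid(kept, bs, r -> true), kept.size() + bs.size());
    }

    public static void main(String[] args) {
        List<RowA> a = Arrays.asList(
            new RowA(1, 20, "m"), new RowA(2, 20, "f"), new RowA(3, 31, "m"), new RowA(4, 20, "m"));
        List<RowB> b = Arrays.asList(new RowB(1, "x"), new RowB(3, "y"), new RowB(4, "z"));
        PlanResult late = filterAfterJoin(a, b);
        PlanResult early = filterBeforeJoin(a, b);
        System.out.println(late.rows + " shuffled=" + late.shuffledRows);   // [1:x, 4:z] shuffled=7
        System.out.println(early.rows + " shuffled=" + early.shuffledRows); // [1:x, 4:z] shuffled=5
    }
}
```

With this sample data both plans return the same rows, but the pushed-down plan ships 5 input rows instead of 7. The rewrite preserves the result here because the predicate references only columns of a and the join is an inner join.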
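Item 3 of the review (preferring specific types over Node in ExprWalkerInfo's maps) can be shown with a short sketch. Node and ExprNodeDesc below are hypothetical stand-ins for the walker's generic node type and the expression descriptor the review calls exprNodeDesc. They are not the actual Hive classes. With Map<String, List<Node>>, every reader has to downcast, and a wrongly typed element only surfaces as a ClassCastException at run time. Declaring Map<String, List<ExprNodeDesc>> lets the compiler enforce the element type instead.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class TypedMapSketch {
    // Hypothetical stand-in for the generic graph-walker node interface.
    interface Node {
        String getName();
    }

    // Hypothetical stand-in for an expression descriptor; every stored predicate is one of these.
    static final class ExprNodeDesc implements Node {
        private final String exprString;
        ExprNodeDesc(String exprString) { this.exprString = exprString; }
        public String getName() { return "ExprNodeDesc"; }
        String getExprString() { return exprString; }
    }

    // Generic declaration: the reader must downcast, and a wrong element type
    // is only detected at run time (ClassCastException).
    static String firstPredGeneric(Map<String, List<Node>> preds, String alias) {
        return ((ExprNodeDesc) preds.get(alias).get(0)).getExprString();
    }

    // Specific declaration: the compiler enforces the element type, so no cast is needed.
    static String firstPredTyped(Map<String, List<ExprNodeDesc>> preds, String alias) {
        return preds.get(alias).get(0).getExprString();
    }

    public static void main(String[] args) {
        ExprNodeDesc pred = new ExprNodeDesc("(a.age = 20)");

        Map<String, List<Node>> generic = new HashMap<>();
        generic.computeIfAbsent("a", k -> new ArrayList<>()).add(pred);

        Map<String, List<ExprNodeDesc>> typed = new HashMap<>();
        typed.computeIfAbsent("a", k -> new ArrayList<>()).add(pred);

        System.out.println(firstPredGeneric(generic, "a")); // (a.age = 20)
        System.out.println(firstPredTyped(typed, "a"));     // (a.age = 20)
    }
}
```

The sketch uses computeIfAbsent for brevity. That method requires Java 8 or later.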