[ https://issues.apache.org/jira/browse/HIVE-1546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12903660#action_12903660 ]
Ashutosh Chauhan commented on HIVE-1546: ---------------------------------------- bq. What I was thinking in my previous comment was as follows: * only one level of factory * define new interface HiveSemanticAnalyzer (not an abstract class): copy signatures of interface methods from public methods on BaseSemanticAnalyzer; add Javadoc (I can help with that if any of them are non-obvious) * when callers get new analyzer instance from factory, they refer to it via the HiveSemanticAnalyzer interface only Are you envisioning something like this in Driver.java {code} HiveSemanticAnalyzer sem = HiveSemanticAnalyzerFactory.get(conf); // Do semantic analysis and plan generation sem.analyze(tree, ctx); // validate the plan sem.validate(); plan = new QueryPlan(command, sem); // get the output schema schema = getSchema(sem, conf); {code} Note where {sem} is used. If so, HiveSemanticAnalyzer interface needs to look like this: {code} public interface HiveSemanticAnalyzer extends Configurable{ public void analyze(ASTNode ast, Context ctx) throws SemanticException; public void validate() throws SemanticException; public List<FieldSchema> getResultSchema(); public FetchTask getFetchTask(); public List<Task<? extends Serializable>> getRootTasks(); public HashSet<ReadEntity> getInputs(); public HashSet<WriteEntity> getOutputs(); public LineageInfo getLineageInfo(); public HashMap<String, String> getIdToTableNameMap(); } {code} This to me look like an awkward interface, which has to expose internal details of the very class it is trying to hide. But even if we make an effort to improve on it and try to come up with a better interface, this will mean that I need to touch upon lot of critical code paths in BaseSemanticAnalyzer, SemanticAnalyzer and DDLSemanticAnalyzer which I am not sure I know enough of them to make changes. If this is not what you were envisioning then I have missed something in what you were suggesting. Assuming we dont go this route, I liked the approach in my second patch better then from my first patch. I think it provides better abstractions. But if you have different opinion, I am fine with the first one as well. Regarding handleGenericFileFormat, its the point (a). It cant be void, because in Howl where we override this it needs to return information to its caller which will work on that information. handleGenericFileFormat() can only make use of the token and provide back some information. Caller works on that information and modifies the task it has created which usually happens to be of type DDLWork. And even Hive will need to go through this same code flow if and when it decides to add more FileFormats. > Ability to plug custom Semantic Analyzers for Hive Grammar > ---------------------------------------------------------- > > Key: HIVE-1546 > URL: https://issues.apache.org/jira/browse/HIVE-1546 > Project: Hadoop Hive > Issue Type: Improvement > Components: Metastore > Affects Versions: 0.7.0 > Reporter: Ashutosh Chauhan > Assignee: Ashutosh Chauhan > Fix For: 0.7.0 > > Attachments: hive-1546.patch, hive-1546_2.patch > > > It will be useful if Semantic Analysis phase is made pluggable such that > other projects can do custom analysis of hive queries before doing metastore > operations on them. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.