[ 
https://issues.apache.org/jira/browse/HIVE-1546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12903660#action_12903660
 ] 

Ashutosh Chauhan commented on HIVE-1546:
----------------------------------------

bq. What I was thinking in my previous comment was as follows:
    * only one level of factory
    * define new interface HiveSemanticAnalyzer (not an abstract class): copy 
signatures of interface methods from public methods on BaseSemanticAnalyzer; 
add Javadoc (I can help with that if any of them are non-obvious)
    * when callers get new analyzer instance from factory, they refer to it via 
the HiveSemanticAnalyzer interface only

Are you envisioning something like this in Driver.java
{code}
HiveSemanticAnalyzer sem = HiveSemanticAnalyzerFactory.get(conf);

// Do semantic analysis and plan generation
sem.analyze(tree, ctx);

// validate the plan
sem.validate();

plan = new QueryPlan(command, sem);

// get the output schema
schema = getSchema(sem, conf);
{code} 

Note where {sem} is used. If so, HiveSemanticAnalyzer interface needs to look 
like this:
{code}
public interface HiveSemanticAnalyzer extends Configurable{

  public void analyze(ASTNode ast, Context ctx) throws SemanticException;

  public void validate() throws SemanticException;

  public List<FieldSchema> getResultSchema();

  public FetchTask getFetchTask();

  public List<Task<? extends Serializable>> getRootTasks();

  public HashSet<ReadEntity> getInputs();

  public HashSet<WriteEntity> getOutputs();

  public LineageInfo getLineageInfo();

  public HashMap<String, String> getIdToTableNameMap();
}

{code}

This to me look like an awkward interface, which has to expose internal details 
of the very class it is trying to hide.
But even if we make an effort to improve on it and try to come up with a  
better interface, this will mean that I need to touch upon lot of critical code 
paths in BaseSemanticAnalyzer, SemanticAnalyzer and DDLSemanticAnalyzer which I 
am not sure I know enough of them to make changes. If this is not what you were 
envisioning then I have missed something in what you were suggesting.

Assuming we dont go this route, I liked the approach in my second patch better 
then from my first patch. I think it provides better abstractions. But if you 
have different opinion, I am fine with the first one as well.

Regarding handleGenericFileFormat, its the point (a). It cant be void, because 
in Howl where we override this it needs to return information to its caller 
which will work on that information. handleGenericFileFormat() can only make 
use of the token and provide back some information. Caller works on that 
information and modifies the task it has created which usually happens to be of 
type DDLWork. And even Hive will need to go through this same code flow if and 
when it decides to add more FileFormats.

> Ability to plug custom Semantic Analyzers for Hive Grammar
> ----------------------------------------------------------
>
>                 Key: HIVE-1546
>                 URL: https://issues.apache.org/jira/browse/HIVE-1546
>             Project: Hadoop Hive
>          Issue Type: Improvement
>          Components: Metastore
>    Affects Versions: 0.7.0
>            Reporter: Ashutosh Chauhan
>            Assignee: Ashutosh Chauhan
>             Fix For: 0.7.0
>
>         Attachments: hive-1546.patch, hive-1546_2.patch
>
>
> It will be useful if Semantic Analysis phase is made pluggable such that 
> other projects can do custom analysis of hive queries before doing metastore 
> operations on them. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Reply via email to