[ https://issues.apache.org/jira/browse/HIVE-3472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13461651#comment-13461651 ]
Jason Dai commented on HIVE-3472:
---------------------------------

bq. It's often better to start with a design doc or discussion on the dev lists before significant amounts of code are contributed.

Yes, we agree with that, and this is what we plan to do with this JIRA: use it as a starting point to discuss the best approach to get SQL support for the Hadoop ecosystem (and Hive in particular). The code in panthera on github is not meant to be a complete implementation for review; instead, it is an early prototype that serves as a proof point. After the preferred approach and design are agreed upon, we will need to create several sub-tasks, each of which will be a small, manageable unit for review (just as we did with HBase-6805).

bq. The first question that comes to mind is why do you propose a separate parser for this?

To provide full SQL support in the parser, there are basically three possible approaches:
# Extend the existing Hive parser to support full SQL constructs
# Reuse an existing SQL compliant parser and make it co-exist with the existing Hive parser
# Reuse an existing SQL compliant parser and extend it to support Hive extensions

The problem with the 1st approach is that SQL is a very complex language, much more complex than HiveQL (as a data point, the grammar file of the Hive parser is about 61KB with 2487 lines, while the grammar files of the open source SQL parser [https://github.com/porcelli/plsql-parser] are about 524KB with 8583 lines); in addition, some of the existing SQL grammar rules in the Hive parser would need to be significantly changed to support more complex SQL constructs. Therefore, it would take significant effort to add full SQL features to the Hive parser. The 2nd and 3rd approaches both seem possible, and require significantly less effort than the 1st approach.

bq. Forcing users to think about whether they are in HQL or SQL-92 will cause confusion and maintainability problems for them as well (e.g.
a .sql file written by user1 for HQL will be run in SQL-92 mode by user2, producing either errors or wrong results).

I think there are several options to address this issue. In the current example, the user actually needs to specify, in the .sql file itself, the mode (hiveql or sql) under which the following queries will run, so the mode for each query is predetermined by the .sql file. Another option is that, instead of allowing two parsers to co-exist, we can build two separate jars, effectively two warehouse products (one for HiveQL only and one for SQL only). Of course, another option is to follow the 3rd approach mentioned above: extend the SQL parser to support HiveQL extensions.

> Build An Analytical SQL Engine for MapReduce
> --------------------------------------------
>
>                 Key: HIVE-3472
>                 URL: https://issues.apache.org/jira/browse/HIVE-3472
>             Project: Hive
>          Issue Type: New Feature
>    Affects Versions: 0.10.0
>            Reporter: Shengsheng Huang
>         Attachments: SQL-design.pdf
>
>
> While there are continuous efforts in extending Hive’s SQL support (e.g., see
> some recent examples such as HIVE-2005 and HIVE-2810), many widely used SQL
> constructs are still not supported in HiveQL, such as selecting from multiple
> tables, subquery in WHERE clauses, etc.
> We propose to build a fully SQL-92 compatible engine (for MapReduce based
> analytical query processing) as an extension to Hive.
> The SQL frontend will co-exist with the HiveQL frontend; consequently, one
> can mix SQL and HiveQL statements in their queries (switching between HiveQL
> mode and SQL-92 mode using a “hive.ql.mode” parameter before each query
> statement). This way useful Hive extensions are still accessible to users.