Hello Gang,

I've been tracking a lot of issues recently regarding qualified table names, table names using backticks, and other similar circumstances.
I've looked into addressing some of these and noted that these issues go way back and run all the way down to the core of Hive. To start with, I wanted to use the ANTLR grammar to address some of them and to standardize behavior across all queries. For example, there is currently a patch that disallows table names from having a 'dot' in the name. I'm not 100% sure it applies to all queries, so I wanted to codify this restriction in the parser grammar. That got me looking at the grammar.

In parallel, I also tried to build a supplemental parser in Java for parsing table names (HIVE-23150), and I was hitting some weird and confusing edge cases bubbling up from the parser. I eventually traced them back to the fact that there are a lot of odd rules around table names in the grammar, including something called "REGEX Column Specification."

This feature is problematic because it blindly labels most table names as being a regex. It really should only apply to column names, but the grammar defines a table name as also possibly being a regex. There is a lot of ambiguity, because a table named "a" could be a literal value or a legal regex. When a table name is treated as a regex, a different code path is taken than when it is treated as a literal value. I first saw this issue in a qtest where a table named `s/c` was producing a different result than a table named `s+c`.

This regex feature is not something I've seen in MySQL or Postgres. In MySQL, a table name surrounded by backticks can contain just about any UTF-8 character, so it's not really feasible to tell, without some kind of SQL hint, whether a given table name is a regex or a literal value.

In short, this feature adds a lot of ambiguity and complexity, it is not supported by other major RDBMSs, and it adds only very minor benefit. I also hope to move Hive in the direction of fully supporting UTF-8.
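To make the ambiguity a bit more concrete, here is a small sketch using plain java.util.regex. It is not Hive's code, and I am not claiming it is the exact cause of the qtest difference. The class and method names (RegexAmbiguityDemo, isValidRegex, matchesItself) are made up for illustration. It only shows that all three names compile as regexes, but that `s+c` read as a regex no longer matches the literal text `s+c`.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Hypothetical demo class; not part of Hive.
public class RegexAmbiguityDemo {

    // True if the name compiles as a Java regex.
    static boolean isValidRegex(String name) {
        try {
            Pattern.compile(name);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    // True if the name, read as a regex, matches its own literal text.
    static boolean matchesItself(String name) {
        return isValidRegex(name) && Pattern.matches(name, name);
    }

    public static void main(String[] args) {
        for (String name : new String[] {"a", "s/c", "s+c"}) {
            System.out.println(name
                + " validRegex=" + isValidRegex(name)
                + " matchesItself=" + matchesItself(name));
        }
    }
}
```

So "a" and "s/c" behave the same whether they are read as a literal or as a regex, while "s+c" does not, and nothing in the name itself tells you which reading was intended.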
I have put up a patch to remove this feature:

https://issues.apache.org/jira/browse/HIVE-23183

References:

https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Select#LanguageManualSelect-REGEXColumnSpecification
https://dev.mysql.com/doc/refman/8.0/en/identifiers.html

Thanks,
David