David Mollitor created HIVE-23172:
-------------------------------------

             Summary: Quoted Backtick Columns Are Not Parsing Correctly
                 Key: HIVE-23172
                 URL: https://issues.apache.org/jira/browse/HIVE-23172
             Project: Hive
          Issue Type: Improvement
            Reporter: David Mollitor
            Assignee: David Mollitor

I recently came across some weird behavior while examining failures of {{special_character_in_tabnames_2.q}} as part of HIVE-23150. I was surprised to see it fail because I couldn't see any reason why it should. It runs pretty standard SQL statements just like every other test, but this test is just a *little bit* different from most others, and that brought this issue to light.

It turns out that the parsing of table names is pretty much wrong across the board.

The statement that caught my attention was this:

{code:sql}
DROP TABLE IF EXISTS `s/c`;
{code}

And here is the relevant grammar:

{code:none}
fragment
RegexComponent
    : 'a'..'z' | 'A'..'Z' | '0'..'9' | '_'
    | PLUS | STAR | QUESTION | MINUS | DOT
    | LPAREN | RPAREN | LSQUARE | RSQUARE | LCURLY | RCURLY
    | BITWISEXOR | BITWISEOR | DOLLAR | '!'
    ;

Identifier
    : (Letter | Digit) (Letter | Digit | '_')*
    | {allowQuotedId()}? QuotedIdentifier  /* though at the language level we allow all Identifiers to be QuotedIdentifiers;
                                              at the API level only columns are allowed to be of this form */
    | '`' RegexComponent+ '`'
    ;

fragment
QuotedIdentifier
    : '`' ( '``' | ~('`') )* '`'
      { setText(StringUtils.replace(getText().substring(1, getText().length() -1 ), "``", "`")); }
    ;
{code}

The mystery for me was that, for some reason, the string {{`s/c`}} was being stripped of its back ticks. No other test I investigated behaved this way: the back ticks were always preserved around the table name, and the main Hive Java code base would see the back ticks and deal with them internally. For HIVE-23150, I introduced some sanity checks, and they were failing because they expected the back ticks to be present.

With the help of HIVE-23171 I finally figured it out. What I discovered is that pretty much every table name is hitting the {{RegexComponent}} alternative of {{Identifier}}, and the back ticks are carried forward.
However, in {{`s/c`}} the forward slash {{/}} is not allowed in {{RegexComponent}}, so the string matches the {{QuotedIdentifier}} rule instead, and that rule trims the back ticks.

I validated this by disabling {{QuotedIdentifier}}. With it disabled, {{`s/c`}} fails with an error but {{`sc`}} parses successfully, because {{`sc`}} is being treated as a {{RegexComponent}} sequence.

So, if you have {{allowQuotedId}} disabled, table names can only use the characters defined in the {{RegexComponent}} rule (anything else is an error), and the back ticks are *not* stripped.

If you have {{allowQuotedId}} enabled and the table name contains a character not listed in {{RegexComponent}}, it is matched as a {{QuotedIdentifier}} and the back ticks *are* stripped. If all of its characters are part of {{RegexComponent}}, the back ticks are *not* stripped.
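
To make the cases above concrete, here is a rough Python model of the behavior described in this report. It is only a sketch, not Hive code: the function name {{lex_backticked}} and the {{allow_quoted_id}} flag are hypothetical stand-ins for the lexer and {{allowQuotedId()}}, and the punctuation characters assumed for the token names ({{PLUS}}, {{BITWISEXOR}}, and so on) are my reading of the lexer rather than something quoted above.

```python
import re

# Characters accepted by RegexComponent. The punctuation behind the token
# names (PLUS '+', STAR '*', QUESTION '?', MINUS '-', DOT '.', brackets,
# BITWISEXOR '^', BITWISEOR '|', DOLLAR '$') is an assumption.
REGEX_COMPONENT = re.compile(r"`[A-Za-z0-9_+*?\-.()\[\]{}^|$!]+`")

# QuotedIdentifier: a back tick, then any run of doubled back ticks or
# non-back-tick characters, then a closing back tick.
QUOTED_IDENTIFIER = re.compile(r"`(?:``|[^`])*`")


def lex_backticked(text, allow_quoted_id=True):
    """Return the token text this report observes for a back-ticked name."""
    if REGEX_COMPONENT.fullmatch(text):
        # Observed behavior: the RegexComponent alternative wins and the
        # token text is left as-is, back ticks included.
        return text
    if allow_quoted_id and QUOTED_IDENTIFIER.fullmatch(text):
        # Mirrors the setText(...) action: drop the outer back ticks and
        # collapse each doubled back tick into a single one.
        return text[1:-1].replace("``", "`")
    raise ValueError("no Identifier alternative matches: " + text)


print(lex_backticked("`sc`"))   # back ticks preserved
print(lex_backticked("`s/c`"))  # back ticks stripped
```

With {{allow_quoted_id=False}}, the same call on {{`s/c`}} raises an error while {{`sc`}} still comes back with its back ticks, which matches what I saw when disabling {{QuotedIdentifier}}.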