[
https://issues.apache.org/jira/browse/HIVE-23172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
David Mollitor reassigned HIVE-23172:
-------------------------------------
> Quoted Backtick Columns Are Not Parsing Correctly
> -------------------------------------------------
>
> Key: HIVE-23172
> URL: https://issues.apache.org/jira/browse/HIVE-23172
> Project: Hive
> Issue Type: Improvement
> Reporter: David Mollitor
> Assignee: David Mollitor
> Priority: Critical
>
> I recently came across some odd behavior while examining failures of
> {{special_character_in_tabnames_2.q}} while working on HIVE-23150. I was
> surprised to see it fail because I couldn't see any reason why it
> should... it's running pretty standard SQL statements just like every other
> test, but for some reason this test is just a *little bit* different from
> most others and it brought this issue to light.
> Turns out, the parsing of table names is pretty much wrong across the
> board.
> The statement that caught my attention was this:
> {code:sql}
> DROP TABLE IF EXISTS `s/c`;
> {code}
> And here is the relevant grammar:
> {code:none}
> fragment
> RegexComponent
>     : 'a'..'z' | 'A'..'Z' | '0'..'9' | '_'
>     | PLUS | STAR | QUESTION | MINUS | DOT
>     | LPAREN | RPAREN | LSQUARE | RSQUARE | LCURLY | RCURLY
>     | BITWISEXOR | BITWISEOR | DOLLAR | '!'
>     ;
> Identifier
>     :
>     (Letter | Digit) (Letter | Digit | '_')*
>     | {allowQuotedId()}? QuotedIdentifier /* though at the language level we
>                                              allow all Identifiers to be QuotedIdentifiers;
>                                              at the API level only columns
>                                              are allowed to be of this form */
>     | '`' RegexComponent+ '`'
>     ;
> fragment
> QuotedIdentifier
>     :
>     '`' ( '``' | ~('`') )* '`' {
>         setText(StringUtils.replace(getText().substring(1, getText().length() -1 ),
>             "``", "`")); }
>     ;
> {code}
> The mystery for me was that, for some reason, the string {{`s/c`}} was being
> stripped of its back ticks. Every other test I investigated did not have this
> behavior; the back ticks were always preserved around the table name. The
> main Hive Java code base would see the back ticks and deal with them
> internally. For HIVE-23150, I introduced some sanity checks and they were
> failing because they were expecting the back ticks to be present.
> With the help of HIVE-23171 I finally figured it out. What I discovered
> is that pretty much every table name hits the {{RegexComponent}} alternative
> and the back ticks are carried forward. In {{`s/c`}}, however, the forward
> slash {{/}} is not allowed by {{RegexComponent}}, so the name matches the
> {{QuotedIdentifier}} rule instead, which trims the back ticks.
> I validated this by disabling {{QuotedIdentifier}}. When I did this,
> {{`s/c`}} failed with an error but {{`sc`}} parsed successfully... because
> {{`sc`}} is being matched by the {{RegexComponent}} alternative.
> So, if you have {{allowQuotedId}} disabled, table names can only use the
> characters defined in the {{RegexComponent}} rule (anything else is an
> error), and the back ticks are *not* stripped. If you have {{allowQuotedId}}
> enabled and the table name contains a character not listed in
> {{RegexComponent}}, it is still identified as a table name and the back
> ticks *are* stripped. If all of its characters are part of
> {{RegexComponent}}, the back ticks are *not* stripped.
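
The four outcomes described above can be reproduced with a small model of the {{Identifier}} rule. This is only a sketch of the behavior as reported, not ANTLR's actual prediction logic and not Hive code: the function name {{lex_identifier}} is hypothetical, and the character class assumes that {{PLUS}}, {{STAR}}, {{BITWISEXOR}} and the other token names stand for the single characters their names suggest.

```python
import re

# '`' RegexComponent+ '`' : only the characters listed in the RegexComponent fragment.
# Assumption: PLUS is '+', STAR is '*', BITWISEXOR is '^', BITWISEOR is '|', and so on.
REGEX_COMPONENT = re.compile(r"`[A-Za-z0-9_+*?\-.()\[\]{}^|$!]+`")

# QuotedIdentifier: '`' ( '``' | ~('`') )* '`'
QUOTED_IDENTIFIER = re.compile(r"`(?:``|[^`])*`")


def lex_identifier(text, allow_quoted_id):
    """Return the token text for a back-ticked name, or None for a lexer error.

    Models the behavior described in the report: names made up only of
    RegexComponent characters keep their back ticks, while anything else
    needs QuotedIdentifier, whose action strips and unescapes them.
    """
    if REGEX_COMPONENT.fullmatch(text):
        return text  # back ticks are carried forward unchanged
    if allow_quoted_id and QUOTED_IDENTIFIER.fullmatch(text):
        # Mirrors the setText(...) action: drop the outer back ticks and
        # collapse each doubled back tick into a single one.
        return text[1:-1].replace("``", "`")
    return None  # no alternative matches


if __name__ == "__main__":
    for name in ("`sc`", "`s/c`"):
        for flag in (False, True):
            print(name, "allowQuotedId =", flag, "->", lex_identifier(name, flag))
```

Running it shows the inconsistency directly: {{`sc`}} comes back with its back ticks under either setting, while {{`s/c`}} is either rejected or returned without them.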