[
https://issues.apache.org/jira/browse/HIVE-23176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17083236#comment-17083236
]
David Mollitor commented on HIVE-23176:
---------------------------------------
[~kgyrtkirk] Thanks for the feedback.
This feature is not standard.
I discussed the motivation here:
[http://mail-archives.apache.org/mod_mbox/hive-dev/202004.mbox/%3CCAPCi2CmUSVUPkMRgxUQBs6QFosj4Yjr7w51n0_teAqBcZvZHSw%40mail.gmail.com%3E]
There are two primary concerns:
* If Hive is going to support UTF-8 in the same way other major vendors do,
then there are almost no restrictions on which characters can appear in an
object identifier, so it is not possible to simply "detect" a Regex, and it is
therefore ambiguous whether a user wanted a Regex or a complex table name.
* This feature accidentally added a bunch of weird edge cases where object
identifier parsing takes different code paths
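
To make the ambiguity concrete, here is a minimal sketch in Python. It is not Hive code: the helper names and the column list are made up for illustration, and Python's `re` module stands in for whatever regex engine Hive uses. The point is only that one quoted string is both a legal column name and a legal pattern, and the two readings select different columns.

```python
import re

def resolve_literal(identifier, columns):
    """Treat the quoted identifier as an exact column name."""
    return [c for c in columns if c == identifier]

def resolve_regex(identifier, columns):
    """Treat the quoted identifier as a regex that must match the whole name."""
    pattern = re.compile(identifier)
    return [c for c in columns if pattern.fullmatch(c)]

# Hypothetical table: "a.b" is a perfectly legal back-ticked column name.
columns = ["a.b", "axb", "a_b", "c"]
identifier = "a.b"

literal = resolve_literal(identifier, columns)  # only the column named a.b
regex = resolve_regex(identifier, columns)      # every column matching a<any char>b

print(literal)
print(regex)
```

Nothing in the string itself tells the parser which of the two results the user intended.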
This feature could be interesting, but since it is not part of the SQL
standard, it is a Hive-only shortcut that can cause interoperability problems,
and it is not currently implemented in a great way. It should not be reflected
in the actual grammar of the SQL parser. To implement such a feature, it would
make sense that it be:
* Not part of the grammar
* Configurable (enabled/disabled)
* Applies only to back ticked object identifiers that are ASCII-only
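
As a rough sketch of those three constraints (again in Python, not Hive code; `treat_as_regex`, `resolve_columns`, and the `regex_enabled` flag are hypothetical names, not existing Hive APIs or settings), the decision could live in column resolution after parsing, rather than in the grammar:

```python
import re

def treat_as_regex(identifier: str, backticked: bool, regex_enabled: bool) -> bool:
    """Hypothetical gate: regex only if enabled, back-ticked, and ASCII-only."""
    return regex_enabled and backticked and identifier.isascii()

def resolve_columns(identifier, backticked, columns, regex_enabled=False):
    """Resolve an already-parsed identifier against the known column names."""
    if treat_as_regex(identifier, backticked, regex_enabled):
        pattern = re.compile(identifier)
        return [c for c in columns if pattern.fullmatch(c)]
    # Otherwise the identifier is always an exact name, whatever it contains.
    return [c for c in columns if c == identifier]

columns = ["ds", "hr", "värde"]  # made-up column names

enabled = resolve_columns("(ds|hr)", True, columns, regex_enabled=True)
disabled = resolve_columns("(ds|hr)", True, columns, regex_enabled=False)
non_ascii = resolve_columns("värd.", True, columns, regex_enabled=True)
```

With the flag off, or for any identifier containing non-ASCII characters, the name is always taken literally, so UTF-8 identifiers never hit the regex path.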
> Remove SELECT REGEX Column Feature
> ----------------------------------
>
> Key: HIVE-23176
> URL: https://issues.apache.org/jira/browse/HIVE-23176
> Project: Hive
> Issue Type: Improvement
> Reporter: David Mollitor
> Assignee: David Mollitor
> Priority: Major
> Attachments: HIVE-23176.1.patch, HIVE-23176.2.patch,
> HIVE-23176.3.patch, HIVE-23176.4.patch, HIVE-23176.4.patch
>
>
> Remove the Hive feature: REGEX Column.
>
> Hive has this interesting feature for doing REGEX to SELECT multiple columns.
> This needs to go. It is not SQL standard and as currently implemented, it
> is impossible to determine if a column identifier is a REGEX or the actual
> name of the column. If a column name is enclosed in back ticks then any
> UTF-8 character is valid in that name.
>
> [https://dev.mysql.com/doc/refman/8.0/en/identifiers.html]
> [https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Select]
--
This message was sent by Atlassian Jira
(v8.3.4#803005)