[ 
https://issues.apache.org/jira/browse/HIVE-23176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17083236#comment-17083236
 ] 

David Mollitor edited comment on HIVE-23176 at 4/22/20, 1:54 PM:
-----------------------------------------------------------------

[~kgyrtkirk] Thanks for the feedback.

This feature is not standard.

 I discussed the motivation here:

[http://mail-archives.apache.org/mod_mbox/hive-dev/202004.mbox/%3CCAPCi2CmUSVUPkMRgxUQBs6QFosj4Yjr7w51n0_teAqBcZvZHSw%40mail.gmail.com%3E]

There are two primary concerns:
 * If Hive is going to support UTF-8 in the same way other major vendors do, 
then there are almost no restrictions on which characters can appear in an 
object identifier. It is therefore not possible to simply "detect" a regex: 
any given identifier is ambiguous between a regex and a complex table name.
 * This feature accidentally added a bunch of weird edge cases where object 
identifier parsing takes different code paths.
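
The ambiguity in the first concern can be illustrated with a small sketch. 
This is not Hive code: the column list, the function names, and the use of 
Python's `re` module (rather than the Java regex engine Hive runs on) are all 
assumptions made for illustration only.

```python
import re

def resolve_as_literal(identifier, columns):
    """Treat the identifier as the exact name of a column."""
    return [c for c in columns if c == identifier]

def expand_as_regex(identifier, columns):
    """Treat the identifier as a regex that must match a whole column name."""
    pattern = re.compile(identifier)
    return [c for c in columns if pattern.fullmatch(c)]

# Hypothetical table in which one column really is named "col(1|2)".
columns = ["col1", "col2", "col(1|2)", "price"]
identifier = "col(1|2)"

literal_result = resolve_as_literal(identifier, columns)
regex_result = expand_as_regex(identifier, columns)

print(literal_result)
print(regex_result)
```

The same text selects the single column `col(1|2)` under one reading and the 
two columns `col1` and `col2` under the other, and nothing in the identifier 
itself says which reading the user intended.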

This feature could be interesting, but since it is not part of the SQL 
standard, it is a Hive-only shortcut that can cause interoperability problems, 
and it is not currently implemented in a great way. It should not be reflected 
in the actual grammar of the SQL parser. To implement such a feature, it would 
make sense that it be:

(EDIT: based on discussions)
 * Extends the standard SQL grammar instead of overloading the existing


was (Author: belugabehr):
[~kgyrtkirk] Thanks for the feedback.

This feature is not standard.

 I discussed the motivation here:

[http://mail-archives.apache.org/mod_mbox/hive-dev/202004.mbox/%3CCAPCi2CmUSVUPkMRgxUQBs6QFosj4Yjr7w51n0_teAqBcZvZHSw%40mail.gmail.com%3E]

There are two primary concerns:
 * If Hive is going to support UTF-8 in the same way other major vendors do, 
then there are almost no restrictions on which characters can appear in an 
object identifier. It is therefore not possible to simply "detect" a regex: 
any given identifier is ambiguous between a regex and a complex table name.
 * This feature accidentally added a bunch of weird edge cases where object 
identifier parsing takes different code paths.

This feature could be interesting, but since it is not part of the SQL 
standard, it is a Hive-only shortcut that can cause interoperability problems, 
and it is not currently implemented in a great way. It should not be reflected 
in the actual grammar of the SQL parser. To implement such a feature, it would 
make sense that it be:
 * Not part of the grammar
 * Configurable (enabled/disabled), so that the Java parser code decides how 
to interpret the literal object identifiers supplied in the SQL statement
 * Applied only to back-ticked object identifiers that are ASCII-only
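
A minimal sketch of how those three criteria could fit together is shown 
below. It is not taken from any of the attached patches: the function name 
`resolve_columns`, the `regex_enabled` flag, and the sample columns are 
hypothetical, and Python's `re` stands in for the Java regex engine.

```python
import re

def resolve_columns(token, columns, regex_enabled=False):
    """Resolve one SELECT-list token against the known column names.

    The token is treated as a regex only when the (hypothetical) switch is
    enabled, the token is back-ticked, and its content is ASCII-only.
    In every other case it is matched as a literal column name.
    """
    quoted = len(token) >= 2 and token.startswith("`") and token.endswith("`")
    name = token[1:-1] if quoted else token

    if regex_enabled and quoted and name.isascii():
        pattern = re.compile(name)
        return [c for c in columns if pattern.fullmatch(c)]

    return [c for c in columns if c == name]

# Hypothetical column names, one of them containing a non-ASCII character.
columns = ["col1", "col2", "naïve", "price"]

print(resolve_columns("`col.`", columns, regex_enabled=True))
print(resolve_columns("`col.`", columns))
print(resolve_columns("`naïve`", columns, regex_enabled=True))
```

Because the decision is made after parsing, on the identifier text alone, the 
grammar stays untouched. The sketch still cannot tell a regex from a literal 
ASCII name once the switch is on, which is the remaining ambiguity.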

> Remove SELECT REGEX Column Feature
> ----------------------------------
>
>                 Key: HIVE-23176
>                 URL: https://issues.apache.org/jira/browse/HIVE-23176
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: David Mollitor
>            Assignee: David Mollitor
>            Priority: Major
>              Labels: backwards-incompatible
>         Attachments: HIVE-23176.1.patch, HIVE-23176.2.patch, 
> HIVE-23176.3.patch, HIVE-23176.4.patch, HIVE-23176.4.patch, HIVE-23176.4.patch
>
>
> Remove the Hive feature: REGEX Column.
>  
> Hive has this interesting feature for doing REGEX to SELECT multiple columns. 
>  This needs to go.  It is not SQL standard and as currently implemented, it 
> is impossible to determine if a column identifier is a REGEX or the actual 
> name of the column.  If a column name is enclosed in back ticks, then any 
> UTF-8 character is valid in that name.
>  
> [https://dev.mysql.com/doc/refman/8.0/en/identifiers.html]
> [https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Select]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
