Re: Remove REGEX Column Specification

David Mollitor Wed, 15 Apr 2020 14:51:26 -0700

I've got all tests passing on this.

Are there other questions?


Is anyone willing to +1?

Thanks.

On Tue, Apr 14, 2020, 12:28 PM David Mollitor <dam6...@gmail.com> wrote:

> Hey Zoltan,
>
> Thanks for the feedback and for sharing HIVE-16496.
>
> I think HIVE-16496 is a better approach because it allows for the standard
> SQL behavior of object identifiers, but the SQL syntax is expanded (instead
> of overloaded) to provide this feature.
>
> Also, if a user would like to do some sort of regex, they can query the
> information_schema (if/when Hive gets that).
>
> Also, I just re-read my previous email and I apologize: I provided the
> wrong Jira.  The correct one for removal is:
>
> https://issues.apache.org/jira/browse/HIVE-23176
>
> Thanks.
>
>
>
> David
>
> On Tue, Apr 14, 2020 at 12:16 PM Zoltan Haindrich <k...@rxd.hu> wrote:
>
>> Hey,
>>
>> I don't want to protect this feature, but I think it could be useful.
>> It would probably be OK to remove it, but we should provide something else
>> instead: I think this is the only way to "exclude" some specific columns
>> from the output without listing all the columns.
>>
>> How much do users actually use this feature?
>>
>> We had a somewhat related discussion a few years ago:
>> https://issues.apache.org/jira/browse/HIVE-16496
>>
>> cheers,
>> Zoltan
>>
>> On 4/13/20 3:56 PM, David Mollitor wrote:
>> > Hello Gang,
>> >
>> > I've been tracking a lot of issues recently regarding qualified table
>> > names, table names using back ticks, and other similar circumstances.
>> >
>> > I've looked into trying to address some of these and noted that these
>> > issues go way back and go all the way down to the core of Hive.
>> >
>> > To start with, I wanted to use the ANTLR grammar to address some of these
>> > issues and to standardize behavior across all queries.  For example, there
>> > is currently a patch that disallows table names from having a 'dot' in the
>> > name.  I'm not 100% sure it applies to all queries, so I wanted to codify
>> > this restriction in the parser grammar.  So it got me looking at the
>> > grammar.
>> >
>> > In parallel, I also tried to build a supplemental parser in Java for
>> > parsing table names (HIVE-23150) and I was hitting some weird and
>> > confusing edge cases bubbling up from the parser.  I eventually traced it
>> > back to the fact that there are a lot of weird rules around table names in
>> > the grammar, including something called "REGEX Column Specification."
>> >
>> > This feature is problematic as it blindly labels most table names as being
>> > a regex.  It really should only apply to column names, but the grammar
>> > defines a table name as also possibly being a regex.  There is a lot of
>> > ambiguity because a table named "a" could be a literal value or a legal
>> > regex.  When a table name is treated as a regex, a different code path is
>> > taken than when it is treated as a literal value.  I first saw this issue
>> > in a qtest where a table named `s/c` was producing a different result than
>> > a table named `s+c`.
>> >
>> > This regex feature is not something I've seen in MySQL or Postgres.  In
>> > MySQL, a table name surrounded with back ticks can contain just about any
>> > UTF-8 character, so it's not really feasible to tell, without some kind of
>> > SQL hint, whether such a table name is a regex or a literal value.
>> >
>> > This feature adds a lot of ambiguity and complexity, is not supported by
>> > other major RDBMSs, and adds only very minor benefit.  I also hope to
>> > move Hive in a direction of fully supporting UTF-8.
>> >
>> > I have put a patch up to remove it:
>> > https://issues.apache.org/jira/browse/HIVE-23183
>> >
>> >
>> > References:
>> >
>> > https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Select#LanguageManualSelect-REGEXColumnSpecification
>> >
>> >
>> > https://dev.mysql.com/doc/refman/8.0/en/identifiers.html
>> >
>> >
>> > Thanks,
>> > David
>> >
>>
>
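
The ambiguity described in the original message (a back-quoted name such as
"a" or `s+c` being both a possible literal and a legal regex) can be
illustrated with a minimal sketch. This is not Hive's code path: it uses
Python's re module as a stand-in for the Java regex engine, and the helper
names matches_as_literal and matches_as_regex, along with the sample list of
names, are hypothetical and exist only for this illustration.

```python
import re


def matches_as_literal(identifier: str, names: list[str]) -> list[str]:
    """Return the names equal to the identifier, read as a literal name."""
    return [name for name in names if name == identifier]


def matches_as_regex(identifier: str, names: list[str]) -> list[str]:
    """Return the names fully matched by the identifier, read as a regex."""
    pattern = re.compile(identifier)
    return [name for name in names if pattern.fullmatch(name)]


# Hypothetical set of existing names (an assumption for this example).
names = ["a", "sc", "ssc", "s+c", "s/c"]

for identifier in ["a", "s/c", "s+c"]:
    literal = matches_as_literal(identifier, names)
    regex = matches_as_regex(identifier, names)
    verdict = "same" if literal == regex else "DIFFERENT"
    print(f"{identifier!r}: literal={literal} regex={regex} -> {verdict}")
```

In this sketch, "a" and `s/c` select the same name under either reading,
while `s+c` read as a regex selects "sc" and "ssc" but not the name `s+c`
itself. Nothing in the identifier alone says which reading was intended.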
