David Mollitor created HIVE-23149:
-------------------------------------

             Summary: Consistency of Parsing Object Identifiers
                 Key: HIVE-23149
                 URL: https://issues.apache.org/jira/browse/HIVE-23149
             Project: Hive
          Issue Type: Improvement
            Reporter: David Mollitor
            Assignee: David Mollitor


Hive needs more consistent handling of object identifiers (databases, tables, 
columns, views, functions, etc.). I think it makes sense to standardize on the 
same identifier rules that MySQL/MariaDB use, so that Hive can be more of a 
drop-in replacement for them.
 
The two important rules to keep in mind are:

1// Permitted characters in quoted identifiers include the full Unicode Basic 
Multilingual Plane (BMP), except U+0000.
 
2// If any components of a multiple-part name require quoting, quote them 
individually rather than quoting the name as a whole. For example, write 
{{`my-table`.`my-column`}}, not {{`my-table.my-column`}}.  
 
[https://dev.mysql.com/doc/refman/8.0/en/identifiers.html]
[https://dev.mysql.com/doc/refman/8.0/en/identifier-qualifiers.html]  
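
As a rough illustration of rule 1, here is a hypothetical Java check. The class and method names are made up for this sketch and are not Hive's API. It assumes the backticks have already been stripped, and it tests that every character lies in the BMP and is not U+0000. Rejecting empty names and unpaired UTF-16 surrogates are extra assumptions of this sketch, not part of the MySQL wording.

{code:java}
// Hypothetical helper for illustration only; not part of Hive's API.
class QuotedIdentifierCheck {

    /**
     * Rule 1: a quoted identifier (backticks already removed) may contain any
     * character from the Basic Multilingual Plane except U+0000.
     */
    static boolean isPermittedQuotedIdentifier(String name) {
        if (name == null || name.isEmpty()) {
            return false;                     // assumption: empty names are rejected
        }
        for (int i = 0; i < name.length(); ) {
            int cp = name.codePointAt(i);
            if (cp == 0x0000) {
                return false;                 // U+0000 is explicitly excluded
            }
            if (cp > 0xFFFF) {
                return false;                 // supplementary plane, outside the BMP
            }
            if (Character.isSurrogate((char) cp)) {
                return false;                 // unpaired UTF-16 surrogate (assumption: rejected)
            }
            i += Character.charCount(cp);
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isPermittedQuotedIdentifier("my-table"));         // true
        System.out.println(isPermittedQuotedIdentifier("default.mytable"));  // true
        System.out.println(isPermittedQuotedIdentifier("a" + (char) 0 + "b")); // false: contains U+0000
        System.out.println(isPermittedQuotedIdentifier(new String(Character.toChars(0x1F600)))); // false: outside the BMP
    }
}
{code}

Note that a period is an ordinary character under this rule, which is what allows a quoted table name like {{`default.mytable`}}.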

 
That is to say:
 
{code:sql}
-- Select all rows from a table named `default.mytable` in the current database
-- (Yes, the table name itself contains a period; this is valid.)
SELECT * FROM `default.mytable`;
 
-- Select all rows from database `default`, table `mytable`
SELECT * FROM `default`.`mytable`;  
{code}
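
A minimal sketch of rule 2 in Java, again with hypothetical names and not Hive's actual parser. Each component of a multi-part name is quoted individually. A period inside backticks stays part of the component, and a period outside backticks separates components. Following the MySQL convention, a doubled backtick inside a quoted component stands for one literal backtick. This sketch assumes no whitespace around the dots and rejects empty components.

{code:java}
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not Hive's actual parser or API.
class QualifiedNameSplitter {

    /** Splits a multi-part name such as `db`.`table` into its components. */
    static List<String> split(String qualifiedName) {
        List<String> parts = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;      // currently inside a `...` component
        boolean partWasQuoted = false; // current component was backtick-quoted
        int n = qualifiedName.length();
        for (int i = 0; i < n; i++) {
            char c = qualifiedName.charAt(i);
            if (inQuotes) {
                if (c != '`') {
                    current.append(c);                 // includes '.', which is part of the name here
                } else if (i + 1 < n && qualifiedName.charAt(i + 1) == '`') {
                    current.append('`');               // `` -> one literal backtick
                    i++;
                } else {
                    inQuotes = false;                  // closing backtick
                }
            } else if (c == '`') {
                if (current.length() > 0 || partWasQuoted) {
                    throw new IllegalArgumentException("backtick must start a component, index " + i);
                }
                inQuotes = true;
                partWasQuoted = true;
            } else if (c == '.') {
                parts.add(finish(current, i));         // unquoted '.' separates components
                partWasQuoted = false;
            } else {
                if (partWasQuoted) {
                    throw new IllegalArgumentException("unexpected character after closing backtick, index " + i);
                }
                current.append(c);
            }
        }
        if (inQuotes) {
            throw new IllegalArgumentException("unterminated backtick-quoted identifier");
        }
        parts.add(finish(current, n));
        return parts;
    }

    // Returns the finished component and clears the buffer; empty components are rejected (assumption).
    private static String finish(StringBuilder current, int index) {
        if (current.length() == 0) {
            throw new IllegalArgumentException("empty identifier component, index " + index);
        }
        String part = current.toString();
        current.setLength(0);
        return part;
    }

    public static void main(String[] args) {
        System.out.println(split("`default.mytable`"));      // [default.mytable]
        System.out.println(split("`default`.`mytable`"));    // [default, mytable]
        System.out.println(split("`my-table`.`my-column`")); // [my-table, my-column]
    }
}
{code}

Under these rules the two SQL statements above resolve to different objects: a single table whose name contains a period, versus a database plus a table.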
 
This shows up in a couple of ways. There may be others, but these are the ones 
I already know about:
 
1// Hive generates incorrect syntax: [HIVE-23128]
 
2// Hive throws an exception if there is a period in the table name. That is the 
wrong response: table names may contain periods. Hive should instead look up the 
table by its full name, which will more likely than not fail with a 'table not 
found' error, since the user most likely misused the backticks and meant to 
specify the database and the table separately. [HIVE-16907]

Once the parsing is sorted out and backticks can enclose UTF-8 strings, the 
backend database also needs to actually support the UTF-8 character set. It 
currently does not: [HIVE-1808]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)