[jira] [Updated] (HIVE-16496) Enhance asterisk expression (as in "select *") with EXCLUDE clause

Dudu Markovitz (JIRA) Fri, 21 Apr 2017 08:26:31 -0700

     [ 
https://issues.apache.org/jira/browse/HIVE-16496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Dudu Markovitz updated HIVE-16496:
----------------------------------
    Description: 
support the following syntax:

{code}
select * exclude (a,b,e) from t
{code}

which for a table t with columns a,b,c,d,e would be equal to:

{code}
select c,d from t
{code}

Please note that the EXCLUDE clause relates directly to its preceding asterisk.

A common use case would be:

{code}
select t1.* exclude (x), t2.* from t1 join t2 on t1.x=t2.x;
{code}

This supplies a very clean way to select all columns without getting "Ambiguous 
column reference" and without the need to specify all the columns of at least 
one of the tables.
 
Currently, without this enhancement, the query would look something like this:

{code}
select a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v,w,y,z,t2.* from t1 join t2 
on t1.x=t2.x;
{code}

Considering a table may hold hundreds or even thousands of column, this can be 
come very ugly and error prone.
Often this require some scripting work.

An extension to this enhancement would be supporting positional column 
notation, e.g.:

{code}
select * exclude (${1,2,5})   from t   /* exclude columns 1, 2 and 5 */
select * exclude (${1-3})     from t   /* exclude columns 1 to 3 */
select * exclude (${1,3-5,7}) from t   /* exclude columns 1,3 to 5 and 7 */
select * exclude (${7,3-5,1}) from t   /* exclude columns 1,3 to 5 and 7 (same 
as previous example)*/
select * exclude (${3,5-})    from t   /* exclude the 3rd column and all 
columns from the 5th column to the end */
select * exclude (-${1,2})    from t   /* exclude last 2 columns */
select * exclude (-${1-3,7})  from t   /* exclude last 3 columns and the 7th 
column from the end */
select * exclude (-${4-})     from t   /* exclude all column excpt for the last 
3 */
{code}


A complex example would look like:

{code}
select * exclude (${1-3,5,7},x,y,-${1-2}) from t  
{code}

exclude:
- first 3 columns
- 5th and 7th columns
- x and y 
- last 2 columns

P.s. 1
There should be *no* error raised for the following scenarios:
 - Excluding a column that does not exists in the columns set
 - Excluding the same column more than once (e.g. by name and by position).
- Excluding all columns

P.s. 2
This enhancement answers a real need that is being raised again again in the 
Hive users community as well in legacy RDBMS communities. 
As far as I know, no provider had yet implemented something similar and we have 
an opportunity here to lead the SQL ecosystem.

  

  was:
support the following syntax:

{code}
select * exclude (a,b,e) from t
{code}

which for a table t with columns a,b,c,d,e would be equal to:

{code}
select c,d from t
{code}

Please note that the EXCLUDE clause relates directly to its preceding asterisk.

A common use case would be:

{code}
select t1.* exclude (x), t2.* from t1 join t2 on t1.x=t2.x;
{code}

This supplies a very clean way to select all columns without getting "Ambiguous 
column reference" and without the need to specify all the columns of at least 
one of the tables.
 
Currently, without this enhancement, the query would look something like this:

{code}
select a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v,w,y,z,t2.* from t1 join t2 
on t1.x=t2.x;
{code}

Considering a table may hold hundreds or even thousands of column, this can be 
come very ugly and error prone.
Often this require some scripting work.

An extension to this enhancement would be supporting positional column 
notation, e.g.:

{code}
select * exclude (${1,2,5})   from t   /* exclude columns 1, 2 and 5 */
select * exclude (${1-3})     from t   /* exclude columns 1 to 3 */
select * exclude (${1,3-5,7}) from t   /* exclude columns 1,3 to 5 and 7 */
select * exclude (${7,3-5,1}) from t   /* exclude columns 1,3 to 5 and 7 (same 
as previous example)*/
select * exclude (${3,5-})    from t   /* exclude the 3rd column and all 
columns from the 5th column to the end */
select * exclude (-${1,2})    from t   /* exclude last 2 columns */
select * exclude (-${1-3,7})  from t   /* exclude last 3 columns and the 7th 
column from the end */
select * exclude (-${4-})     from t   /* exclude all column excpt for the last 
3 */
{code}


A complex example would look like:

{code}
select * exclude (${1-3,5,7},x,y,-${1-2})      from t  
{code}

exclude:
- first 3 columns
- 5th and 7th columns
- x and y 
- last 2 columns

P.s. 1
There should be *no* error raised for the following scenarios:
 - Excluding a column that does not exists in the columns set
 - Excluding the same column more than once (e.g. by name and by position).
- Excluding all columns

P.s. 2
This enhancement answers a real need that is being raised again again in the 
Hive users community as well in legacy RDBMS communities. 
As far as I know, no provider had yet implemented something similar and we have 
an opportunity here to lead the SQL ecosystem.

  


> Enhance asterisk expression (as in "select *") with EXCLUDE clause
> ------------------------------------------------------------------
>
>                 Key: HIVE-16496
>                 URL: https://issues.apache.org/jira/browse/HIVE-16496
>             Project: Hive
>          Issue Type: Wish
>          Components: Parser
>            Reporter: Dudu Markovitz
>
> support the following syntax:
> {code}
> select * exclude (a,b,e) from t
> {code}
> which for a table t with columns a,b,c,d,e would be equal to:
> {code}
> select c,d from t
> {code}
> Please note that the EXCLUDE clause relates directly to its preceding 
> asterisk.
> A common use case would be:
> {code}
> select t1.* exclude (x), t2.* from t1 join t2 on t1.x=t2.x;
> {code}
> This supplies a very clean way to select all columns without getting 
> "Ambiguous column reference" and without the need to specify all the columns 
> of at least one of the tables.
>  
> Currently, without this enhancement, the query would look something like this:
> {code}
> select a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v,w,y,z,t2.* from t1 join t2 
> on t1.x=t2.x;
> {code}
> Considering a table may hold hundreds or even thousands of column, this can 
> be come very ugly and error prone.
> Often this require some scripting work.
> An extension to this enhancement would be supporting positional column 
> notation, e.g.:
> {code}
> select * exclude (${1,2,5})   from t   /* exclude columns 1, 2 and 5 */
> select * exclude (${1-3})     from t   /* exclude columns 1 to 3 */
> select * exclude (${1,3-5,7}) from t   /* exclude columns 1,3 to 5 and 7 */
> select * exclude (${7,3-5,1}) from t   /* exclude columns 1,3 to 5 and 7 
> (same as previous example)*/
> select * exclude (${3,5-})    from t   /* exclude the 3rd column and all 
> columns from the 5th column to the end */
> select * exclude (-${1,2})    from t   /* exclude last 2 columns */
> select * exclude (-${1-3,7})  from t   /* exclude last 3 columns and the 7th 
> column from the end */
> select * exclude (-${4-})     from t   /* exclude all column excpt for the 
> last 3 */
> {code}
> A complex example would look like:
> {code}
> select * exclude (${1-3,5,7},x,y,-${1-2}) from t  
> {code}
> exclude:
> - first 3 columns
> - 5th and 7th columns
> - x and y 
> - last 2 columns
> P.s. 1
> There should be *no* error raised for the following scenarios:
>  - Excluding a column that does not exists in the columns set
>  - Excluding the same column more than once (e.g. by name and by position).
> - Excluding all columns
> P.s. 2
> This enhancement answers a real need that is being raised again again in the 
> Hive users community as well in legacy RDBMS communities. 
> As far as I know, no provider had yet implemented something similar and we 
> have an opportunity here to lead the SQL ecosystem.
>   



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

[jira] [Updated] (HIVE-16496) Enhance asterisk expression (as in "select *") with EXCLUDE clause

Reply via email to