[ 
https://issues.apache.org/jira/browse/HIVE-16496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15979553#comment-15979553
 ] 

Carter Shanklin edited comment on HIVE-16496 at 4/21/17 11:17 PM:
------------------------------------------------------------------

I'll mention that the first use case, that of excluding columns with duplicate 
names, is handled by the named columns join, HIVE-15983

So:
{code}
select t1.* exclude (x), t2.* from t1 join t2 on t1.x=t2.x;
{code}
Is the same as:
{code}
select * from t1 join t2 using (x);
{code}

If that's a very common problem this may offer some relief. I think there's 
also a ticket for natural join support which automatically resolves the column 
names and uses them for the join list.

IMO the proposal is interesting. Question I have, if the table is wide, say 
2000 columns, how likely is a user to know the ordinal positions of the columns 
to exclude? It seems like the name would be the most likely info to have at 
hand.


was (Author: cartershanklin):
I'll mention that the first use case, that of excluding columns with duplicate 
names, is handled by the named columns join, HIVE-15983

So: select t1.* exclude (x), t2.* from t1 join t2 on t1.x=t2.x;
Is the same as: select * from t1 join t2 using (x);

If that's a very common problem this may offer some relief. I think there's 
also a ticket for natural join support which automatically resolves the column 
names and uses them for the join list.

IMO the proposal is interesting. Question I have, if the table is wide, say 
2000 columns, how likely is a user to know the ordinal positions of the columns 
to exclude? It seems like the name would be the most likely info to have at 
hand.

> Enhance asterisk expression (as in "select *") with EXCLUDE clause
> ------------------------------------------------------------------
>
>                 Key: HIVE-16496
>                 URL: https://issues.apache.org/jira/browse/HIVE-16496
>             Project: Hive
>          Issue Type: Wish
>          Components: Parser
>            Reporter: Dudu Markovitz
>
> support the following syntax:
> {code}
> select * exclude (a,b,e) from t
> {code}
> which for a table t with columns a,b,c,d,e would be equal to:
> {code}
> select c,d from t
> {code}
> Please note that the EXCLUDE clause relates directly to its preceding 
> asterisk.
> A common use case would be:
> {code}
> select t1.* exclude (x), t2.* from t1 join t2 on t1.x=t2.x;
> {code}
> This supplies a very clean way to select all columns without getting 
> "Ambiguous column reference" and without the need to specify all the columns 
> of at least one of the tables.
>  
> Currently, without this enhancement, the query would look something like this:
> {code}
> select a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v,w,y,z,t2.* from t1 join t2 
> on t1.x=t2.x;
> {code}
> Considering a table may hold hundreds or even thousands of column, this can 
> be come very ugly and error prone.
> Often this require some scripting work.
> h4. Extended syntax: positional notation support
> The positional notation is similar to the one used but the *cut* unix command 
> {code}
> select * exclude ($(1,2,5))   from t   -- exclude columns 1, 2 and 5
> select * exclude ($(1-3))     from t   -- exclude columns 1 to 3
> select * exclude ($(1,3-5,7)) from t   -- exclude columns 1,3 to 5 and 7
> select * exclude ($(7,3-5,1)) from t   -- exclude columns 1,3 to 5 and 7 
> (same as previous example)
> select * exclude ($(3,5-))    from t   -- exclude the 3rd column and all 
> columns from the 5th column to the end
> select * exclude (-$(1,2))    from t   -- exclude last 2 columns
> select * exclude (-$(1-3,7))  from t   -- exclude last 3 columns and the 7th 
> column from the end
> select * exclude (-$(4-))     from t   -- exclude all columns except for the 
> last 3
> {code}
> A complex example would look like:
> {code}
> select * exclude ($(1-3,5,7),x,y,-$(1-2)) from t  
> {code}
> exclude:
> - first 3 columns
> - 5th and 7th columns
> - x and y 
> - last 2 columns
> P.s. 1
> There should be *no* error raised for the following scenarios:
>  - Excluding a column that does not exists in the columns set
>  - Excluding the same column more than once (e.g. by name and by position).
> - Excluding all columns
> P.s. 2
> This enhancement answers a real need that is being raised again again in the 
> Hive users community as well in legacy RDBMS communities. 
> As far as I know, no provider had yet implemented something similar and we 
> have an opportunity here to lead the SQL ecosystem.
>   



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

Reply via email to