[ 
https://issues.apache.org/jira/browse/PHOENIX-6433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17310847#comment-17310847
 ] 

Lars Hofhansl edited comment on PHOENIX-6433 at 3/29/21, 6:04 PM:
------------------------------------------------------------------

Options:
 # Leave it the way it is - this is causing downstream problems, so not a good 
option IMHO.
 # Document that duplicate column names may cause problems in downstream 
BigData systems such as Trino and Spark - this does not require code changes, 
but raises awareness to avoid this unless it is known to be necessary.
 # Add a config to disallow creating tables with duplicate column names - most 
flexibility, perhaps unnecessary new complexity.
 # Outright disallow creating new tables with duplicate column names going 
forward - easiest to understand, but will make testing with duplicate columns 
hard; we'd still need a way to create this scenario for tests.

There are probably more options.

 

Personally I'd be happy with any of options 2-4, but #2 only if the mention 
is prominent somewhere.
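
To make options 3 and 4 concrete, here is a minimal sketch of the kind of check they imply. It is written in Python for brevity and is not Phoenix code: the function names, the error type, and the allow_duplicates flag are all hypothetical, and treating every unqualified column as belonging to family "0" is a simplifying assumption based on the {{0.v1}} example in the issue description.

```python
from collections import defaultdict

# Assumption: unqualified columns are treated as living in family "0",
# following the "0.v1" example in the issue description.
DEFAULT_FAMILY = "0"


class DuplicateColumnNameError(ValueError):
    """Hypothetical error type for this sketch; not a Phoenix exception."""


def find_duplicate_column_names(columns):
    """Return {name: sorted families} for names declared in more than one family.

    `columns` is a list of (family, name) pairs; family is None when the
    column was declared without a column family. Names are compared exactly
    as given. The same name repeated within one family is not reported here.
    """
    families_by_name = defaultdict(set)
    for family, name in columns:
        families_by_name[name].add(family or DEFAULT_FAMILY)
    return {
        name: sorted(families)
        for name, families in families_by_name.items()
        if len(families) > 1
    }


def check_create_table(columns, allow_duplicates=False):
    """Reject duplicate column names unless the (hypothetical) flag permits them.

    allow_duplicates=False models option 4; exposing the flag as a setting
    models option 3 and leaves tests a way to create the duplicate scenario.
    """
    duplicates = find_duplicate_column_names(columns)
    if duplicates and not allow_duplicates:
        raise DuplicateColumnNameError(
            f"duplicate column names across families: {duplicates}"
        )
    return duplicates
```

With this sketch, the column list (pk1, x.v1, y.v1) is rejected by default, while passing allow_duplicates=True returns the offending names so a test can still build such a table.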



> DISCUSS: Disallow creating new tables with duplicate column qualifiers by 
> default.
> ---------------------------------------------------------------------------------
>
>                 Key: PHOENIX-6433
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-6433
>             Project: Phoenix
>          Issue Type: Wish
>            Reporter: Lars Hofhansl
>            Priority: Major
>
> Phoenix allows specifying columns to "reside" in specific column families. As 
> long as the columns are unique you can simply refer to them via the column 
> name. In that case the column families are just about the physical placement 
> of the columns. No special SQL constructs are needed. This is similar to 
> indexes: they exist for optimization, but queries are unchanged.
> However...
> Currently Phoenix also allows creating tables with duplicate column 
> qualifiers such as:
> {{CREATE TABLE t (pk1 ..., x.v1, y.v1, ...)}} or
> {{CREATE TABLE t (pk1 ..., v1, x.v1, ...)}}
> In the first case you must qualify any reference to {{v1}} with the column 
> family or an {{AmbiguousColumnException}} is thrown. In the second case 
> {{v1}} refers to {{0.v1}}.
> In both cases the physical optimization of the column storage now leaks into 
> the SQL queries - unnecessarily, IMHO at least.
> For tables not created in Phoenix, or with dynamic columns, this is not 
> avoidable. 
> I do think we should disallow creating new tables with duplicated (static) 
> column names, to reduce confusion and surprises.
> Related: PHOENIX-6343



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
