[
https://issues.apache.org/jira/browse/PHOENIX-6433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17310847#comment-17310847
]
Lars Hofhansl edited comment on PHOENIX-6433 at 3/29/21, 6:04 PM:
------------------------------------------------------------------
Options:
# Leave it the way it is - this is causing downstream problems, so not a good
option IMHO.
# Document that duplicate column names may cause problems in downstream
BigData systems such as Trino and Spark - this does not require code changes,
but raises awareness to avoid this unless it is known to be necessary.
# Add a config to disallow creating tables with duplicate column names - most
flexibility, perhaps unnecessary new complexity.
# Outright disallow creating new tables with duplicate column names going
forward - easiest to understand, but will make testing with duplicate columns
hard; we'd still need a way to create this scenario for tests.
There are probably more options.
Personally I'd be happy with any of options 2-4, but #2 only if the mention is
prominent somewhere.
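
To make options 3 and 4 concrete, here is a minimal sketch of the kind of check a CREATE TABLE path could run. It is not Phoenix code: the function names and the {{allow_duplicates}} flag are hypothetical, columns are modeled as plain (family, name) pairs, and identifier case and quoting rules are ignored. The only behavior taken from the issue description is that a column declared without a family lands in family {{0}}.

```python
from collections import defaultdict

# Assumption for this sketch: columns declared without a family go to "0",
# as described in the issue (v1 refers to 0.v1).
DEFAULT_FAMILY = "0"


def find_duplicate_column_names(columns):
    """Return {name: [families]} for names declared in more than one family.

    `columns` is an iterable of (family, name) pairs; a family of None means
    the column was declared without a column family.
    """
    families_by_name = defaultdict(list)
    for family, name in columns:
        families_by_name[name].append(family or DEFAULT_FAMILY)
    return {n: fams for n, fams in families_by_name.items() if len(fams) > 1}


def check_create_table(columns, allow_duplicates=False):
    """Reject duplicate column names unless the (hypothetical) flag allows them.

    allow_duplicates=False models option 4 (or option 3 with the config set to
    disallow); allow_duplicates=True models the current behavior.
    """
    duplicates = find_duplicate_column_names(columns)
    if duplicates and not allow_duplicates:
        raise ValueError(f"duplicate column names across families: {duplicates}")
    return duplicates
```

With this sketch, {{check_create_table([(None, "pk1"), ("x", "v1"), ("y", "v1")])}} raises, while passing {{allow_duplicates=True}} returns the duplicates instead, which is roughly the escape hatch tests would need under option 4.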
> DISCUSS: Disallow creating new tables with duplicate column qualifiers by
> default.
> ---------------------------------------------------------------------------------
>
> Key: PHOENIX-6433
> URL: https://issues.apache.org/jira/browse/PHOENIX-6433
> Project: Phoenix
> Issue Type: Wish
> Reporter: Lars Hofhansl
> Priority: Major
>
> Phoenix allows specifying columns to "reside" in specific column families. As
> long as the columns are unique you can simply refer to them via the column
> name. In that case the column families are just about the physical placement
> of the columns. No special SQL constructs are needed... This is similar to
> indexes: they exist for optimization, but queries are unchanged.
> However...
> Currently Phoenix also allows creating tables with duplicate column
> qualifiers such as:
> {{CREATE TABLE t (pk1 ..., x.v1, y.v1, ...)}} or
> {{CREATE TABLE t (pk1 ..., v1, x.v1, ...)}}
> In the first case you must qualify any reference to {{v1}} with the column
> family or an {{AmbiguousColumnException}} is thrown. In the second case
> {{v1}} refers to {{0.v1}}.
> In both cases the physical optimization of the column storage now leaks into
> the SQL queries - unnecessarily, IMHO at least.
> For tables not created in Phoenix, or with dynamic columns, this is not
> avoidable.
> I do think we should disallow creating new tables with duplicated (static)
> column names, to reduce confusion and surprises.
> Related: PHOENIX-6343