Re: On duplicate column names

[email protected] Sat, 27 Mar 2021 09:59:23 -0700

Viraj, yeah similar discussion.

Istvan, good point. Of course you cannot guard against what folks do at the 
HBase level, and we should still allow that. Just like in PHOENIX-6343.
But we could disallow creating new tables like this in Phoenix, also like 
PHOENIX-6343.

I just think that causes more problems than it solves. While HBase looks at 
things key-value by key-value, Phoenix takes a row-by-row view.
From that angle, why would you ever want a table with ambiguous column names? 
And neither qualified nor duplicate column names are in the SQL standard - 
AFAIK at least.

In the Trino case I described, it's using the standard JDBC metadata, then 
using the COLUMN_NAME column to read the name of the column. That is the JDBC 
standard.
In the Phoenix case you now also have to read the COLUMN_FAMILY column, then 
know how to build a qualified column name, and then be able to pass this 
through all the query planning, etc. And there is no possible way to hand a 
qualified column name to Phoenix. Phoenix does *not* allow "cf.cq", it has to 
be cf.cq or "cf"."cq".
Apparently there's a way in Trino to model this as a structured column. But 
that too does not hit the mark, since then you *have* to use structs to access 
any column in Phoenix.
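
To make the extra step concrete, here is a minimal sketch of what a client 
would have to do with the COLUMN_FAMILY and COLUMN_NAME values it reads from 
the metadata. The class and method names (PhoenixNames, qualify) are made up 
for illustration and are not part of Phoenix or Trino. It assumes a null or 
empty family means the column has no family to qualify with, and it simply 
rejects names that contain a double quote rather than guessing at escaping 
rules.

```java
final class PhoenixNames {
    // Wrap a single identifier part in double quotes so its case is preserved.
    // Assumption: names containing a double quote are rejected, not escaped.
    static String quote(String part) {
        if (part.indexOf('"') >= 0) {
            throw new IllegalArgumentException("unsupported identifier: " + part);
        }
        return "\"" + part + "\"";
    }

    // Build "cf"."cq" from the two metadata values. Each part is quoted
    // separately, because "cf.cq" as one quoted identifier is not accepted.
    static String qualify(String family, String column) {
        if (column == null) {
            throw new IllegalArgumentException("column name is required");
        }
        if (family == null || family.isEmpty()) {
            return quote(column); // no family reported, nothing to qualify with
        }
        return quote(family) + "." + quote(column);
    }

    public static void main(String[] args) {
        System.out.println(qualify("x", "v1"));   // "x"."v1"
        System.out.println(qualify(null, "pk1")); // "pk1"
    }
}
```

Every layer that handles a column name (planner, pushdown, result mapping) 
then has to carry the pair instead of a single name.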

So in Trino (and probably other JDBC clients) we could opt to simply fail on 
tables like these - that's what's happening now... Or come up with unnatural 
constructs to fit this into the SQL model. Trino is just an example here.
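
For the "simply fail" option, the check a client needs is small. A rough 
sketch, assuming the (COLUMN_FAMILY, COLUMN_NAME) pairs have already been read 
from the JDBC metadata into a list; the Col holder and the ambiguousNames 
method are hypothetical names, not an existing API:

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

final class DuplicateColumns {
    // One row of column metadata: family may be null if none is reported.
    static final class Col {
        final String family;
        final String name;
        Col(String family, String name) {
            this.family = family;
            this.name = name;
        }
    }

    // Return the column names that occur in more than one column family.
    static Set<String> ambiguousNames(List<Col> cols) {
        Map<String, Set<String>> familiesByName = new TreeMap<>();
        for (Col c : cols) {
            String family = (c.family == null) ? "" : c.family;
            familiesByName.computeIfAbsent(c.name, k -> new HashSet<>()).add(family);
        }
        Set<String> result = new TreeSet<>();
        for (Map.Entry<String, Set<String>> e : familiesByName.entrySet()) {
            if (e.getValue().size() > 1) {
                result.add(e.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Mirrors CREATE TABLE t (pk1 ..., x.v1, y.v1, ...) plus a unique y.v2.
        List<Col> cols = Arrays.asList(
                new Col(null, "pk1"),
                new Col("x", "v1"),
                new Col("y", "v1"),
                new Col("y", "v2"));
        System.out.println(ambiguousNames(cols)); // prints [v1]
    }
}
```

A connector could refuse any table for which this set is non-empty and report 
the offending names in the error.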

At least we can document that qualifier names *should* be unique, and that 
duplicates might otherwise cause problems with downstream BigData integrations.

-- Lars

On Saturday, March 27, 2021, 2:51:04 AM PDT, Viraj Jasani <[email protected]> 
wrote: 

Somewhat similar discussion we had on PHOENIX-6343 and why the duplicate
column check was restricted to default CF only.

On Sat, 27 Mar 2021 at 10:42 AM, Istvan Toth <[email protected]>
wrote:

> The first thing that comes to my mind is that this would limit
> functionality when defining views on existing raw HBase tables (though
> aliasing the columns may solve that)
> Another thing to consider is how this would affect dynamic column use cases
> for either native Phoenix tables or views on HBase tables.
>
> Istvan
>
> On Sat, Mar 27, 2021 at 12:20 AM [email protected] <[email protected]>
> wrote:
>
> > As you may or may not know, Phoenix allows for duplicate column names as
> > long as they are placed in different column families.
> >
> > You can create a table such as CREATE TABLE t (pk1 ..., x.v1, y.v1, ...).
> > Now each time you want to refer to v1 you need to qualify it with its
> > column family or you get an AmbiguousColumnException.
> >
> > Worse, you can also do CREATE TABLE t (pk1 ..., v1, x.v1, ...). Now using
> > v1 in a query will silently resolve to v1 in the default column family and
> > x.v1 does have to be qualified.
> >
> > As I reason through how this should work in Trino's (formerly Presto)
> > Phoenix connector, it occurs to me that we should probably just disallow
> > this.
> >
> > CREATE TABLE t (pk1 ..., x.v1, y.v2, ...) works just fine and you can
> > refer to both v1 and v2 without the need to qualify them with the column
> > family.
> >
> > So... In essence: Should we require all columns to be uniquely named,
> > even with multiple column families?
> >
> > Thanks.
> >
> > -- Lars
> >
>
>
> --
> *István Tóth* | Staff Software Engineer
>
