Viraj, yeah similar discussion.

Istvan, good point. Of course you cannot guard against what folks do at the 
HBase level, and we should still allow that. Just like in PHOENIX-6343.
But we could disallow creating new tables like this in Phoenix, also like 
PHOENIX-6343.


I just think that causes more problems than it solves. While HBase looks at 
things key-value by key-value, Phoenix takes a row-by-row view.
From that angle, why would you ever want a table with ambiguous column names? 
And neither qualified nor duplicate column names are in the SQL standard - 
AFAIK at least.

In the Trino case I described, it's using the standard JDBC metadata, then 
using the COLUMN_NAME column to read the name of the column. That is the JDBC 
standard.
In the Phoenix case you now also have to read the COLUMN_FAMILY column, then 
know how to build a qualified column name, and then be able to pass this 
through all the query planning, etc. And there is no possible way to hand a 
qualified column name to Phoenix. Phoenix does *not* allow "cf.cq", it has to 
be cf.cq or "cf"."cq".
Apparently there's a way in Trino to model this as a structured column. But 
that too misses the mark, because then you *have* to use structs to access any 
column in Phoenix.
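
To make the extra work concrete, here is a rough sketch of what a JDBC client 
would have to do. The class and method names are made up for illustration, and 
I'm assuming COLUMN_FAMILY comes back null (or empty) for columns without a 
family, such as primary key columns. It is not how the Trino connector 
actually does it.

```java
import java.sql.DatabaseMetaData;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Phoenix or Trino.
class QualifiedColumns {

    // Builds "cf"."cq", the quoted form Phoenix accepts (never "cf.cq").
    // Assumption: a null or empty family means no qualifier is needed.
    // Names containing double quotes are not handled in this sketch.
    static String qualify(String family, String name) {
        String quotedName = "\"" + name + "\"";
        if (family == null || family.isEmpty()) {
            return quotedName;
        }
        return "\"" + family + "\"." + quotedName;
    }

    // Reads the standard COLUMN_NAME plus the Phoenix-specific COLUMN_FAMILY
    // for every column of the given table.
    static List<String> qualifiedColumns(DatabaseMetaData md, String schema, String table)
            throws SQLException {
        List<String> result = new ArrayList<>();
        try (ResultSet rs = md.getColumns(null, schema, table, null)) {
            while (rs.next()) {
                String name = rs.getString("COLUMN_NAME");
                String family = rs.getString("COLUMN_FAMILY");
                result.add(qualify(family, name));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(qualify("X", "V1"));   // prints "X"."V1"
        System.out.println(qualify(null, "PK1")); // prints "PK1"
    }
}
```

A client that only reads COLUMN_NAME, as the JDBC standard suggests, never 
gets to the qualify step, which is why duplicate names break it.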

So in Trino (and probably other JDBC clients) we could opt to simply fail on 
tables like these - that's what's happening now... Or come up with unnatural 
constructs to fit this into the SQL model. Trino is just an example here.

At least we can document that qualifier names *should* be unique, and it 
otherwise might cause problems with downstream BigData integrations.
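
For anyone who wants to check existing tables, a minimal sketch of such a 
uniqueness check could look like the following. The class name and the 
{family, name} pair representation are assumptions for illustration only; the 
pairs could come from the JDBC metadata (COLUMN_FAMILY and COLUMN_NAME).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Phoenix.
class DuplicateQualifierCheck {

    // Each column is a two-element array: {columnFamily, columnName}.
    // Returns every column name that appears in more than one family,
    // mapped to the families it appears in (in input order).
    static Map<String, List<String>> ambiguousNames(List<String[]> columns) {
        Map<String, List<String>> familiesByName = new LinkedHashMap<>();
        for (String[] col : columns) {
            familiesByName.computeIfAbsent(col[1], k -> new ArrayList<>()).add(col[0]);
        }
        Map<String, List<String>> ambiguous = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> e : familiesByName.entrySet()) {
            if (e.getValue().size() > 1) {
                ambiguous.put(e.getKey(), e.getValue());
            }
        }
        return ambiguous;
    }

    public static void main(String[] args) {
        List<String[]> columns = List.of(
                new String[] {"X", "V1"},
                new String[] {"Y", "V1"},
                new String[] {"X", "V2"});
        System.out.println(ambiguousNames(columns)); // prints {V1=[X, Y]}
    }
}
```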

-- Lars

On Saturday, March 27, 2021, 2:51:04 AM PDT, Viraj Jasani <[email protected]> 
wrote: 





Somewhat similar discussion we had on PHOENIX-6343 and why the duplicate
column check was restricted to default CF only.


On Sat, 27 Mar 2021 at 10:42 AM, Istvan Toth <[email protected]>
wrote:

> The first thing that comes to my mind is that this would limit
> functionality when defining views on existing raw HBase tables (though
> aliasing the columns may solve that)
> Another thing to consider is how this would affect dynamic column use cases
> for either native Phoenix tables or views on HBase tables.
>
> Istvan
>
> On Sat, Mar 27, 2021 at 12:20 AM [email protected] <[email protected]>
> wrote:
>
> > As you may or may not know, Phoenix allows for duplicate column names as
> > long as they are placed in different column families.
> >
> > You can create a table such as CREATE TABLE t (pk1 ..., x.v1, y.v1, ...).
> > Now each time you want to refer to v1 you need to qualify it with its
> > column family or you get an AmbiguousColumnException.
> >
> > Worse, you can also do CREATE TABLE t (pk1 ..., v1, x.v1, ...). Now using
> > v1 in a query will silently resolve to v1 in the default column family and
> > x.v1 does have to be qualified.
> >
> > As I reason through how this should work in Trino's (formerly Presto)
> > Phoenix connector, it occurs to me that we should probably just disallow
> > this.
> >
> > CREATE TABLE t (pk1 ..., x.v1, y.v2, ...) works just fine and you can
> > refer to both v1 and v2 without the need to qualify them with the column
> > family.
> >
> > So... In essence: Should we require all columns to be uniquely named,
> > even with multiple column families?
> >
> > Thanks.
> >
> > -- Lars
> >
>
>
> --
> *István Tóth* | Staff Software Engineer
> [email protected] <https://www.cloudera.com>
>
