Viraj, yeah, similar discussion. Istvan, good point. Of course you cannot guard against what folks do at the HBase level, and we should still allow that, just like in PHOENIX-6343. But we could disallow creating new tables like this in Phoenix, also like PHOENIX-6343.
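The following is a minimal Java sketch of the stricter DDL-time check being proposed here. It rejects a new table if any column name appears in more than one column family, including the default one. It is not Phoenix code and not the PHOENIX-6343 check, which (per Viraj) only covers the default CF. The class, the `Column` record and `findDuplicate` are hypothetical, and names are assumed to be already normalized (Phoenix upper-cases unquoted identifiers).

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch of the proposed CREATE TABLE check; not Phoenix code.
public class DuplicateColumnCheck {

    // A column as written in the DDL; family == null means the default column family.
    record Column(String family, String name) {}

    // Returns a description of the first column name that is defined more than
    // once (across any families), or Optional.empty() if all names are unique.
    static Optional<String> findDuplicate(List<Column> columns) {
        // Maps each bare column name to the family it was first seen in.
        Map<String, String> firstFamily = new HashMap<>();
        for (Column c : columns) {
            String family = c.family() == null ? "<default>" : c.family();
            String previous = firstFamily.putIfAbsent(c.name(), family);
            if (previous != null) {
                return Optional.of("Column " + c.name() + " appears in both "
                        + previous + " and " + family);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // CREATE TABLE t (pk1 ..., x.v1, y.v1, ...): would be rejected.
        List<Column> ambiguous = List.of(
                new Column(null, "PK1"), new Column("X", "V1"), new Column("Y", "V1"));
        // CREATE TABLE t (pk1 ..., x.v1, y.v2, ...): would be accepted.
        List<Column> unique = List.of(
                new Column(null, "PK1"), new Column("X", "V1"), new Column("Y", "V2"));
        System.out.println(findDuplicate(ambiguous).orElse("ok"));
        System.out.println(findDuplicate(unique).orElse("ok"));
    }
}
```

Running `main` prints the duplicate message for the first table and "ok" for the second. A check like this would only apply to tables created through Phoenix. What users do at the HBase level still cannot be guarded against.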
I just think that allowing duplicate column names across column families causes more problems than it solves. While HBase looks at things key-value by key-value, Phoenix takes a row-by-row view. From that angle, why would you ever want a table with ambiguous column names? And neither qualified nor duplicate column names are in the SQL standard, AFAIK at least.

In the Trino case I described, the connector uses the standard JDBC metadata and reads the COLUMN_NAME column to get the name of each column. That is the JDBC standard. In the Phoenix case you now also have to read the COLUMN_FAMILY column, then know how to build a qualified column name, and then be able to pass it through all the query planning, etc. And there is no way to hand a qualified column name to Phoenix as a single identifier: Phoenix does *not* accept "cf.cq"; it has to be cf.cq or "cf"."cq".

Apparently there's a way in Trino to model this as a structured column. But that does not hit the mark either, since then you *have* to use structs to access any column in Phoenix. So in Trino (and probably other JDBC clients) we could opt to simply fail on tables like these (that's what happens now), or come up with unnatural constructs to fit this into the SQL model. Trino is just an example here.

At the very least we can document that qualifier names *should* be unique, and that duplicates might otherwise cause problems with downstream BigData integrations.

-- Lars

On Saturday, March 27, 2021, 2:51:04 AM PDT, Viraj Jasani <[email protected]> wrote:

We had a somewhat similar discussion on PHOENIX-6343 about why the duplicate column check was restricted to the default CF only.

On Sat, 27 Mar 2021 at 10:42 AM, Istvan Toth <[email protected]> wrote:

> The first thing that comes to my mind is that this would limit
> functionality when defining views on existing raw HBase tables (though
> aliasing the columns may solve that).
> Another thing to consider is how this would affect dynamic column use cases
> for either native Phoenix tables or views on HBase tables.
>
> Istvan
>
> On Sat, Mar 27, 2021 at 12:20 AM [email protected] <[email protected]>
> wrote:
>
> > As you may or may not know, Phoenix allows duplicate column names as
> > long as they are placed in different column families.
> >
> > You can create a table such as CREATE TABLE t (pk1 ..., x.v1, y.v1, ...).
> > Now each time you want to refer to v1 you need to qualify it with its
> > column family or you get an AmbiguousColumnException.
> >
> > Worse, you can also do CREATE TABLE t (pk1 ..., v1, x.v1, ...). Now using
> > v1 in a query will silently resolve to v1 in the default column family, and
> > x.v1 has to be qualified.
> >
> > As I reason through how this should work in Trino's (formerly Presto)
> > Phoenix connector, it occurs to me that we should probably just disallow
> > this.
> >
> > CREATE TABLE t (pk1 ..., x.v1, y.v2, ...) works just fine and you can
> > refer to both v1 and v2 without the need to qualify them with the column
> > family.
> >
> > So... in essence: should we require all columns to be uniquely named, even
> > with multiple column families?
> >
> > Thanks.
> >
> > -- Lars
>
> --
> *István Tóth* | Staff Software Engineer
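To make the resolution rules from the original message concrete, here is a small Java model of them. It assumes exactly the behavior described in the thread: a match in the default family wins silently, a single match in another family resolves without qualification, and more than one non-default match is an error. It is a hypothetical sketch, not Phoenix's resolver. `AmbiguousNameException` is a local stand-in for Phoenix's AmbiguousColumnException, and the rule that the default family wins even against several other families is an extrapolation.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of unqualified column resolution as described in the thread;
// not Phoenix's actual resolver.
public class UnqualifiedResolution {

    // family == null means the default column family.
    record Column(String family, String name) {}

    // Local stand-in for the ambiguity error; not Phoenix's exception class.
    static class AmbiguousNameException extends RuntimeException {
        AmbiguousNameException(String name) {
            super("Column reference " + name + " is ambiguous");
        }
    }

    static Column resolve(List<Column> columns, String name) {
        List<Column> otherFamilies = new ArrayList<>();
        for (Column c : columns) {
            if (!c.name().equals(name)) {
                continue;
            }
            if (c.family() == null) {
                // Default-family column wins silently (assumed to hold even if
                // several other families also define the name).
                return c;
            }
            otherFamilies.add(c);
        }
        if (otherFamilies.isEmpty()) {
            throw new IllegalArgumentException("Undefined column " + name);
        }
        if (otherFamilies.size() > 1) {
            throw new AmbiguousNameException(name);
        }
        return otherFamilies.get(0);
    }

    public static void main(String[] args) {
        // CREATE TABLE t (pk1 ..., x.v1, y.v1, ...): an unqualified v1 is ambiguous.
        List<Column> t1 = List.of(new Column("X", "V1"), new Column("Y", "V1"));
        try {
            resolve(t1, "V1");
        } catch (AmbiguousNameException e) {
            System.out.println(e.getMessage());
        }
        // CREATE TABLE t (pk1 ..., v1, x.v1, ...): v1 silently means the default-family column.
        List<Column> t2 = List.of(new Column(null, "V1"), new Column("X", "V1"));
        System.out.println(resolve(t2, "V1"));
        // CREATE TABLE t (pk1 ..., x.v1, y.v2, ...): both names resolve without qualification.
        List<Column> t3 = List.of(new Column("X", "V1"), new Column("Y", "V2"));
        System.out.println(resolve(t3, "V2"));
    }
}
```

The second case is the one Lars calls worse: nothing fails, and the query quietly reads a different column than a reader of x.v1 might expect.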

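Lars's point about qualified names can be illustrated with a hypothetical Java helper that turns the COLUMN_FAMILY and COLUMN_NAME metadata values into a reference Phoenix can parse. It quotes each part separately ("cf"."cq") rather than the whole string ("cf.cq"). The class and method names are made up for illustration. Doubling embedded double quotes is the standard SQL convention, and that Phoenix follows it is an assumption here.

```java
// Hypothetical helper: builds a family-qualified column reference from JDBC
// metadata values. Not part of Phoenix or any JDBC driver.
public class QualifiedColumnName {

    // Double-quotes one identifier part, doubling embedded quotes
    // (standard SQL convention; assumed, not verified, for Phoenix).
    static String quote(String part) {
        return "\"" + part.replace("\"", "\"\"") + "\"";
    }

    // Builds "cf"."cq"; with no family (null or empty) only the column is quoted.
    static String reference(String family, String column) {
        if (family == null || family.isEmpty()) {
            return quote(column);
        }
        return quote(family) + "." + quote(column);
    }

    public static void main(String[] args) {
        // Each part quoted separately: a family-qualified reference.
        System.out.println(reference("X", "V1"));
        // Quoting the whole string instead yields one identifier that contains a dot,
        // i.e. the "cf.cq" form the thread says Phoenix does not treat as qualified.
        System.out.println(quote("X.V1"));
    }
}
```

In a real client the two arguments would come from the ResultSet returned by DatabaseMetaData.getColumns. COLUMN_NAME is standard JDBC, while COLUMN_FAMILY is the Phoenix-specific extra that a generic JDBC client does not know to read. Quoting also preserves the exact case the metadata returns. Every generic client would still have to carry such a reference through its own query planning, which is the burden Lars describes.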