On Mon, Aug 27, 2018 at 2:03 PM, Thomas D'Silva <tdsi...@salesforce.com>
wrote:

> >
> >
> > 2. Can Phoenix be the de-facto schema for SQL on HBase?
> >
> > We've long asserted "if you have to ask how Phoenix serializes data, you
> > shouldn't be doing it" (a nod that you have to write lots of code). What if
> we
> > turn that on its head? Could we extract our PDataType serialization,
> > composite row-key, column encoding, etc into a minimal API that folks
> with
> > their own itches can use?
> >
> > With the growing integrations into Phoenix, we could embrace them by
> > providing an API to make what they're doing easier. In the same vein, we
> > cement ourselves as a cornerstone of doing it "correctly".
> >
>
> +1 on standardizing the data type and storage format API so that it would
> be easier for other projects to use.
>

Adding my $0.02, since I've thought a good bit about this over the years.

The `DataType` [0] interface in HBase was built with precisely this idea in
mind -- sharing data encoding formats across HBase projects. Phoenix's
`PDataType` implements this interface. Exposing the encoders to 3rd
parties, then, is a matter of those 3rd parties using this interface and
consuming the phoenix-core jar. Maybe we want to break them out into their
own jar to minimize dependencies? That said, Phoenix's smarts about
compound rowkeys and packed column values go beyond simple column
encodings. Those may not be as easily exposed to external tools...
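To make the compound-rowkey point concrete, here's a minimal sketch of the
kind of order-preserving encoder such an API would expose. To be clear,
this is NOT Phoenix's actual wire format and the class/method names are
made up -- it just illustrates the core trick: encode each field so that
unsigned byte-wise comparison of the concatenated key matches the natural
ordering of the typed values.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch, not Phoenix's real serialization. Encodes a
// (varchar, int) compound key so HBase's unsigned lexicographic rowkey
// comparison agrees with the natural (String, int) ordering.
public class CompoundKeySketch {

    static byte[] encode(String s, int i) {
        byte[] str = s.getBytes(StandardCharsets.UTF_8);
        ByteBuffer buf = ByteBuffer.allocate(str.length + 1 + 4);
        buf.put(str);
        buf.put((byte) 0x00);        // separator terminating the variable-length field
        buf.putInt(i ^ 0x80000000);  // flip sign bit: signed int order -> unsigned byte order
        return buf.array();
    }

    // Unsigned lexicographic comparison, the way HBase compares rowkeys.
    static int compareUnsigned(byte[] x, byte[] y) {
        int n = Math.min(x.length, y.length);
        for (int k = 0; k < n; k++) {
            int cmp = (x[k] & 0xFF) - (y[k] & 0xFF);
            if (cmp != 0) return cmp;
        }
        return x.length - y.length;
    }

    public static void main(String[] args) {
        byte[] a = encode("apple", -5);
        byte[] b = encode("apple", 10);
        byte[] c = encode("banana", 0);
        System.out.println(compareUnsigned(a, b) < 0); // negative int sorts first
        System.out.println(compareUnsigned(b, c) < 0); // "apple" sorts before "banana"
    }
}
```

Getting the details right here (null separators, descending sort order,
fixed- vs. variable-length types) is exactly the part that's easy to get
subtly wrong, which is why a shared, well-tested module would help.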

I think, realistically, Phoenix would need to expose a number of
schema-related tools together in a package in order to provide "true
interoperability" with other tools. Pick a use case -- I'm fond of
"offline" use cases, something like building a Phoenix-compatible table
from a MapReduce (or Spark, or Hive, or...) application on a cluster that
doesn't even have HBase available. Then plumb it out the other way, reading
an exported snapshot of a Phoenix table from the same "offline"
environment. It's a pretty extreme case that I think is worthwhile because
it enables a lot of flexibility for users, and would shake out a bunch of
these related issues. I suspect this requires going below the JDBC
interface, but I could be wrong...

-n

[0]:
https://hbase.apache.org/1.2/apidocs/org/apache/hadoop/hbase/types/DataType.html
