On Mon, Aug 27, 2018 at 2:03 PM, Thomas D'Silva <tdsi...@salesforce.com> wrote:
> > 2. Can Phoenix be the de-facto schema for SQL on HBase?
> >
> > We've long asserted "if you have to ask how Phoenix serializes data, you
> > shouldn't be doing it" (a nod that you have to write lots of code). What if we
> > turn that on its head? Could we extract our PDataType serialization,
> > composite row-key, column encoding, etc. into a minimal API that folks with
> > their own itches can use?
> >
> > With the growing integrations into Phoenix, we could embrace them by
> > providing an API to make what they're doing easier. In the same vein, we
> > cement ourselves as a cornerstone of doing it "correctly".
>
> +1 on standardizing the data type and storage format API so that it would
> be easier for other projects to use.

Adding my $0.02, since I've thought a good bit about this over the years.

The `DataType` [0] interface in HBase was built with precisely this idea in mind -- sharing data encoding formats across HBase projects. Phoenix's `PDataType` implements this interface. Exposing the encoders to 3rd parties, then, is a matter of those 3rd parties using this interface and consuming the phoenix-core jar. Maybe we want to break them out into their own jar to minimize dependencies?

That said, Phoenix's smarts about compound rowkeys and packed column values go beyond simple column encodings. These may not be as easily exposed to external tools... I think, realistically, Phoenix would need to expose a number of schema-related tools together in a package in order to provide "true interoperability" with other tools.

Pick a use case -- I'm fond of "offline" use cases, something like building a Phoenix-compatible table from a MapReduce (or Spark, or Hive, or...) application on a cluster that doesn't even have HBase available. Then plumb it out the other way, reading an exported snapshot of a Phoenix table from the same "offline" environment.
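To make the "minimal API" idea concrete, here's a rough sketch of what a broken-out encoder interface might look like. The names (`ValueCodec`, `OrderedInt32Codec`) are hypothetical, not anything that exists in Phoenix or HBase today -- the real model would be HBase's `org.apache.hadoop.hbase.types.DataType`, which `PDataType` already implements. The concrete codec below uses a well-known trick (big-endian with the sign bit flipped) so that unsigned byte comparison of the encodings matches signed integer order:

```java
import java.nio.ByteBuffer;

// Hypothetical minimal encoder API, modeled loosely on HBase's DataType.
interface ValueCodec<T> {
    int encodedLength(T val);           // bytes needed to encode val
    void encode(ByteBuffer dst, T val); // write val at dst's position
    T decode(ByteBuffer src);           // read a value at src's position
    boolean isOrderPreserving();        // does byte order match value order?
}

// A sort-order-preserving 32-bit int: big-endian with the sign bit
// flipped, so negative values sort (byte-wise, unsigned) before positive.
final class OrderedInt32Codec implements ValueCodec<Integer> {
    public int encodedLength(Integer val) { return 4; }
    public void encode(ByteBuffer dst, Integer val) {
        dst.putInt(val ^ Integer.MIN_VALUE); // flip the sign bit
    }
    public Integer decode(ByteBuffer src) {
        return src.getInt() ^ Integer.MIN_VALUE; // flip it back
    }
    public boolean isOrderPreserving() { return true; }
}

public class CodecDemo {
    public static void main(String[] args) {
        ValueCodec<Integer> codec = new OrderedInt32Codec();
        ByteBuffer a = ByteBuffer.allocate(4);
        ByteBuffer b = ByteBuffer.allocate(4);
        codec.encode(a, -5);
        codec.encode(b, 7);
        a.flip();
        b.flip();
        // Unsigned byte comparison of the encodings preserves value order.
        System.out.println(compareUnsigned(a.array(), b.array()) < 0); // true: -5 < 7
        System.out.println(codec.decode(a));                           // -5 round-trips
    }

    // Lexicographic unsigned byte comparison, i.e. HBase's rowkey order.
    static int compareUnsigned(byte[] x, byte[] y) {
        for (int i = 0; i < Math.min(x.length, y.length); i++) {
            int c = Integer.compare(x[i] & 0xff, y[i] & 0xff);
            if (c != 0) return c;
        }
        return Integer.compare(x.length, y.length);
    }
}
```

A 3rd party coding against an interface this small wouldn't need phoenix-core at all for the simple cases, which is the argument for a separate jar.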
It's a pretty extreme case, but I think it's worthwhile because it enables a lot of flexibility for users, and it would shake out a bunch of these related issues. I suspect this requires going below the JDBC interface, but I could be wrong...

-n

[0]: https://hbase.apache.org/1.2/apidocs/org/apache/hadoop/hbase/types/DataType.html
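For the "offline" use case, the rough shape of what a tool would need is below: building compound rowkey bytes with no HBase or Phoenix classes on the classpath. This sketch mimics two common tricks (a 0x00 separator after variable-length text fields, and sign-bit-flipped big-endian integers) but is NOT Phoenix's exact byte format -- the real rules around nullability, sort order, and column encoding are precisely the "smarts" that would need to be packaged up:

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Illustrative "offline" compound-rowkey construction, stdlib only.
public class OfflineRowKey {
    // VARCHAR-like field: UTF-8 bytes followed by a 0x00 separator, so a
    // shorter string sorts before any longer string it prefixes.
    static void putVarchar(ByteArrayOutputStream out, String s) {
        byte[] b = s.getBytes(StandardCharsets.UTF_8);
        out.write(b, 0, b.length);
        out.write(0x00);
    }

    // INTEGER-like field: big-endian with the sign bit flipped, so the
    // unsigned byte order of the encoding matches signed int order.
    static void putInt(ByteArrayOutputStream out, int v) {
        int flipped = v ^ Integer.MIN_VALUE;
        out.write((flipped >>> 24) & 0xff);
        out.write((flipped >>> 16) & 0xff);
        out.write((flipped >>> 8) & 0xff);
        out.write(flipped & 0xff);
    }

    // A hypothetical two-column key: (tenant VARCHAR, seq INTEGER).
    static byte[] rowKey(String tenant, int seq) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        putVarchar(out, tenant);
        putInt(out, seq);
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] k1 = rowKey("acme", -1);
        byte[] k2 = rowKey("acme", 2);
        byte[] k3 = rowKey("acmeco", 0);
        // Byte-wise (unsigned) order matches (tenant, seq) order:
        System.out.println(cmp(k1, k2) < 0); // true: acme/-1 before acme/2
        System.out.println(cmp(k2, k3) < 0); // true: "acme" before "acmeco"
    }

    // Lexicographic unsigned byte comparison, i.e. HBase's rowkey order.
    static int cmp(byte[] x, byte[] y) {
        for (int i = 0; i < Math.min(x.length, y.length); i++) {
            int c = Integer.compare(x[i] & 0xff, y[i] & 0xff);
            if (c != 0) return c;
        }
        return Integer.compare(x.length, y.length);
    }
}
```

A MapReduce/Spark/Hive job emitting HFiles could use a packaged-up version of exactly this kind of logic, which is why the whole bundle of schema tools would have to ship together rather than just the per-column encoders.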