[
https://issues.apache.org/jira/browse/PHOENIX-5443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16924503#comment-16924503
]
Geoffrey Jacoby commented on PHOENIX-5443:
------------------------------------------
[~apurtell] - I spent some time looking through Confluent's Schema Registry
yesterday. Some findings:
Pros:
1. Some very nice-looking SerDes for reading / writing Avro + schema markers to
Kafka topics, which are Apache-licensed
2. The SchemaRegistryClient interface itself doesn't make Confluent-specific
assumptions about how the server-side will be implemented -- it can be a local
class, a web service, or Confluent's own implementation
3. They've broken the modules out sensibly (and published them separately to
Maven Central) so that downstream users can grab the Apache-licensed components
without pulling in the Confluent-licensed server-side module.
Cons:
1. Some of those nice SerDe classes explicitly assume that their inner
SchemaRegistryClient will be either a CachedSchemaRegistryClient (which assumes
that cache misses go to a web service -- it doesn't have to be Confluent's, but
it must be some REST service at some URL) or a MockSchemaRegistryClient for
testing. So far I don't see a good way to inject a different client that isn't
behind a REST API.
2. There are two ways to look up schemas: by a globally unique id, or by subject
+ version. (Subject loosely translates to (topic name) + either "-key" or
"-value".) Producers generally look up by subject + version; consumers use the
unique id found in the message. There's just one problem: that globally (within
your infra) unique id _is an int_. Confluent's implementation, according to the
docs, uses monotonically increasing ints handed out by their server (and several
GitHub issues note cases where their HA strategy interferes with this).
We'd probably want to use a hash of ([tenant], table, timestamp), and 32 bits
is far too narrow for a hash to be reliably unique.
3. There are the general issues of taking on another project's complex
dependencies -- using the published Maven artifacts would help, but just getting
their code to build on my laptop was pretty nontrivial.
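To put numbers on con #2, here's a quick birthday-bound estimate (my own sketch, not anything from Confluent's docs) of the probability that at least two of n randomly hashed schema ids collide, for a given id width:

```java
// Birthday-bound sketch: probability that at least two of n uniformly
// random b-bit hash ids collide, p ~= 1 - exp(-n(n-1) / (2 * 2^b)).
public class HashIdCollision {

    static double collisionProbability(long n, int bits) {
        double space = Math.pow(2, bits); // size of the id space
        return 1.0 - Math.exp(-((double) n * (n - 1)) / (2.0 * space));
    }

    public static void main(String[] args) {
        // 32-bit ids: collisions become likely surprisingly fast.
        System.out.printf("n=10,000,  32-bit: p = %.4f%n",
                collisionProbability(10_000, 32));
        System.out.printf("n=100,000, 32-bit: p = %.4f%n",
                collisionProbability(100_000, 32));
        // 64-bit ids: effectively collision-free at the same scale.
        System.out.printf("n=100,000, 64-bit: p = %.2e%n",
                collisionProbability(100_000, 64));
    }
}
```

With only ~10,000 registered schema versions you already have a roughly 1% chance of a 32-bit collision, and at 100,000 a collision is more likely than not -- whereas a 64-bit id keeps the probability negligible.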
At the moment I'm leaning against using it, since the cons are significant, but
I'm curious to hear [~apurtell]'s and others' thoughts.
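For reference, the kind of output this issue is after -- an Avro record schema generated from a Phoenix object's metadata -- might look roughly like the following for a hypothetical table with a BIGINT primary key and a nullable VARCHAR column (the table name, namespace, and field names here are made up for illustration):

```json
{
  "type": "record",
  "name": "MY_TABLE",
  "namespace": "org.apache.phoenix.generated",
  "fields": [
    {"name": "ID", "type": "long"},
    {"name": "NAME", "type": ["null", "string"], "default": null}
  ]
}
```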
> API to Generate Avro Schema of Phoenix Object
> ---------------------------------------------
>
> Key: PHOENIX-5443
> URL: https://issues.apache.org/jira/browse/PHOENIX-5443
> Project: Phoenix
> Issue Type: Sub-task
> Reporter: Geoffrey Jacoby
> Assignee: Geoffrey Jacoby
> Priority: Major
>
> Based on an object name (such as a table or view) and an optional tenant_id
> and timestamp, we should be able to construct all the explicitly defined
> columns and data types of an object. (Obviously, we can't do this for dynamic
> columns.)
> From these fields, we should be able to construct a schema for the object and
> return it to the user. While this JIRA will focus on Avro, the output format
> should be pluggable so that other implementations could output to Thrift or
> Protobuf, and PHOENIX-4286 could use it to output as SQL CREATE statements.
--
This message was sent by Atlassian Jira
(v8.3.2#803003)