[
https://issues.apache.org/jira/browse/PHOENIX-5443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16924503#comment-16924503
]
Geoffrey Jacoby commented on PHOENIX-5443:
------------------------------------------
[~apurtell] - I spent some time looking through Confluent's Schema Registry
yesterday. Some findings:
Pros:
1. Some very nice-looking SerDes for reading / writing Avro + schema markers to
Kafka topics, which are Apache-licensed
2. The SchemaRegistryClient interface itself doesn't make Confluent-specific
assumptions about how the server-side will be implemented -- it can be a local
class, a web service, or Confluent's own implementation
3. They've broken the modules out sensibly (and published them separately to
Maven Central) so that downstream users can grab the Apache-licensed components
without pulling in the Confluent-licensed server-side module.
Cons:
1. Some of those nice SerDe classes explicitly assume that their inner
SchemaRegistryClient will be either a CachedSchemaRegistryClient (which assumes
that cache misses go to a web service -- it doesn't have to be Confluent's, but
it must be some REST service at some URL) or a MockSchemaRegistryClient for
testing. So far I don't see a good way to inject a different client that isn't
behind a REST API.
2. There are two ways to look up schemas: by a globally unique id, or by subject
+ version. (Subject loosely translates to (topic name) + either "-key" or
"-value".) Producers generally look up by subject + version; consumers use the
unique id found in the message. There's just one problem: that globally (within
your infra) unique id _is an int_. Confluent's implementation, according to the
docs, uses monotonically increasing ints handed out by their server (and several
GitHub issues note cases where their HA strategy interferes with this).
We'd probably want to use a hash of ([tenant], table, timestamp), and 32 bits
is far too narrow for a hash to be reliably unique.
3. There are the general issues of taking on another project's complex
dependencies -- using the published Maven artifacts would help, but just getting
their code to build on my laptop was pretty nontrivial.
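To put numbers on con #2, here's a quick birthday-bound estimate (my own sketch, not anything from Confluent's docs) of the probability that at least two of n randomly hashed schema ids collide, for a given id width:

```java
// Birthday-bound sketch: probability that at least two of n uniformly
// random b-bit hash ids collide, p ~= 1 - exp(-n(n-1) / (2 * 2^b)).
public class HashIdCollision {

    static double collisionProbability(long n, int bits) {
        double space = Math.pow(2, bits); // size of the id space
        return 1.0 - Math.exp(-((double) n * (n - 1)) / (2.0 * space));
    }

    public static void main(String[] args) {
        // 32-bit ids: collisions become likely surprisingly fast.
        System.out.printf("n=10,000,  32-bit: p = %.4f%n",
                collisionProbability(10_000, 32));
        System.out.printf("n=100,000, 32-bit: p = %.4f%n",
                collisionProbability(100_000, 32));
        // 64-bit ids: effectively collision-free at the same scale.
        System.out.printf("n=100,000, 64-bit: p = %.2e%n",
                collisionProbability(100_000, 64));
    }
}
```

With only ~10,000 registered schema versions you already have a roughly 1% chance of a 32-bit collision, and at 100,000 a collision is more likely than not -- whereas a 64-bit id keeps the probability negligible.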
At the moment I'm leaning against using it, since the cons are significant, but
I'm curious to hear [~apurtell]'s and others' thoughts.
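For reference, the kind of output this issue is after -- an Avro record schema generated from a Phoenix object's metadata -- might look roughly like the following for a hypothetical table with a BIGINT primary key and a nullable VARCHAR column (the table name, namespace, and field names here are made up for illustration):

```json
{
  "type": "record",
  "name": "MY_TABLE",
  "namespace": "org.apache.phoenix.generated",
  "fields": [
    {"name": "ID", "type": "long"},
    {"name": "NAME", "type": ["null", "string"], "default": null}
  ]
}
```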
> API to Generate Avro Schema of Phoenix Object
> ---------------------------------------------
>
> Key: PHOENIX-5443
> URL: https://issues.apache.org/jira/browse/PHOENIX-5443
> Project: Phoenix
> Issue Type: Sub-task
> Reporter: Geoffrey Jacoby
> Assignee: Geoffrey Jacoby
> Priority: Major
>
> Based on an object name (such as a table or view) and an optional tenant_id
> and timestamp, we should be able to construct all the explicitly defined
> columns and data types of an object. (Obviously, we can't do this for dynamic
> columns.)
> From these fields, we should be able to construct a schema for the object and
> return it to the user. While this JIRA will focus on Avro, the output format
> should be pluggable so that other implementations could output to Thrift or
> Protobuf, and PHOENIX-4286 could use it to output as SQL CREATE statements.
--
This message was sent by Atlassian Jira
(v8.3.2#803003)