Re: Schema ToString

kyle minmaxcorp.com Mon, 07 Mar 2022 10:46:56 -0800

What exactly is the value of having the same output in mixed environments ?
Or: what kind of problems do you face now ?


This is actually a result of dealing with Confluent. Confluent computes an MD5 
hash of the schema that you have and compares it against what is in the 
registry. Since Java and C# generate that schema differently, the MD5 hashes 
do not match and the check fails. While this isn't a problem with Avro 
specifically, I want to add a capability to support generating the same schema.
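
To illustrate the failure mode, here is a minimal Java sketch that hashes the 
"version" field as each library writes it in the outputs quoted below. The 
class and method names (SchemaDigestDemo, md5Hex) are made up for this 
example, and this is not Confluent's actual comparison code; it only shows 
that a change in key order alone is enough to change an MD5 digest.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class SchemaDigestDemo {
    // Field definition as the Java library writes it (type before doc).
    public static final String JAVA_FIELD =
        "{\"name\":\"version\",\"type\":\"float\",\"doc\":\"version number of this schema\"}";

    // The same field as the C# library writes it (doc before type).
    public static final String CSHARP_FIELD =
        "{\"name\":\"version\",\"doc\":\"version number of this schema\",\"type\":\"float\"}";

    // Returns the MD5 digest of the UTF-8 bytes of text as lowercase hex.
    public static String md5Hex(String text) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            byte[] digest = md5.digest(text.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // MD5 is required on every Java platform, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("Java field MD5: " + md5Hex(JAVA_FIELD));
        System.out.println("C#   field MD5: " + md5Hex(CSHARP_FIELD));
        System.out.println("Digests match:  "
            + md5Hex(JAVA_FIELD).equals(md5Hex(CSHARP_FIELD)));
    }
}
```

The two strings describe the same field, but because the digest is taken over 
the text rather than the parsed schema, any byte-level difference produces a 
different hash.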

So the plan currently is to do the following:
1. Maintain backwards compatibility. We don't want to change what it does 
today, because simply fixing schema generation could have unintended 
consequences (both with ToString and with what AvroGen does).
2. Extend the functionality to define a format that determines how the schema 
is generated.

For number 2, I'm debating the implementation and doing a bit of prototyping 
before writing the actual story for implementation. Once I'm done, I want to 
run the design by you and Ryan (and anyone else who wants to contribute).
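
One possible shape for the two-point plan above, sketched in Java for 
illustration only. Everything here is hypothetical: the Format enum, its 
values, and the fieldToJson helper are invented for this example and do not 
exist in Avro. The sketch covers only the key-order difference (not the 
namespace difference) and does no JSON escaping.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

public class SchemaFormatSketch {
    // Hypothetical option selecting how a schema is written out.
    public enum Format { LEGACY, CROSS_LANGUAGE }

    // Renders one record field as JSON text; only the key order depends on format.
    // Values are not JSON-escaped, which is acceptable for a sketch only.
    public static String fieldToJson(String name, String type, String doc, Format format) {
        Map<String, String> attrs = new LinkedHashMap<>();
        attrs.put("name", name);
        if (format == Format.LEGACY) {
            // Order the C# library writes today: doc before type.
            attrs.put("doc", doc);
            attrs.put("type", type);
        } else {
            // Order the Java library writes: type before doc.
            attrs.put("type", type);
            attrs.put("doc", doc);
        }
        StringJoiner json = new StringJoiner(",", "{", "}");
        for (Map.Entry<String, String> e : attrs.entrySet()) {
            json.add("\"" + e.getKey() + "\":\"" + e.getValue() + "\"");
        }
        return json.toString();
    }

    // The existing entry point keeps today's behavior by defaulting to LEGACY.
    public static String fieldToJson(String name, String type, String doc) {
        return fieldToJson(name, type, doc, Format.LEGACY);
    }

    public static void main(String[] args) {
        String doc = "version number of this schema";
        System.out.println(fieldToJson("version", "float", doc));
        System.out.println(fieldToJson("version", "float", doc, Format.CROSS_LANGUAGE));
    }
}
```

Keeping the three-argument overload on the legacy path means existing callers 
see no change, while new callers opt in to the cross-language ordering 
explicitly.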

> On March 3, 2022 1:09 PM Martin Grigorov <[email protected]> wrote:
> 
> On Tue, Mar 1, 2022 at 6:42 AM kyle minmaxcorp.com 
> <[email protected]> wrote:
> > I am noticing a difference between the Java and C# versions of Avro when 
> > I call Schema.ToString().
> > 
> > First, the C# version adds the namespace to each named schema. 
> > Second, the order of the output differs. I would expect that we output 
> > the same JSON string across languages.
> > 
> > For the following, I took a schema JSON string and called 
> > Schema.Parse(string json) in C# and 
> > Schema.parse(String jsonSchema, boolean validate) in Java.
> > 
> > Original Schema string
> > {"type":"record","name":"TestRecord","namespace":"test.namespace","fields":[{"name":"testName","type":{"type":"record","name":"TestData","fields":[{"name":"version","type":"float","doc":"version number of this schema"}]}}]}
> > 
> > C# output of Schema.ToString()
> > {"type":"record","name":"TestRecord","namespace":"test.namespace","fields":[{"name":"testName","type":{"type":"record","name":"TestData","namespace":"test.namespace","fields":[{"name":"version","doc":"version number of this schema","type":"float"}]}}]}
> > 
> > Java output of Schema.toString
> > {"type":"record","name":"TestRecord","namespace":"test.namespace","fields":[{"name":"testName","type":{"type":"record","name":"TestData","fields":[{"name":"version","type":"float","doc":"version number of this schema"}]}}]}
> > 
> > It is not overly complicated to have the C# version match the Java version, 
> > but in order to maintain backwards compatibility while supporting a new 
> > output, we will need to create a Schema.ToJsonString method and update the 
> > WriteJson* methods to support the new flow. Ideally, we mark ToString() 
> > obsolete with a message to use the ToJsonString method, and eventually 
> > point ToString() to the ToJsonString method.
> > 
> > While this work is not complicated, it is a lot of work and testing. While 
> > I personally see value in having the output be the same (I work in a 
> > mixed technology environment), I wanted to address any concerns with this 
> > sort of change.
> 
> What exactly is the value of having the same output in mixed environments ?
> Or: what kind of problems do you face now ?
> 
> > 
> > Thanks,
> > Kyle T. Schoonover
