I was not clear enough in my previous email.
What I meant is to 'wrap' the application schema in a serialization wrapper
schema that has a field holding the "schema classname".
That generic setup, combined with some generated code in the schema
classes, should yield a solution that supports schema migration.
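
To make the wrapping idea concrete, here is a minimal sketch in plain Java
(JDK only, no Avro calls). The class name SchemaEnvelope and the byte layout
(class name, payload length, payload) are assumptions made up for this
illustration, not an existing Avro API. In a real setup the wrapper would
itself be an Avro schema and the payload would be the Avro-encoded record.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

/** Hypothetical envelope: schema class name first, then the opaque record bytes. */
final class SchemaEnvelope {
    final String schemaClassName;
    final byte[] payload;

    private SchemaEnvelope(String schemaClassName, byte[] payload) {
        this.schemaClassName = schemaClassName;
        this.payload = payload;
    }

    /** Prefixes the already-serialized record with the name of its schema class. */
    static byte[] wrap(String schemaClassName, byte[] payload) {
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            DataOutputStream data = new DataOutputStream(out);
            data.writeUTF(schemaClassName);   // the "schema classname" field
            data.writeInt(payload.length);    // length of the wrapped record
            data.write(payload);              // the wrapped record itself
            data.flush();
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Reads the class name back so the caller can pick a matching reader. */
    static SchemaEnvelope unwrap(byte[] bytes) {
        try {
            DataInputStream data = new DataInputStream(new ByteArrayInputStream(bytes));
            String name = data.readUTF();
            int length = data.readInt();
            if (length < 0) {
                throw new IOException("Negative payload length: " + length);
            }
            byte[] payload = new byte[length];
            data.readFully(payload);
            return new SchemaEnvelope(name, payload);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The receiver would first look at schemaClassName to decide whether it knows
how to migrate the record, and only then decode the payload.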

Niels

On Tue, Dec 22, 2015 at 11:55 AM, Niels Basjes <ni...@basjes.nl> wrote:

> Thanks for pointing this out.
> This is exactly what I was working on.
>
> The way I solved the 'does the schema match' question at work is by
> requiring that all schemas start with a single text field "schema
> classname" holding the full class name of the class that was used to
> generate it.
> That way we can have newer versions of the schema and still be able to
> unpack them. In this form the classname is essentially an indicator of
> whether schema migration is possible, even though the schemas are different.
>
> What do you think of this direction?
>
> Niels
>
>
> On Mon, Dec 21, 2015 at 11:30 PM, Ryan Blue <b...@cloudera.com> wrote:
>
>> Niels,
>>
>> Having methods like this sounds like a good idea to me. I've had to
>> write those methods several times!
>>
>> The idea is also related to AVRO-1704 [1], which is a suggestion to
>> standardize the encoding that is used for single records. Some projects
>> have been embedding the schema fingerprint at the start of each record, for
>> example, which would be a helpful thing to do.
>>
>> It may also be a good idea to create a helper object rather than
>> attaching new methods to the datum classes themselves. In your example
>> below, you have to create a new encoder or decoder for each method call. We
>> could instead keep a backing buffer and encoder/decoder on a class that the
>> caller instantiates so that they can be reused. At the same time, that
>> would make it possible to reuse the class with any data model and manage
>> the available schemas (if embedding the fingerprint).
>>
>> I'm thinking something like this:
>>
>>   ReflectClass datum = new ReflectClass();
>>   ReflectData model = ReflectData.get();
>>   DatumCodec codec = new DatumCodec(model, schema);
>>
>>   // convert datum to bytes using data model
>>   byte[] asBytes = codec.toBytes(datum);
>>
>>   // convert bytes to datum using data model
>>   ReflectClass copy = codec.fromBytes(asBytes);
>>
>> What do you think?
>>
>> rb
>>
>>
>> [1]: https://issues.apache.org/jira/browse/AVRO-1704
>>
>>
>> On 12/18/2015 05:01 AM, Niels Basjes wrote:
>>
>>> Hi,
>>>
>>> I'm working on a project where I'm putting Avro records into Kafka and at
>>> the other end pull them out again.
>>> For that purpose I wrote two methods 'toBytes' and 'fromBytes' in a
>>> separate class (see below).
>>>
>>> I see this as the type of problem many developers run into.
>>> Would it be a good idea to generate methods like these into the generated
>>> Java code?
>>>
>>> This would make it possible to serialize and deserialize single records
>>> like this:
>>>
>>> byte [] someBytes = measurement.toBytes();
>>> Measurement m = Measurement.fromBytes(someBytes);
>>>
>>> Niels Basjes
>>>
>>> P.S. possibly not name it toBytes but getBytes (similar to what the
>>> String class has)
>>>
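>>> import java.io.ByteArrayOutputStream;
>>> import java.io.IOException;
>>> import org.apache.avro.io.DatumReader;
>>> import org.apache.avro.io.Decoder;
>>> import org.apache.avro.io.DecoderFactory;
>>> import org.apache.avro.io.Encoder;
>>> import org.apache.avro.io.EncoderFactory;
>>> import org.apache.avro.specific.SpecificDatumReader;
>>> import org.apache.avro.specific.SpecificDatumWriter;
>>>
>>> public final class MeasurementSerializer {
>>>      private MeasurementSerializer() {
>>>      }
>>>
>>>      // Decodes a single binary-encoded Measurement record.
>>>      public static Measurement fromBytes(byte[] bytes) throws IOException {
>>>          try {
>>>              DatumReader<Measurement> reader =
>>>                  new SpecificDatumReader<>(Measurement.getClassSchema());
>>>              Decoder decoder = DecoderFactory.get().binaryDecoder(bytes, null);
>>>              return reader.read(null, decoder);
>>>          } catch (RuntimeException rex) {
>>>              // Keep the original exception as the cause.
>>>              throw new IOException(rex.getMessage(), rex);
>>>          }
>>>      }
>>>
>>>      // Encodes a single Measurement record to Avro binary.
>>>      public static byte[] toBytes(Measurement measurement) throws IOException {
>>>          try {
>>>              ByteArrayOutputStream out = new ByteArrayOutputStream();
>>>              Encoder encoder = EncoderFactory.get().binaryEncoder(out, null);
>>>              SpecificDatumWriter<Measurement> writer =
>>>                  new SpecificDatumWriter<>(Measurement.class);
>>>              writer.write(measurement, encoder);
>>>              encoder.flush();
>>>              out.close();
>>>              return out.toByteArray();
>>>          } catch (RuntimeException rex) {
>>>              // Keep the original exception as the cause.
>>>              throw new IOException(rex.getMessage(), rex);
>>>          }
>>>      }
>>> }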
>>>
>>>
>>>
>>>
>>
>> --
>> Ryan Blue
>> Software Engineer
>> Cloudera, Inc.
>>
>
>
>
> --
> Best regards / Met vriendelijke groeten,
>
> Niels Basjes
>



-- 
Best regards / Met vriendelijke groeten,

Niels Basjes
