I was not clear enough in my previous email. What I meant was to 'wrap' the application schema in a serialization wrapper schema that has a field indicating the "schema classname". That generic setup, combined with some generated code in the schema classes, should yield a solution that supports schema migration.
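A minimal sketch of the wrapper idea, assuming a wrapper record named `Envelope` with a string field `schemaClassname` followed by a bytes field `payload` (all three names are hypothetical). To stay dependency-free it hand-codes Avro's binary encoding for that record (fields in declaration order; string and bytes values as a zig-zag varint length followed by the raw bytes) instead of calling the Avro library. A reader can look at `schemaClassname` to pick the generated class, and with it the reader schema, before decoding `payload`.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/**
 * Hypothetical wrapper record, encoded as Avro binary would encode
 *   {"type": "record", "name": "Envelope", "fields": [
 *     {"name": "schemaClassname", "type": "string"},
 *     {"name": "payload", "type": "bytes"}]}
 */
final class Envelope {
  final String schemaClassname; // full name of the generated class of the payload
  final byte[] payload;         // the application record, already Avro-encoded

  Envelope(String schemaClassname, byte[] payload) {
    this.schemaClassname = schemaClassname;
    this.payload = payload;
  }

  byte[] encode() {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    writeBytes(out, schemaClassname.getBytes(StandardCharsets.UTF_8));
    writeBytes(out, payload);
    return out.toByteArray();
  }

  // Reads the classname first, so a caller could dispatch on it before touching the payload.
  static Envelope decode(byte[] data) {
    int[] pos = {0};
    String name = new String(readBytes(data, pos), StandardCharsets.UTF_8);
    byte[] payload = readBytes(data, pos);
    return new Envelope(name, payload);
  }

  // Avro "long": zig-zag encode, then emit 7 bits per byte, low-order group first.
  private static void writeLong(ByteArrayOutputStream out, long n) {
    long z = (n << 1) ^ (n >> 63);
    while ((z & ~0x7FL) != 0) {
      out.write((int) ((z & 0x7F) | 0x80));
      z >>>= 7;
    }
    out.write((int) z);
  }

  private static long readLong(byte[] data, int[] pos) {
    long z = 0;
    int shift = 0;
    int b;
    do {
      b = data[pos[0]++] & 0xFF;
      z |= (long) (b & 0x7F) << shift;
      shift += 7;
    } while ((b & 0x80) != 0);
    return (z >>> 1) ^ -(z & 1); // undo zig-zag
  }

  // Avro "string" and "bytes": length as a long, then the raw bytes.
  private static void writeBytes(ByteArrayOutputStream out, byte[] b) {
    writeLong(out, b.length);
    out.write(b, 0, b.length);
  }

  private static byte[] readBytes(byte[] data, int[] pos) {
    int len = (int) readLong(data, pos);
    byte[] b = Arrays.copyOfRange(data, pos[0], pos[0] + len);
    pos[0] += len;
    return b;
  }
}
```

In real code the wrapper would be declared as an Avro schema and encoded with Avro's own encoder; the hand-written version only makes the byte layout visible.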
Niels

On Tue, Dec 22, 2015 at 11:55 AM, Niels Basjes <ni...@basjes.nl> wrote:

> Thanks for pointing this out.
> This is exactly what I was working on.
>
> The way I solved the 'does the schema match' question at work is by
> requiring that all schemas start with a single text field "schema
> classname" containing the full name of the class that was used to generate
> it.
> That way we can have newer versions of the schema and still be able to
> unpack them. In this form the classname is essentially an indicator of
> whether schema migration is possible, even though the schemas are different.
>
> What do you think of this direction?
>
> Niels
>
>
> On Mon, Dec 21, 2015 at 11:30 PM, Ryan Blue <b...@cloudera.com> wrote:
>
>> Niels,
>>
>> Having methods like this sounds like a good idea to me. I've had to
>> write those methods several times!
>>
>> The idea is also related to AVRO-1704 [1], which is a suggestion to
>> standardize the encoding that is used for single records. Some projects
>> have been embedding the schema fingerprint at the start of each record, for
>> example, which would be a helpful thing to do.
>>
>> It may also be a good idea to create a helper object rather than
>> attaching new methods to the datum classes themselves. In your example
>> below, you have to create a new encoder or decoder for each method call. We
>> could instead keep a backing buffer and encoder/decoder on a class that the
>> caller instantiates so that they can be reused. At the same time, that
>> would make it possible to reuse the class with any data model and manage
>> the available schemas (if embedding the fingerprint).
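A minimal sketch of embedding a schema fingerprint at the start of each record. The fingerprint is the CRC-64-AVRO (Rabin) fingerprint given in the Avro specification, which is meant to be computed over the schema's Parsing Canonical Form; Avro's Java library exposes it as `SchemaNormalization.parsingFingerprint64(Schema)`, but it is re-implemented here so the example runs without Avro. The header layout (marker bytes `C3 01`, then the 8-byte fingerprint in little-endian order) mirrors the single-object encoding the Avro specification later defined; treat the exact layout, and the hypothetical `FingerprintHeader` class, as assumptions.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper: prefix each encoded record with its schema fingerprint. */
final class FingerprintHeader {
  // CRC-64-AVRO ("Rabin") fingerprint, following the algorithm in the Avro specification.
  static final long EMPTY = 0xc15d213aa4d7a795L;
  private static final long[] FP_TABLE = new long[256];

  static {
    for (int i = 0; i < 256; i++) {
      long fp = i;
      for (int j = 0; j < 8; j++) {
        fp = (fp >>> 1) ^ (EMPTY & -(fp & 1L));
      }
      FP_TABLE[i] = fp;
    }
  }

  static long fingerprint64(byte[] buf) {
    long fp = EMPTY;
    for (byte b : buf) {
      fp = (fp >>> 8) ^ FP_TABLE[(int) (fp ^ b) & 0xff];
    }
    return fp;
  }

  // Assumed layout: 0xC3 0x01 | 8-byte little-endian fingerprint | record bytes.
  // canonicalSchema must already be in Parsing Canonical Form.
  static byte[] wrap(String canonicalSchema, byte[] recordBytes) {
    long fp = fingerprint64(canonicalSchema.getBytes(StandardCharsets.UTF_8));
    return ByteBuffer.allocate(10 + recordBytes.length)
        .order(ByteOrder.LITTLE_ENDIAN)
        .put((byte) 0xC3)
        .put((byte) 0x01)
        .putLong(fp)
        .put(recordBytes)
        .array();
  }

  // Returns the fingerprint a reader would look up among its known schemas.
  static long readFingerprint(byte[] message) {
    if (message.length < 10 || (message[0] & 0xFF) != 0xC3 || message[1] != 0x01) {
      throw new IllegalArgumentException("missing single-record header");
    }
    return ByteBuffer.wrap(message, 2, 8).order(ByteOrder.LITTLE_ENDIAN).getLong();
  }
}
```

A reader that keeps a map from fingerprint to schema can then resolve the writer schema of each message against its own reader schema, which is what makes schema migration possible.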
>>
>> I'm thinking something like this:
>>
>>   ReflectClass datum = new ReflectClass();
>>   ReflectData model = ReflectData.get();
>>   DatumCodec<ReflectClass> codec = new DatumCodec<>(model, schema);
>>
>>   // convert datum to bytes using data model
>>   byte[] asBytes = codec.toBytes(datum);
>>
>>   // convert bytes to datum using data model
>>   ReflectClass copy = codec.fromBytes(asBytes);
>>
>> What do you think?
>>
>> rb
>>
>>
>> [1]: https://issues.apache.org/jira/browse/AVRO-1704
>>
>>
>> On 12/18/2015 05:01 AM, Niels Basjes wrote:
>>
>>> Hi,
>>>
>>> I'm working on a project where I'm putting Avro records into Kafka and
>>> pulling them out again at the other end.
>>> For that purpose I wrote two methods 'toBytes' and 'fromBytes' in a
>>> separate class (see below).
>>>
>>> I see this as the type of problem many developers run into.
>>> Would it be a good idea to generate methods like these into the generated
>>> Java code?
>>>
>>> This would make it possible to serialize and deserialize single records
>>> like this:
>>>
>>>   byte[] someBytes = measurement.toBytes();
>>>   Measurement m = Measurement.fromBytes(someBytes);
>>>
>>> Niels Basjes
>>>
>>> P.S. Possibly don't name it toBytes but getBytes (similar to what the
>>> String class has).
>>>
>>> import java.io.ByteArrayOutputStream;
>>> import java.io.IOException;
>>>
>>> import org.apache.avro.io.DatumReader;
>>> import org.apache.avro.io.Decoder;
>>> import org.apache.avro.io.DecoderFactory;
>>> import org.apache.avro.io.Encoder;
>>> import org.apache.avro.io.EncoderFactory;
>>> import org.apache.avro.specific.SpecificDatumReader;
>>> import org.apache.avro.specific.SpecificDatumWriter;
>>>
>>> // Measurement is the class generated by the Avro compiler from the schema.
>>> public final class MeasurementSerializer {
>>>   private MeasurementSerializer() {
>>>   }
>>>
>>>   public static Measurement fromBytes(byte[] bytes) throws IOException {
>>>     try {
>>>       DatumReader<Measurement> reader =
>>>           new SpecificDatumReader<>(Measurement.getClassSchema());
>>>       Decoder decoder = DecoderFactory.get().binaryDecoder(bytes, null);
>>>       return reader.read(null, decoder);
>>>     } catch (RuntimeException rex) {
>>>       // keep the original exception as the cause
>>>       throw new IOException(rex);
>>>     }
>>>   }
>>>
>>>   public static byte[] toBytes(Measurement measurement) throws IOException {
>>>     try {
>>>       ByteArrayOutputStream out = new ByteArrayOutputStream();
>>>       Encoder encoder = EncoderFactory.get().binaryEncoder(out, null);
>>>       SpecificDatumWriter<Measurement> writer =
>>>           new SpecificDatumWriter<>(Measurement.class);
>>>       writer.write(measurement, encoder);
>>>       encoder.flush();
>>>       out.close();
>>>       return out.toByteArray();
>>>     } catch (RuntimeException rex) {
>>>       throw new IOException(rex);
>>>     }
>>>   }
>>> }
>>>
>>
>> --
>> Ryan Blue
>> Software Engineer
>> Cloudera, Inc.
>>
>
>
>
> --
> Best regards / Met vriendelijke groeten,
>
> Niels Basjes
>

--
Best regards / Met vriendelijke groeten,

Niels Basjes
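A minimal sketch of the reusable helper object proposed in the thread, written in plain Java so it runs without Avro on the classpath. `DatumCodec`, `Serializer` and `Deserializer` are hypothetical names; the two interfaces stand in for Avro's `DatumWriter`/`DatumReader`, i.e. for the data model. What it shows is that the output buffer is allocated once per codec and reset on every call, instead of creating a new stream and encoder per call as `MeasurementSerializer` does. With Avro itself one would similarly keep the previous `BinaryEncoder`/`BinaryDecoder` and pass it as the `reuse` argument of `EncoderFactory.binaryEncoder` / `DecoderFactory.binaryDecoder`.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

/** Hypothetical reusable codec: one instance per data model/schema, reused across calls. */
final class DatumCodec<T> {
  // Stand-ins for Avro's DatumWriter / DatumReader (the "data model").
  interface Serializer<D> {
    void write(D datum, DataOutputStream out) throws IOException;
  }

  interface Deserializer<D> {
    D read(DataInputStream in) throws IOException;
  }

  private final Serializer<T> serializer;
  private final Deserializer<T> deserializer;
  // Allocated once and reused, instead of once per toBytes() call.
  private final ByteArrayOutputStream buffer = new ByteArrayOutputStream();
  private final DataOutputStream out = new DataOutputStream(buffer);

  DatumCodec(Serializer<T> serializer, Deserializer<T> deserializer) {
    this.serializer = serializer;
    this.deserializer = deserializer;
  }

  byte[] toBytes(T datum) {
    buffer.reset(); // keeps the backing array, discards the previous contents
    try {
      serializer.write(datum, out);
      out.flush();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return buffer.toByteArray(); // a copy, so callers may keep the result
  }

  T fromBytes(byte[] bytes) {
    try {
      return deserializer.read(new DataInputStream(new ByteArrayInputStream(bytes)));
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

Because the buffer is shared between calls, a single instance should not be used from several threads at once; one instance per thread is a simple way around that.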