Including a schema fingerprint at the start 1) reuses stuff we have, 2) gives a language-independent notion of compatibility, and 3) doesn't bind how folks get stuff in/out of the single-record form.
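
To make the fingerprint-at-the-start idea concrete, here is a minimal sketch of the framing only. FingerprintFraming and its method names are made up for illustration; the 8-byte width and big-endian order are assumptions, not a layout agreed in this thread or in AVRO-1704. The fingerprint value is assumed to be computed elsewhere (in Java, probably with something like SchemaNormalization.parsingFingerprint64), and the payload is whatever the binary encoder produced.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

/** Hypothetical helper: frames one encoded record as [8-byte fingerprint][payload]. */
final class FingerprintFraming {
    // Assumed prefix width: one 64-bit fingerprint.
    private static final int PREFIX_BYTES = Long.BYTES;

    private FingerprintFraming() {
    }

    /** Prepends the writer schema's fingerprint to an already encoded record. */
    static byte[] frame(long schemaFingerprint, byte[] payload) {
        return ByteBuffer.allocate(PREFIX_BYTES + payload.length)
                .putLong(schemaFingerprint) // ByteBuffer defaults to big-endian (assumed layout)
                .put(payload)
                .array();
    }

    /** Reads the fingerprint back so the reader can look up the writer schema. */
    static long fingerprintOf(byte[] framed) {
        if (framed.length < PREFIX_BYTES) {
            throw new IllegalArgumentException("record is shorter than the fingerprint prefix");
        }
        return ByteBuffer.wrap(framed).getLong();
    }

    /** Returns the encoded record without the fingerprint prefix. */
    static byte[] payloadOf(byte[] framed) {
        fingerprintOf(framed); // reuse the length check
        return Arrays.copyOfRange(framed, PREFIX_BYTES, framed.length);
    }
}
```

A reader would call fingerprintOf first, use the result to find the writer schema among the schemas it knows, and only then decode payloadOf with that schema. How the bytes are produced or consumed stays entirely up to the caller, which is point 3 above.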

-- 
Sean Busbey

On Dec 22, 2015 06:52, "Niels Basjes" <ni...@basjes.nl> wrote:

> I was not clear enough in my previous email.
> What I meant is to 'wrap' the application schema in a serialization wrapper
> schema that has a field indicating the "schema classname".
> That (generic setup) combined with some generated code in the schema
> classes should yield a solution that supports schema migration.
>
> Niels
>
> On Tue, Dec 22, 2015 at 11:55 AM, Niels Basjes <ni...@basjes.nl> wrote:
>
> > Thanks for pointing this out.
> > This is exactly what I was working on.
> >
> > The way I solved the 'does the schema match' question at work is by
> > requiring that all schemas start with a single text field "schema
> > classname" being the full class name of the class that was used to
> > generate it.
> > That way we can have newer versions of the schema and still be able to
> > unpack them. In this form the classname is essentially an indicator of
> > whether schema migration is possible, even though the schemas are
> > different.
> >
> > What do you think of this direction?
> >
> > Niels
> >
> > On Mon, Dec 21, 2015 at 11:30 PM, Ryan Blue <b...@cloudera.com> wrote:
> >
> >> Niels,
> >>
> >> This sounds like a good idea to me to have methods like this. I've had
> >> to write those methods several times!
> >>
> >> The idea is also related to AVRO-1704 [1], which is a suggestion to
> >> standardize the encoding that is used for single records. Some projects
> >> have been embedding the schema fingerprint at the start of each record,
> >> for example, which would be a helpful thing to do.
> >>
> >> It may also be a good idea to create a helper object rather than
> >> attaching new methods to the datum classes themselves. In your example
> >> below, you have to create a new encoder or decoder for each method
> >> call. We could instead keep a backing buffer and encoder/decoder on a
> >> class that the caller instantiates so that they can be reused. At the
> >> same time, that would make it possible to reuse the class with any data
> >> model and manage the available schemas (if embedding the fingerprint).
> >>
> >> I'm thinking something like this:
> >>
> >>   ReflectClass datum = new ReflectClass();
> >>   ReflectData model = ReflectData.get();
> >>   DatumCodec codec = new DatumCodec(model, schema);
> >>
> >>   // convert datum to bytes using data model
> >>   byte[] asBytes = codec.toBytes(datum);
> >>
> >>   // convert bytes to datum using data model
> >>   ReflectClass copy = codec.fromBytes(asBytes);
> >>
> >> What do you think?
> >>
> >> rb
> >>
> >> [1]: https://issues.apache.org/jira/browse/AVRO-1704
> >>
> >> On 12/18/2015 05:01 AM, Niels Basjes wrote:
> >>
> >>> Hi,
> >>>
> >>> I'm working on a project where I'm putting Avro records into Kafka and
> >>> at the other end pull them out again.
> >>> For that purpose I wrote two methods 'toBytes' and 'fromBytes' in a
> >>> separate class (see below).
> >>>
> >>> I see this as the type of problem many developers run into.
> >>> Would it be a good idea to generate methods like these into the
> >>> generated Java code?
> >>>
> >>> This would make it possible to serialize and deserialize single
> >>> records like this:
> >>>
> >>>   byte[] someBytes = measurement.toBytes();
> >>>   Measurement m = Measurement.fromBytes(someBytes);
> >>>
> >>> Niels Basjes
> >>>
> >>> P.S. possibly not name it toBytes but getBytes (similar to what the
> >>> String class has)
> >>>
> >>> import java.io.ByteArrayOutputStream;
> >>> import java.io.IOException;
> >>> import org.apache.avro.io.DatumReader;
> >>> import org.apache.avro.io.Decoder;
> >>> import org.apache.avro.io.DecoderFactory;
> >>> import org.apache.avro.io.Encoder;
> >>> import org.apache.avro.io.EncoderFactory;
> >>> import org.apache.avro.specific.SpecificDatumReader;
> >>> import org.apache.avro.specific.SpecificDatumWriter;
> >>>
> >>> public final class MeasurementSerializer {
> >>>     private MeasurementSerializer() {
> >>>     }
> >>>
> >>>     // Decodes a single binary-encoded Measurement record.
> >>>     public static Measurement fromBytes(byte[] bytes) throws IOException {
> >>>         try {
> >>>             DatumReader<Measurement> reader =
> >>>                 new SpecificDatumReader<>(Measurement.getClassSchema());
> >>>             Decoder decoder = DecoderFactory.get().binaryDecoder(bytes, null);
> >>>             return reader.read(null, decoder);
> >>>         } catch (RuntimeException rex) {
> >>>             // keep the original exception as the cause
> >>>             throw new IOException(rex.getMessage(), rex);
> >>>         }
> >>>     }
> >>>
> >>>     // Encodes a single Measurement record to Avro binary.
> >>>     public static byte[] toBytes(Measurement measurement) throws IOException {
> >>>         try {
> >>>             ByteArrayOutputStream out = new ByteArrayOutputStream();
> >>>             Encoder encoder = EncoderFactory.get().binaryEncoder(out, null);
> >>>             SpecificDatumWriter<Measurement> writer =
> >>>                 new SpecificDatumWriter<>(Measurement.class);
> >>>             writer.write(measurement, encoder);
> >>>             encoder.flush();
> >>>             out.close();
> >>>             return out.toByteArray();
> >>>         } catch (RuntimeException rex) {
> >>>             // keep the original exception as the cause
> >>>             throw new IOException(rex.getMessage(), rex);
> >>>         }
> >>>     }
> >>> }
> >>>
> >>
> >> --
> >> Ryan Blue
> >> Software Engineer
> >> Cloudera, Inc.
> >
> > --
> > Best regards / Met vriendelijke groeten,
> >
> > Niels Basjes
>
> --
> Best regards / Met vriendelijke groeten,
>
> Niels Basjes
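
For the helper-object idea quoted above (a class the caller instantiates so the backing buffer is reused across calls), a rough Avro-free sketch of the shape might look like this. ReusableCodec and its Writer/Reader interfaces are hypothetical stand-ins, not the proposed DatumCodec and not Avro classes; an Avro version would presumably also keep the encoder and decoder and pass them back as the reuse arguments of EncoderFactory.binaryEncoder and DecoderFactory.binaryDecoder.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

/** Hypothetical stand-in for a caller-instantiated codec that reuses its buffer. */
final class ReusableCodec<T> {
    /** Pluggable "data model": how one datum is written. */
    interface Writer<T> {
        void write(T datum, DataOutputStream out) throws IOException;
    }

    /** Pluggable "data model": how one datum is read. */
    interface Reader<T> {
        T read(DataInputStream in) throws IOException;
    }

    private final Writer<T> writer;
    private final Reader<T> reader;
    // Allocated once per codec instance instead of once per call.
    private final ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    private final DataOutputStream out = new DataOutputStream(buffer);

    ReusableCodec(Writer<T> writer, Reader<T> reader) {
        this.writer = writer;
        this.reader = reader;
    }

    /** Encodes one datum; the internal buffer is cleared and reused on every call. */
    byte[] toBytes(T datum) {
        try {
            buffer.reset();
            writer.write(datum, out);
            out.flush();
            return buffer.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Decodes one datum from a byte array produced by toBytes. */
    T fromBytes(byte[] bytes) {
        try {
            return reader.read(new DataInputStream(new ByteArrayInputStream(bytes)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

For instance, new ReusableCodec<String>((s, o) -> o.writeUTF(s), i -> i.readUTF()) round-trips strings. Because the buffer is shared state, an instance like this is not thread-safe, so each thread would need its own codec.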