Ken Krugler wrote:
3. It would be great to get feedback on both the Avro Cascading scheme (http://github.com/bixolabs/cascading.avro) and the content we're currently saving in the Avro file.

Overall it looks fine to me.

What do you think of https://issues.apache.org/jira/browse/AVRO-513? Would that make your life much easier?

It might be more efficient, instead of reading Avro generic data and converting it to your desired representation, to subclass GenericDatumReader and override #readString(), #readBytes(), #readMap(), and #readArray(). Similarly for DatumWriter. But we'd then also need to permit one to configure AvroRecordReader to use a different DatumReader implementation. We might, e.g., add a DataRepresentation interface:

interface DataRepresentation<T> {
  DatumReader<T> createDatumReader();
  DatumWriter<T> createDatumWriter();
}

Then we could replace AvroJob#setInputSpecific() and #setInputGeneric() with #setInputRepresentation(Class<? extends DataRepresentation> rep, Schema s). You could subclass GenericDatumReader and GenericDatumWriter and implement a DataRepresentation that returns these.
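
To make the idea concrete, here is a rough, self-contained sketch of how a job might hold a representation class and instantiate it reflectively. None of this is existing Avro API. DatumReader and DatumWriter below are one-method stand-ins, not the real org.apache.avro.io interfaces, so that the sketch compiles without Avro on the classpath. JobSketch and CsvListRepresentation are hypothetical names, the Schema argument is left out, and the bounded wildcard on the Class parameter is my assumption about what the signature would need in order to accept subclasses.

```java
import java.util.Arrays;
import java.util.List;

// Stand-ins for Avro's DatumReader/DatumWriter, reduced to one method each.
// The real interfaces work with Decoder/Encoder and a Schema; these do not.
interface DatumReader<T> { T read(String encoded); }
interface DatumWriter<T> { String write(T datum); }

// The interface proposed above: one factory for both directions.
interface DataRepresentation<T> {
  DatumReader<T> createDatumReader();
  DatumWriter<T> createDatumWriter();
}

// Hypothetical representation that reads straight into the caller's own type
// (a List<String>) rather than going through an intermediate generic form.
final class CsvListRepresentation implements DataRepresentation<List<String>> {
  public DatumReader<List<String>> createDatumReader() {
    return encoded -> Arrays.asList(encoded.split(","));
  }
  public DatumWriter<List<String>> createDatumWriter() {
    return datum -> String.join(",", datum);
  }
}

// Hypothetical stand-in for the job configuration. It stores only the class,
// as a job configuration would, and instantiates it when a reader is needed.
final class JobSketch {
  private Class<? extends DataRepresentation<?>> inputRep;

  void setInputRepresentation(Class<? extends DataRepresentation<?>> rep) {
    this.inputRep = rep;
  }

  DatumReader<?> newInputReader() {
    try {
      // Requires a no-argument constructor on the representation class.
      return inputRep.getDeclaredConstructor().newInstance().createDatumReader();
    } catch (ReflectiveOperationException e) {
      throw new IllegalStateException("cannot instantiate " + inputRep, e);
    }
  }
}

final class RepresentationDemo {
  public static void main(String[] args) {
    JobSketch job = new JobSketch();
    job.setInputRepresentation(CsvListRepresentation.class);
    System.out.println(job.newInputReader().read("a,b"));
  }
}
```

The point of passing a Class rather than an instance is that the job only has to carry a class name to the tasks. The cost is that every representation needs a no-argument constructor.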

Worth it?

Doug
