Ken Krugler wrote:
> 3. It would be great to get feedback on both the Avro Cascading scheme
> (http://github.com/bixolabs/cascading.avro) and the content we're
> currently saving in the Avro file.
Overall it looks fine to me.
What do you think of https://issues.apache.org/jira/browse/AVRO-513?
Would that make your life much easier?
Rather than reading Avro generic data and then converting it to your
desired representation, it might be more efficient to subclass
GenericDatumReader and override #readString(), #readBytes(), #readMap(),
and #readArray() so that they build your representation directly. The
same goes for GenericDatumWriter. But we'd then also need
to permit one to configure AvroRecordReader to use a different
DatumReader implementation. We might, e.g., add a
DataRepresentation interface:
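import org.apache.avro.io.DatumReader;
import org.apache.avro.io.DatumWriter;

// Supplies the reader/writer pair for one in-memory data representation.
public interface DataRepresentation<T> {
  DatumReader<T> createDatumReader();
  DatumWriter<T> createDatumWriter();
}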
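A minimal sketch of the subclass-and-override idea, under simplifying assumptions. SketchGenericReader and JavaStringReader are hypothetical stand-ins, not Avro's GenericDatumReader, whose real hook methods have different signatures. The byte[] hook and the StringBuilder "generic" representation are illustrative assumptions only. The point is that the subclass returns java.lang.String straight from the hook, so no second conversion pass over generic data is needed:

```java
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for a generic reader, NOT Avro's GenericDatumReader:
// field decoding goes through an overridable hook, in the spirit of
// readString(), but the byte[] signature is simplified for illustration.
class SketchGenericReader {
  // Default "generic" representation: a mutable character buffer,
  // loosely standing in for a generic string wrapper type.
  protected Object readString(byte[] utf8) {
    return new StringBuilder(new String(utf8, StandardCharsets.UTF_8));
  }

  // Reads a record whose fields are all strings, calling the hook per field.
  public Object[] readRecord(byte[][] fields) {
    Object[] record = new Object[fields.length];
    for (int i = 0; i < fields.length; i++) {
      record[i] = readString(fields[i]);
    }
    return record;
  }
}

// Subclass that builds the desired representation (java.lang.String)
// directly in the hook, so no separate conversion pass is needed.
class JavaStringReader extends SketchGenericReader {
  @Override
  protected Object readString(byte[] utf8) {
    return new String(utf8, StandardCharsets.UTF_8);
  }
}
```

The same pattern would apply to the bytes, map and array hooks; only the methods whose default representation you dislike need overriding.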
Then we could replace AvroJob#setInputSpecific() and #setInputGeneric()
with #setInputRepresentation(Class<? extends DataRepresentation> rep,
Schema s).
You could then subclass GenericDatumReader and GenericDatumWriter and
implement a DataRepresentation that returns instances of them.
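A rough sketch of how #setInputRepresentation might be wired up. It assumes the job stores the representation's class name in its configuration and the record reader instantiates that class reflectively. Every name here is hypothetical: SketchDatumReader, SketchDatumWriter, SketchDataRepresentation, SketchAvroJob, UpperCaseRepresentation, and the "avro.input.representation" key. A plain Map stands in for Hadoop's Configuration:

```java
import java.util.Map;

// Hypothetical, simplified stand-ins for Avro's DatumReader and
// DatumWriter; these are NOT the org.apache.avro.io interfaces.
interface SketchDatumReader<T> {
  T read(String encoded);
}

interface SketchDatumWriter<T> {
  String write(T datum);
}

// The proposed factory interface, expressed over the stand-ins above.
interface SketchDataRepresentation<T> {
  SketchDatumReader<T> createDatumReader();
  SketchDatumWriter<T> createDatumWriter();
}

// Hypothetical job helper: a Map plays the role of Hadoop's Configuration,
// and the key name is invented for this sketch.
final class SketchAvroJob {
  static final String INPUT_REP_KEY = "avro.input.representation";

  private SketchAvroJob() {}

  // Record only the class name, so the setting survives as a plain string.
  static void setInputRepresentation(
      Map<String, String> conf, Class<? extends SketchDataRepresentation<?>> rep) {
    conf.put(INPUT_REP_KEY, rep.getName());
  }

  // What a record reader might do: load the configured class and call
  // its no-argument constructor.
  static SketchDataRepresentation<?> getInputRepresentation(Map<String, String> conf) {
    try {
      Class<?> c = Class.forName(conf.get(INPUT_REP_KEY));
      return (SketchDataRepresentation<?>) c.getDeclaredConstructor().newInstance();
    } catch (ReflectiveOperationException e) {
      throw new IllegalStateException("cannot instantiate input representation", e);
    }
  }
}

// A user-supplied representation returning its own reader and writer
// (here trivially upper-casing on read and lower-casing on write).
class UpperCaseRepresentation implements SketchDataRepresentation<String> {
  @Override
  public SketchDatumReader<String> createDatumReader() {
    return encoded -> encoded.toUpperCase();
  }

  @Override
  public SketchDatumWriter<String> createDatumWriter() {
    return datum -> datum.toLowerCase();
  }
}
```

Passing a Class rather than an instance keeps the job setting a plain string that can travel to the tasks. The trade-off is that every representation class needs a no-argument constructor.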
Worth it?
Doug