That surprises me. Crunch has its own AvroOutputFormat in order to use the
mapreduce.* APIs, but it delegates much of the work to things like
DatumWriters, encoders, etc. from Avro's core libraries.

Could I get some detail on the Hadoop and Avro versions? Is it just Hadoop
1.0.x and Avro 1.7.0?

J


On Thu, Dec 13, 2012 at 10:35 AM, Jonathan Natkins <[email protected]> wrote:

> Out of curiosity, is there a way to write output from a Crunch pipeline
> into an Avro-format file? It seems that if you call
> collection.write(To.avroFile(path)), you end up just writing JSON. It can
> certainly be read into an Avro object, but it seems like it would be more
> efficient to write binary data to the file, so no parsing has to happen.
>
> Have I missed an API, or is this a missing feature?
>
> Thanks,
> Natty
>



-- 
Director of Data Science
Cloudera <http://www.cloudera.com>
Twitter: @josh_wills <http://twitter.com/josh_wills>
