Avro can be used as a file format as well as a serialization format for RPC. Doug Cutting gave a presentation about this at HUG last Wednesday, and Avro trunk has MapReduce code that writes the Avro file format.
Regards,
Eric

On 7/26/10 5:48 PM, "Jerome Boulon" <jbou...@netflix.com> wrote:

> Hi Eric,
> Can you clarify what "Avro format" means?
> From my understanding Avro is a serialization format, not a file format.
>
> So, are you thinking of a new file format like Tfile/SeqFile/RCFile?
> If yes, can you give a pointer to that new file format?
>
> Thanks,
> /Jerome.
>
>
> On 7/26/10 4:06 PM, "Eric Yang" <ey...@yahoo-inc.com> wrote:
>
>>> > 1) How will Avro be used with Chukwa?
>>
>> Avro can improve flexibility for both ChukwaArchiveKey and ChukwaRecord.
>> The current key representation is optimized for the time series use case only.
>> Ideally, having more dynamic key metadata also helps channeling of user's
>> data.
>>
>>> > 2) Does all Chukwa files be in Avro format?
>>
>> Chukwa files are currently in sequence file format. It will convert to
>> Avro if the community votes to do so. I am using Hbase as my storage sink,
>> hence my use case doesn't apply.
>>
>>> > 3) Are there any plans to enhance Chukwa record format?
>>
>> I haven't given much thought to this. Ideally, ChukwaArchiveKey is an avro
>> object with a reserved metadata field, and Record is a byte[] which holds an
>> avro object. We will implement a comparator to compare time series plus a
>> couple of dimensions.
>>
>> If Chukwa converts to avro, then you get your use case for free. However, I
>> am not sure who will be writing the implementation. If you are interested
>> in writing this, you are welcome to contribute.
>>
>>> > I have written Adapter and Parser for Multiline Record format. If Chukwa will
>>> > be using Avro format then I also have to change my code.
>>> > Currently I am processing the log files in Chukwa and converting them to Avro
>>> > format to keep it in HDFS. If you are planning to include the Avro in the
>>> > Chukwa then does it mean that all the Chukwa files will be in Avro format ?
>>
>> My data will be in avro in Hbase, and the data is also mirrored to live on
>> in sequence file as String or bytes for the short term. In the long run,
>> when someone has implemented a format superior to sequence file and
>> Tfile, the Chukwa community may be interested in moving. This is currently
>> not the top priority. The performance of plain avro file on hdfs should be
>> faster than sequence file, but we are waiting for Avro 1.4 to age a little
>> bit longer before making the jump.
>>
>> Regards,
>> Eric
>>
>>> > Please Suggest
>>> > Stuti
>>> >
>>> >
>>> > From: Eric Yang [mailto:ey...@yahoo-inc.com]
>>> > Sent: Friday, July 23, 2010 10:06 PM
>>> > To: chukwa-user@hadoop.apache.org
>>> > Cc: Jerome Boulon
>>> > Subject: Re: Why ChukwaRecord have only Key:Value as Strings
>>> >
>>> > Initially, ChukwaRecord only supported String because it was made to process
>>> > text log files. We were naïve to think that we could use JSON for all our data.
>>> > There is a plan to use Avro instead of supporting generic types when the Avro
>>> > mapreduce input/output format is ready next month. This provides better
>>> > metadata support inside the data for the processing system.
>>> >
>>> > Regards,
>>> > Eric
>>> >
>>> > On 7/23/10 5:03 AM, "Stuti Awasthi" <stuti_awas...@persistent.co.in> wrote:
>>> > Hi all,
>>> >
>>> > I was looking at the code of ChukwaRecord and found out that it adds only
>>> > <String key, String value>
>>> >
>>> > Snippet :
>>> >
>>> > public class ChukwaRecord extends ChukwaRecordJT implements Record
>>> > {
>>> >     public void add( String Key, String Value )
>>> > }
>>> >
>>> > I have a scenario in which I want to add an Object as the value, i.e.
>>> > <String Key, Object value>.
>>> >
>>> > Does chukwa's current implementation support that, or is any patch available?
>>> >
>>> > Stuti
>>> >
>>> >
>>> > Thanks and Regards
>>> >
>>> > Stuti Awasthi | Software Engineer IBM BU | Persistent Systems Limited
>>> > stuti_awas...@persistent.co.in <mailto:chandan_avd...@persistent.co.in> |
>>> > Tel: +91 (20) 391 77837
>>> >
>>> >
>>> > DISCLAIMER ========== This e-mail may contain privileged and confidential
>>> > information which is the property of Persistent Systems Ltd. It is intended
>>> > only for the use of the individual or entity to which it is addressed. If you
>>> > are not the intended recipient, you are not authorized to read, retain, copy,
>>> > print, distribute or use this message. If you have received this communication
>>> > in error, please notify the sender and delete all copies of this message.
>>> > Persistent Systems Ltd. does not accept any liability for virus infected
>>> > mails.
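
The thread mentions a comparator that orders keys by time series plus a couple of dimensions. A minimal sketch of that idea in plain Java follows, for illustration only. The class `ArchiveKeySketch`, the `Key` fields (`timestamp`, `source`, `dataType`) and the ordering are all assumptions; none of them are Chukwa or Avro APIs, and the thread does not say which dimensions would actually be used.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch: these names are NOT part of Chukwa or Avro.
final class ArchiveKeySketch {

    // Hypothetical key: a timestamp plus two example dimensions (assumed).
    static final class Key {
        final long timestamp;
        final String source;
        final String dataType;

        Key(long timestamp, String source, String dataType) {
            this.timestamp = timestamp;
            this.source = source;
            this.dataType = dataType;
        }

        @Override
        public String toString() {
            return timestamp + "/" + source + "/" + dataType;
        }
    }

    // Order by time first, then break ties on the two dimensions.
    static final Comparator<Key> TIME_THEN_DIMENSIONS =
            Comparator.comparingLong((Key k) -> k.timestamp)
                    .thenComparing(k -> k.source)
                    .thenComparing(k -> k.dataType);

    // Returns a sorted copy and leaves the input list untouched.
    static List<Key> sorted(List<Key> keys) {
        List<Key> copy = new ArrayList<>(keys);
        copy.sort(TIME_THEN_DIMENSIONS);
        return copy;
    }

    public static void main(String[] args) {
        List<Key> keys = Arrays.asList(
                new Key(200L, "hostA", "syslog"),
                new Key(100L, "hostB", "syslog"),
                new Key(100L, "hostA", "syslog"));
        for (Key k : sorted(keys)) {
            System.out.println(k);
        }
    }
}
```

Keeping the timestamp as the leading comparison preserves the time series ordering that the current key is optimized for, while the extra fields only decide ties. In the design described in the thread the key would be an Avro object and the record payload a byte[]; both are left out here to keep the sketch free of external dependencies.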