On Wed, 21 Apr 2010, Doug Cutting wrote:

> R. Tyler Ballance wrote:
> >Is hadoop streaming support actually /working/ in trunk?
>
> Hadoop Streaming access to Avro data? No. Hadoop Streaming is
> primarily intended for textual, CSV-style data.
>
> To better integrate Avro data into Perl, Python and Ruby
> mapreduce programs, we hope to build something like Hadoop Pipes.
>
> https://issues.apache.org/jira/browse/AVRO-512
>
> I hope to work on this in the coming weeks.

Ah, this rings a bit clearer to me. Mind you, I'm a hadidiot; I'm more
into generating the avro datas (and the RPC!). I'll follow the ticket,
looking forward to seeing that going in.

> AVRO-493 only provides Avro data to Java mapreduce programs. The
> best documentation for it currently is its unit test source code.
>
> http://tinyurl.com/yz8bd22
> http://tinyurl.com/2a3xbu8

Handy links. I don't think we're going to invest any time in writing
anything other than Python code for the time being.

Until you have the chance to crank through #512, our interim solution
has been to pre-process avro logs, pulling the schema out into a separate
file and dumping the records to a textual JSON file suitable for streaming
into hadoop.

Cheers,
-R. Tyler Ballance
--------------------------------------
Jabber: [email protected]
GitHub: http://github.com/rtyler
Identica: http://identi.ca/dero
Twitter: http://twitter.com/agentdero
Blog: http://unethicalblogger.com
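
For illustration only, the pre-processing step described in the message (schema written to its own file, records dumped as line-oriented JSON for Hadoop Streaming) might be sketched in Python roughly as follows. This is not the code from the thread: the function name `dump_for_streaming`, the file names, and the sample records are all hypothetical, and the Avro reading step is shown only as a comment because it assumes the `avro` Python package and its `DataFileReader` API.

```python
import json


def dump_for_streaming(schema_json, records, schema_path, data_path):
    """Write the schema to one file and the records to another.

    Hypothetical helper: `records` is any iterable of dicts, such as
    the records yielded when iterating over an Avro data file reader.
    Records holding bytes values would need converting first, since
    the json module cannot serialize bytes.
    """
    # Parse the schema so that malformed JSON fails early.
    schema = json.loads(schema_json)
    with open(schema_path, "w") as schema_file:
        json.dump(schema, schema_file, indent=2)

    # One JSON object per line, so a streaming mapper can read the
    # data line by line from stdin.
    count = 0
    with open(data_path, "w") as data_file:
        for record in records:
            data_file.write(json.dumps(record, sort_keys=True))
            data_file.write("\n")
            count += 1
    return count


# Assumed wiring to the avro package (untested here; check it against
# the version you have installed):
#
#   from avro.datafile import DataFileReader
#   from avro.io import DatumReader
#   reader = DataFileReader(open("events.avro", "rb"), DatumReader())
#   schema_json = reader.get_meta("avro.schema")
#   dump_for_streaming(schema_json, reader, "events.avsc", "events.json")
#   reader.close()

if __name__ == "__main__":
    # Made-up schema and records standing in for a real Avro log.
    demo_schema = json.dumps({
        "type": "record",
        "name": "Event",
        "fields": [{"name": "id", "type": "int"},
                   {"name": "msg", "type": "string"}],
    })
    demo_records = [{"id": 1, "msg": "start"}, {"id": 2, "msg": "stop"}]
    written = dump_for_streaming(demo_schema, demo_records,
                                 "events.avsc", "events.json")
    print("wrote %d records" % written)
```

Writing one JSON object per line matters because Hadoop Streaming hands input to the mapper a line at a time, so each record has to be parseable on its own.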
