Hi all, I was wondering if anyone is using Hive with protocol buffers. The Hadoop wiki links to http://www.slideshare.net/ragho/hive-user-meeting-august-2009-facebook for SerDe examples; that presentation says there is no built-in support for protobufs. Since it is about a year old, I was wondering whether any UDFs or SerDes, native or third-party, have appeared since then to deal with them.
I am also curious about the relative efficiency of performing the deserialization with UDFs in Hive vs. running a separate Hadoop job that first deserializes the data from protocol buffers into an ASCII flat file containing only the "interesting" fields (going from ~15 fields to ~3), and then doing the rest of the computation in Hive. I'd appreciate any comments! Thanks, --Leo
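
P.S. To make the second option concrete, here is a rough sketch of the projection step I have in mind. It is only an illustration under some assumptions: the records are plain dicts standing in for already-decoded protobuf messages, the field names (user_id, timestamp, url) are made up, and the output uses what I understand to be Hive's defaults for text tables (Ctrl-A as the field delimiter, \N for NULL). The actual protobuf decoding and the Hadoop job wiring are left out.

```python
# Sketch of the "pre-flatten" step: keep only a few fields per record and
# write them as delimited text that a Hive text table could read.
# Records are plain dicts standing in for decoded protobuf messages (assumption).

HIVE_DEFAULT_DELIM = "\x01"  # Ctrl-A, Hive's default field delimiter for text tables
HIVE_NULL = r"\N"            # Hive's default NULL marker in text tables

# Hypothetical names for the ~3 "interesting" fields out of ~15.
INTERESTING_FIELDS = ("user_id", "timestamp", "url")


def project(record, fields=INTERESTING_FIELDS, delim=HIVE_DEFAULT_DELIM):
    """Return one flat text line containing only the selected fields."""
    # Fields that are absent or None are written as the NULL marker.
    return delim.join(
        str(record[f]) if record.get(f) is not None else HIVE_NULL
        for f in fields
    )


def flatten(records, out):
    """Write one projected line per record to a file-like object."""
    for record in records:
        out.write(project(record) + "\n")
```

The output of flatten() would then sit behind a plain text table in Hive with three columns, and everything downstream would be ordinary queries. The trade-off I am asking about is whether this extra pass over the data is cheaper than having Hive decode all ~15 fields on every query.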
