Ryan Schmitt <rschm...@u.rochester.edu> writes:

> I'm currently working on some problems in the big data space, and I'm
> more or less starting from scratch with the Hadoop ecosystem. I was
> looking at ways to work with data in Hadoop, and I realized that
> (because of how InputFormat splitting works) this is a use case where
> it's actually pretty important to use a data language with an external
> schema.

At Damballa we extensively use Avro for these sorts of problems. We’ve written a set of Clojure bindings for Avro named “abracad” [1]. Abracad exposes Avro data as native Clojure data (persistent vectors, maps, etc), supports protocol-based de/serialization of custom types, and includes explicit support for defining “EDN-in-Avro” schemas which can include arbitrary Clojure data.

We’ve implemented support in the mainline Java Avro project (merged in 1.7.5) for specifying configurable “data models” for MapReduce jobs, which allows Avro MapReduce input to directly produce Clojure data and output to consume Clojure data. And we’ve implemented fairly automatic configuration for such in the Avro dseqs of our “parkour” Clojure-Hadoop/MR integration library [2].

[1] https://github.com/damballa/abracad
[2] https://github.com/damballa/parkour

--
Marshall Bockrath-Vandegrift <llas...@damballa.com>
Principal Software Engineer, Damballa R&D

--
You received this message because you are subscribed to the Google Groups "Clojure" group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your first post.
To unsubscribe from this group, send email to clojure+unsubscr...@googlegroups.com
For more options, visit this group at http://groups.google.com/group/clojure?hl=en