Re: Recommendations for a schema-based data language for use in Hadoop?

Marshall Bockrath-Vandegrift Wed, 05 Aug 2015 10:28:16 -0700

Ryan Schmitt <rschm...@u.rochester.edu> writes:

> I'm currently working on some problems in the big data space, and I'm
> more or less starting from scratch with the Hadoop ecosystem. I was
> looking at ways to work with data in Hadoop, and I realized that
> (because of how InputFormat splitting works) this is a use case where
> it's actually pretty important to use a data language with an external
> schema.


At Damballa we use Avro extensively for these sorts of problems.  We’ve
written a set of Clojure bindings for Avro named “abracad” [1].  Abracad
exposes Avro data as native Clojure data (persistent vectors, maps,
etc.), supports protocol-based serialization and deserialization of
custom types, and includes explicit support for defining “EDN-in-Avro”
schemas which can include arbitrary Clojure data.
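
As a rough sketch of what "Avro data as native Clojure data" looks like,
here is a binary round trip through abracad.  The function names
(parse-schema, binary-encoded, decode) are as I recall them from the
abracad README and may differ between versions; the "Event" schema and
its fields are hypothetical, made up purely for illustration.

```clojure
(require '[abracad.avro :as avro])

;; Hypothetical schema for illustration only; the schema is written as
;; plain Clojure data and parsed into an Avro Schema object.
(def event-schema
  (avro/parse-schema
   {:type   :record
    :name   "Event"
    :fields [{:name "id",    :type :long}
             {:name "count", :type :long}]}))

;; An ordinary Clojure map standing in for an Avro record.
(def event {:id 42, :count 7})

;; Encode the map to Avro binary bytes, then decode it back into
;; Clojure data using the same schema.
(def round-tripped
  (->> (avro/binary-encoded event-schema event)
       (avro/decode event-schema)))
```

The point of the sketch is that the schema lives outside the encoded
bytes: the reader must be handed the schema to make sense of the data.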

We’ve implemented support in the mainline Java Avro project (merged in
1.7.5) for specifying configurable “data models” for MapReduce jobs,
which allows Avro MapReduce input to produce Clojure data directly and
output to consume Clojure data.  We’ve also implemented largely
automatic configuration of those data models in the Avro dseqs of our
“parkour” Clojure-Hadoop/MR integration library [2].

[1] https://github.com/damballa/abracad

[2] https://github.com/damballa/parkour

-- 
Marshall Bockrath-Vandegrift <llas...@damballa.com>
Principal Software Engineer, Damballa R&D
