Re: Working with Avro Generic Records in the interactive scala shell

2014-05-27 Thread Jeremy Lewi
I was able to work around this by switching to the SpecificDatum interface and following this example: https://github.com/massie/spark-parquet-example/blob/master/src/main/scala/com/zenfractal/SerializableAminoAcid.java
As in the example, I defined a subclass of my Avro type which implemented the
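A minimal Scala sketch of that pattern, assuming a hypothetical Avro-generated class MyRecord (the class name, and the exact way Java serialization is delegated to Avro, are illustrative rather than copied from the linked example):

    import java.io.{ObjectInputStream, ObjectOutputStream}
    import org.apache.avro.io.{DecoderFactory, EncoderFactory}
    import org.apache.avro.specific.{SpecificDatumReader, SpecificDatumWriter}

    // Subclass of the (hypothetical) generated MyRecord that becomes
    // Java-serializable by delegating to Avro's specific datum reader/writer.
    class SerializableMyRecord extends MyRecord with Serializable {

      private def writeObject(out: ObjectOutputStream): Unit = {
        val writer = new SpecificDatumWriter[MyRecord](classOf[MyRecord])
        // Direct (unbuffered) encoder, so nothing is held back from the stream.
        val encoder = EncoderFactory.get().directBinaryEncoder(out, null)
        writer.write(this, encoder)
        encoder.flush()
      }

      private def readObject(in: ObjectInputStream): Unit = {
        val reader = new SpecificDatumReader[MyRecord](classOf[MyRecord])
        // Direct (unbuffered) decoder, so it does not read past this record.
        val decoder = DecoderFactory.get().directBinaryDecoder(in, null)
        reader.read(this, decoder) // passing `this` as reuse fills in the fields
      }
    }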

Re: Working with Avro Generic Records in the interactive scala shell

2014-05-27 Thread Jeremy Lewi
Thanks, that's super helpful. J
On Tue, May 27, 2014 at 8:01 AM, Matt Massie mas...@berkeley.edu wrote: I really should update that blog post. I created a gist (see https://gist.github.com/massie/7224868) which explains a cleaner, more efficient approach. -- Matt

Re: Working with Avro Generic Records in the interactive scala shell

2014-05-27 Thread Andrew Ash
Also see this context from February. We started working with Chill to get Avro records automatically registered with Kryo. I'm not sure of the final status, but from Chill PR #172 it looks like this might involve much less friction than before. Issue we filed:
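Until something like that is available off the shelf, a hand-rolled version of the same idea is to register a custom Kryo serializer for GenericData.Record through Spark's KryoRegistrator. The sketch below is not what Chill does internally (it naively re-ships the schema with every record, which a real implementation would avoid); it only illustrates the mechanism:

    import com.esotericsoftware.kryo.{Kryo, Serializer}
    import com.esotericsoftware.kryo.io.{Input, Output}
    import org.apache.avro.Schema
    import org.apache.avro.generic.{GenericData, GenericDatumReader, GenericDatumWriter, GenericRecord}
    import org.apache.avro.io.{DecoderFactory, EncoderFactory}
    import org.apache.spark.serializer.KryoRegistrator

    // Naive Kryo serializer: writes the schema as a string, then the Avro binary datum.
    class GenericRecordKryoSerializer extends Serializer[GenericData.Record] {
      override def write(kryo: Kryo, out: Output, rec: GenericData.Record): Unit = {
        out.writeString(rec.getSchema.toString)
        val writer = new GenericDatumWriter[GenericRecord](rec.getSchema)
        val encoder = EncoderFactory.get().directBinaryEncoder(out, null)
        writer.write(rec, encoder)
        encoder.flush()
      }

      override def read(kryo: Kryo, in: Input, cls: Class[GenericData.Record]): GenericData.Record = {
        val schema = new Schema.Parser().parse(in.readString())
        val reader = new GenericDatumReader[GenericRecord](schema)
        val decoder = DecoderFactory.get().directBinaryDecoder(in, null)
        reader.read(null, decoder).asInstanceOf[GenericData.Record]
      }
    }

    class AvroKryoRegistrator extends KryoRegistrator {
      override def registerClasses(kryo: Kryo): Unit =
        kryo.register(classOf[GenericData.Record], new GenericRecordKryoSerializer)
    }

To use it, set spark.serializer to org.apache.spark.serializer.KryoSerializer and spark.kryo.registrator to the registrator class above.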

Re: Working with Avro Generic Records in the interactive scala shell

2014-05-24 Thread Josh Marcus
Jeremy, Just to be clear, are you assembling a jar with that class compiled (with its dependencies) and including the path to that jar on the command line in an environment variable (e.g. SPARK_CLASSPATH=path ./spark-shell)? --j
On Saturday, May 24, 2014, Jeremy Lewi jer...@lewi.us wrote: Hi

Re: Working with Avro Generic Records in the interactive scala shell

2014-05-24 Thread Jeremy Lewi
Hi Josh, Thanks for the help. The class should be on the path on all nodes. Here's what I did:
1) I built a jar from my Scala code.
2) I copied that jar to a location on all nodes in my cluster (/usr/local/spark).
3) I edited bin/compute-classpath.sh to add my jar to the class path.
4) I
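As a side note, and only as a sketch: instead of editing compute-classpath.sh (step 3), the jar from step 2 can also be handed to the running shell so Spark ships it to the executors (the jar name below is a placeholder):

    // In the shell, where sc already exists; the path and jar name are illustrative.
    sc.addJar("/usr/local/spark/my-avro-job.jar")

Note that addJar only distributes the jar to executor tasks; the driver (the shell itself) still needs the class on its own classpath, e.g. via SPARK_CLASSPATH as Josh suggested.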

Working with Avro Generic Records in the interactive scala shell

2014-05-23 Thread Jeremy Lewi
Hi Spark Users, I'm trying to read and process an Avro dataset using the interactive Spark Scala shell. When my pipeline executes, I get the ClassNotFoundException pasted at the end of this email. I'm trying to use the Generic Avro API (not the Specific API). Here's a gist of the commands I'm
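For context, reading with the Generic API typically looks roughly like the sketch below; the path and field name are placeholders, not the actual commands from the gist:

    import org.apache.avro.generic.GenericRecord
    import org.apache.avro.mapred.AvroKey
    import org.apache.avro.mapreduce.AvroKeyInputFormat
    import org.apache.hadoop.io.NullWritable

    // Load the Avro data as (AvroKey[GenericRecord], NullWritable) pairs.
    val avro = sc.newAPIHadoopFile[AvroKey[GenericRecord], NullWritable,
      AvroKeyInputFormat[GenericRecord]]("hdfs:///path/to/dataset.avro")

    // Pull a field out of each record before anything is shuffled or collected,
    // so only plain strings (not Avro objects) need to be serialized.
    val values = avro.map { case (key, _) => key.datum().get("some_field").toString }
    values.take(5).foreach(println)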