Placing the JAR somewhere that is on HADOOP_CLASSPATH on every node is the best approach, unless you'd rather add it to the distributed cache each time you run such a job.
Note that you'll have to restart your TTs to get their classpaths updated.

On Thu, Aug 18, 2011 at 2:24 AM, W.P. McNeill <[email protected]> wrote:
> Please disregard my earlier email. I accidentally sent it before I was done
> writing.
>
> I am working with some data that has a custom IO serialization. So I've
> added a MySerialization class to the io.serializations property of
> mapred-site.xml.
>
> <property>
>   <name>io.serializations</name>
>   <value>MySerialization,org.apache.hadoop.io.serializer.WritableSerialization</value>
> </property>
>
> If I write a Hadoop job that uses this data type, I just make sure to
> include MySerialization in it. However, say I want to make standard Hadoop
> jobs like Fs or Streaming also understand this serialization type. I have
> to put it in a JAR that the Hadoop framework can see. What is the best way
> to do this? I've been adding it to the HADOOP_CLASSPATH in hadoop-env.sh.

--
Harsh J
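For reference, here is a minimal sketch of the hadoop-env.sh change being discussed; the JAR name and install path are assumptions for illustration, not taken from the thread:

```shell
# Hypothetical sketch: exposing a custom serialization JAR cluster-wide via
# HADOOP_CLASSPATH. The JAR path below is an assumed example.
# Add this line to conf/hadoop-env.sh on every node, then restart the
# TaskTrackers so their JVMs pick up the new classpath.
export HADOOP_CLASSPATH="/usr/local/hadoop/lib/my-serialization.jar:${HADOOP_CLASSPATH}"

# Sanity check: the custom JAR should now be the first classpath entry.
echo "${HADOOP_CLASSPATH%%:*}"
```

The per-job alternative mentioned above would be shipping the JAR via the distributed cache, e.g. with the generic `-libjars my-serialization.jar` option on tool-based jobs, which avoids a cluster-wide restart but must be repeated for every job.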
