That sounds like a bug to me. I think the easiest fix would be to modify InputSampler so that it handles non-Text keys.
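
A rough sketch of the idea: copy the key through its own serialization methods instead of casting it to Text. This is a stand-alone illustration, not the InputSampler source. The Writable interface below is a local stand-in with the same two methods as Hadoop's, and IntKey and copyKey are hypothetical names made up for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;

class GenericKeyCopy {
    /** Local stand-in for org.apache.hadoop.io.Writable (assumption: same two methods). */
    public interface Writable {
        void write(DataOutput out) throws IOException;
        void readFields(DataInput in) throws IOException;
    }

    /** Hypothetical non-Text key type, used only for this illustration. */
    public static class IntKey implements Writable {
        public int value;
        public IntKey() {}
        public IntKey(int value) { this.value = value; }
        @Override public void write(DataOutput out) throws IOException { out.writeInt(value); }
        @Override public void readFields(DataInput in) throws IOException { value = in.readInt(); }
    }

    /**
     * Copies a key of any Writable type by serializing it and reading the bytes
     * back into a fresh instance of the same runtime class. No cast to a
     * concrete key type such as Text is involved.
     */
    public static <K extends Writable> K copyKey(K key) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            key.write(new DataOutputStream(buffer));

            // Requires a public no-argument constructor on the key class.
            @SuppressWarnings("unchecked")
            K copy = (K) key.getClass().getDeclaredConstructor().newInstance();
            copy.readFields(new DataInputStream(
                    new ByteArrayInputStream(buffer.toByteArray())));
            return copy;
        } catch (IOException | ReflectiveOperationException e) {
            throw new IllegalStateException("Could not copy key", e);
        }
    }

    public static void main(String[] args) {
        IntKey original = new IntKey(42);
        IntKey copy = copyKey(original);
        System.out.println(copy.value);
    }
}
```

In Hadoop itself, ReflectionUtils.copy(conf, src, dst) should do the equivalent job for an arbitrary key type, but that is untested against InputSampler here.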
-Joey

On Wed, May 18, 2011 at 4:24 PM, W.P. McNeill <[email protected]> wrote:
> I want to do a total sort on some data whose key type is Writable but not
> Text. I wrote an InputSampler.RandomSampler object following the example in
> the "Total Sort" section of *Hadoop: The Definitive Guide*. When I
> call InputSampler.writePartitionFile() I get a class cast exception because
> my key type cannot be cast to Text. Specifically the issue seems to be the
> following section of InputSampler.getSample():
>
>     K key = reader.getCurrentKey();
>     ....
>     Text keyCopy = WritableUtils.<Text>clone((Text)key,
> job.getConfiguration());
>
> From this source it does appear that you can only use a RandomSampler on
> data with Text keys. However, I'm confused because I don't see this
> mentioned in any documentation, and I assume this wouldn't be the case
> because InputSampler takes <Key, Value> generic specifications.
>
>    1. Does InputSampler.RandomSampler only work on data with Text key
>    values?
>    2. If so, what is the easiest way to generate a random sample for data
>    with non-Text key values? Is there example code anywhere?
>

--
Joseph Echeverria
Cloudera, Inc.
443.305.9434
