I want to do a total sort on some data whose key type is Writable but not
Text. I wrote an InputSampler.RandomSampler object following the example in
the "Total Sort" section of *Hadoop: The Definitive Guide*. When I
call InputSampler.writePartitionFile() I get a ClassCastException because
my key type cannot be cast to Text. Specifically, the issue seems to be the
following section of InputSampler.getSample():

    K key = reader.getCurrentKey();
    ....
    Text keyCopy = WritableUtils.<Text>clone((Text)key,
                                             job.getConfiguration());

From this source, it appears that a RandomSampler can only be used on
data with Text keys. However, I'm confused because I don't see this
restriction mentioned in any documentation, and I would not expect it,
because InputSampler takes <Key, Value> generic specifications.
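
To illustrate what I think is happening, here is a minimal sketch that does
not use Hadoop at all. FakeText and FakeIntKey are hypothetical stand-ins I
made up for Text and for my own key type, and sampleWithHardCast() only
mimics the shape of the lines quoted above (generic in K, but with a hard
cast to one concrete type); it is not the real InputSampler code.

```java
import java.util.ArrayList;
import java.util.List;

public class SamplerCastSketch {
    // Hypothetical stand-in for Hadoop's Text (assumption, not the real class).
    static class FakeText {
        final String s;
        FakeText(String s) { this.s = s; }
    }

    // Hypothetical stand-in for a custom, non-Text key type.
    static class FakeIntKey {
        final int v;
        FakeIntKey(int v) { this.v = v; }
    }

    // Generic in K, but every key is cast to FakeText before being copied,
    // which is the pattern the quoted getSample() lines appear to follow.
    static <K> List<FakeText> sampleWithHardCast(List<K> keys) {
        List<FakeText> out = new ArrayList<>();
        for (K key : keys) {
            // Compiles for any K, but fails at runtime unless key is a FakeText.
            out.add(new FakeText(((FakeText) key).s));
        }
        return out;
    }

    // Returns true if sampling the given keys raises a ClassCastException.
    static boolean throwsClassCast(List<?> keys) {
        try {
            sampleWithHardCast(keys);
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(throwsClassCast(List.of(new FakeText("a"))));  // false
        System.out.println(throwsClassCast(List.of(new FakeIntKey(1)))); // true
    }
}
```

The generic signature compiles for any key type, so nothing warns about the
problem until the cast is actually executed, which matches what I am seeing.
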

1. Does InputSampler.RandomSampler only work on data with Text keys?
2. If so, what is the easiest way to generate a random sample for data
   with non-Text keys? Is there example code anywhere?