Hello,
I am trying to find the best way to write a reducer that selects a single
value for each key it receives.
It seems that MapReduce reuses the byte array inside BytesWritable
instances, so I have to copy the contents of the buffer in order to keep
the data.
Here is the code I came up with. The array copy looks kind of ugly to me,
and I was wondering whether there is a best practice for this.
public static class MyReducer extends Reducer<Text, BytesWritable, Text,
        BytesWritable> {

    // Copy only the valid bytes: getBytes() returns the backing array,
    // which can be longer than getLength(). Requires java.util.Arrays.
    private byte[] copy(BytesWritable value) {
        return Arrays.copyOf(value.getBytes(), value.getLength());
    }

    @Override
    public void reduce(Text key, Iterable<BytesWritable> values,
            Context context) throws IOException, InterruptedException {
        byte[] selected = null;
        for (BytesWritable value : values) {
            // Placeholder for the actual selection criterion.
            if (for-some-reason-I-select-this-value) {
                selected = copy(value);
            }
        }
        // Write nothing if no value was selected (selected is still null).
        if (selected != null) {
            context.write(key, new BytesWritable(selected));
        }
    }
}
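To illustrate the reuse I am worried about, here is a small standalone sketch in plain Java, with no Hadoop dependency. ReusedBuffer is a made-up stand-in for BytesWritable, not the real class, and "keep the longest record" is only a hypothetical selection rule. It just shows that the copy has to be bounded by getLength(), because the backing array is overwritten by the next record and may be longer than the valid data.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

public class BufferReuseSketch {

    // Hypothetical stand-in for BytesWritable: a single backing array is
    // reused across records, and only the first getLength() bytes are valid.
    static final class ReusedBuffer {
        private byte[] bytes = new byte[0];
        private int length = 0;

        void set(byte[] data) {
            if (data.length > bytes.length) {
                bytes = new byte[data.length * 2]; // grow with some slack
            }
            System.arraycopy(data, 0, bytes, 0, data.length);
            length = data.length;
        }

        byte[] getBytes() { return bytes; }
        int getLength() { return length; }
    }

    // Copy only the valid prefix of the backing array.
    static byte[] snapshot(ReusedBuffer value) {
        return Arrays.copyOf(value.getBytes(), value.getLength());
    }

    // Hypothetical selection rule: keep the longest record seen so far.
    static byte[] selectLongest(List<byte[]> records) {
        ReusedBuffer value = new ReusedBuffer(); // one instance, reused per record
        byte[] selected = null;
        for (byte[] record : records) {
            value.set(record); // overwrites the previous record's bytes
            if (selected == null || value.getLength() > selected.length) {
                selected = snapshot(value);
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        List<byte[]> records = Arrays.asList(
                "abc".getBytes(StandardCharsets.UTF_8),
                "hello".getBytes(StandardCharsets.UTF_8),
                "xy".getBytes(StandardCharsets.UTF_8));
        byte[] selected = selectLongest(records);
        System.out.println(new String(selected, StandardCharsets.UTF_8));
    }
}
```

If the Hadoop version in use provides BytesWritable.copyBytes(), that may do the same bounded copy in one call. I have not checked which releases include it.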
Thanks!
Pierre