Hello,
My RecordReader subclass reads from object X. To parse this object and
emit records, I need to use a C library through a JNI wrapper.
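public boolean next(LongWritable key, BytesWritable value) throws IOException {
    if (leftover == 0) return false;   // no records left in this split
    long wi = pos + split.getStart();  // absolute index of the next record
    key.set(wi);
    // X.at() goes through the JNI library; readFields() needs it to return a DataInput
    value.readFields(X.at(wi));
    pos++;
    leftover--;
    return true;
}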
X.at uses the JNI library to read record number wi.
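For illustration only, here is one hypothetical shape for X.at, assuming the JNI side hands back each record as a byte[]. BytesWritable.readFields reads a 4-byte length followed by that many bytes (the format BytesWritable.write produces), so the sketch frames the raw record that way before returning it as a DataInput. The names RecordSource, NativeReader, JniReader, readRecord and the library name "xparser" are all made up; the native call sits behind an interface so it can be faked without the C library.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

/** Hypothetical stand-in for object X: wraps a JNI reader and serves records by index. */
public class RecordSource {
    /** The single call that crosses into C; kept behind an interface so it can be faked. */
    public interface NativeReader {
        byte[] readRecord(long index);
    }

    /** Hypothetical JNI binding; "xparser" is a placeholder library name. */
    public static final class JniReader implements NativeReader {
        static { System.loadLibrary("xparser"); }
        @Override public native byte[] readRecord(long index);
    }

    private final NativeReader reader;

    public RecordSource(NativeReader reader) {
        this.reader = reader;
    }

    /**
     * Returns record number wi framed the way BytesWritable.readFields reads it:
     * a 4-byte big-endian length followed by that many bytes.
     */
    public DataInput at(long wi) {
        byte[] raw = reader.readRecord(wi);
        ByteArrayOutputStream buf = new ByteArrayOutputStream(raw.length + 4);
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeInt(raw.length);
            out.write(raw);
        } catch (IOException e) {
            // Cannot happen for an in-memory stream, but DataOutputStream declares it.
            throw new UncheckedIOException(e);
        }
        return new DataInputStream(new ByteArrayInputStream(buf.toByteArray()));
    }
}
```

With this shape, value.readFields(X.at(wi)) works unchanged; if X.at instead returned the bare bytes, value.set(bytes, 0, bytes.length) would avoid the length framing altogether.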
My question is: how many instances of this RecordReader run, and where?
1) For a given job, does one instance run on each TaskTracker, reading
records and feeding them to all the mappers on that machine?
Or,
2) Since I have mapred.tasktracker.map.tasks.maximum == 7, does each
launched JVM have its own RecordReader, feeding records to the maps
that JVM is running?
If it's either (1) or (2), I guess I'm safe from threading issues.
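If it turns out that several threads in one JVM can reach X.at at the same time, and the C library keeps global state, one defensive option is to funnel every native call through a single lock. This is a minimal sketch, assuming a made-up NativeReader interface around the JNI call (SerializedNativeAccess and readRecord are hypothetical names). A static lock only serializes calls within one JVM, which is sufficient here because each JVM process loads its own copy of the native library and therefore its own copy of any C globals.

```java
/** Hypothetical guard that serializes all calls into a non-thread-safe native library. */
public final class SerializedNativeAccess {
    /** Stand-in for the JNI call behind X.at (name is made up). */
    public interface NativeReader {
        byte[] readRecord(long index);
    }

    // One lock shared by every instance of this class in the JVM. Other JVMs have
    // their own copy of the C library's state, so they do not need this lock.
    private static final Object NATIVE_LOCK = new Object();

    private final NativeReader reader;

    public SerializedNativeAccess(NativeReader reader) {
        this.reader = reader;
    }

    /** At most one thread in this JVM is inside the native call at any time. */
    public byte[] readRecord(long index) {
        synchronized (NATIVE_LOCK) {
            return reader.readRecord(index);
        }
    }
}
```

The cost is that native reads never overlap within a JVM; that matters little when each JVM runs a single map task with a single RecordReader.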
Please correct me if I'm totally wrong.
Regards
Saptarshi Guha