Hi,
I have a MapReduce job whose map function parses a line from an
N-Quads file:

    private static final Logger log = LoggerFactory.getLogger(FirstMapper.class);
    private String inputFileName;
    private MapReduceParserProfile profile;
    private LabelToNode labelMapping;

    public void setup(Context context) throws IOException, InterruptedException {
        inputFileName = context.getConfiguration().get("mapreduce.map.input.file");
        Prologue prologue = new Prologue(null, IRIResolver.createNoResolve());
        labelMapping = new MapReduceLabelToNode(inputFileName);
        profile = new MapReduceParserProfile(prologue,
                ErrorHandlerFactory.errorHandlerStd, labelMapping);
    }

    @Override
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        if (log.isDebugEnabled()) log.debug("< ({}, {})", key, value);
        SinkToContext sink = new SinkToContext(context);
        Tokenizer tokenizer = TokenizerFactory.makeTokenizerString(value.toString());
        LangNQuads parser = new LangNQuads(tokenizer, profile, sink);
        parser.parse();
    }

(A RecordReader<LongWritable, QuadWritable> would be better, but for now the
snippet above does its job. Almost.)
The problem I have is with blank node labels.
With MapReduce, the same file is divided into multiple splits, which are
parsed on different machines. Therefore, I would like to have my own
LabelToNode implementation with an Allocator<String, Node> that takes the
filename (or a hash of it) into account when it creates a new blank node, so
that a given label in a given file yields the same node in every split.
Something along these lines:

    public Node create(String label) {
        return Node.createAnon(new AnonId(filename + "-" + label));
    }

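To make the intent concrete without depending on any RIOT classes, here is a minimal standalone sketch of the "hash of the filename" variant. The class name ScopedLabel and its two methods are hypothetical, not part of Jena; the resulting string is what I would pass to AnonId. Because the function is stateless, two mappers working on different splits of the same file should derive the same id for the same label, while the same label in another file gets a different id.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical helper (not a Jena class): builds a blank node id from a file name and a label. */
public class ScopedLabel {

    /** Returns the MD5 digest of the file name as 32 lowercase hex characters. */
    public static String hashOf(String filename) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(filename.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // MD5 is required on every Java platform, so this is not expected to happen.
            throw new IllegalStateException(e);
        }
    }

    /** Deterministic id: depends only on the file name and the label, never on the split. */
    public static String scopedId(String filename, String label) {
        return hashOf(filename) + "-" + label;
    }

    public static void main(String[] args) {
        // Made-up paths, for illustration only.
        String first = scopedId("/data/part1.nq", "b0");
        String again = scopedId("/data/part1.nq", "b0");
        String other = scopedId("/data/part2.nq", "b0");
        System.out.println(first.equals(again)); // same file, same label
        System.out.println(first.equals(other)); // different file, same label
    }
}
```

Hashing rather than concatenating the raw filename keeps the ids short and free of path separators; any stable digest would do, MD5 is used here only because it is always available.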
So, I have my MapReduceLabelToNode:

    public class MapReduceLabelToNode extends LabelToNode {
        public MapReduceLabelToNode(String filename) {
            super(new SingleScopePolicy(), new MapReduceAllocator(filename));
        }
        ...

But the LabelToNode constructor is private.
Could we make it protected?
Or, alternatively, how can I construct a LabelToNode object that uses
my MapReduceAllocator?
Thanks,
Paolo