Hello there,
I haven’t been able to get Crunch to split inputs across multiple mappers.
Currently it gives me one mapper per text file, even though each file is
1 GB. I’ve tried supplying split.maxsize on the command line and setting it
in the DoFn implementation:
    @Override
    public void configure(Configuration conf) {
        // 67108864 bytes = 64 MB
        conf.set("crunch.combine.file.size", "67108864");
        conf.set("mapreduce.input.fileinputformat.split.maxsize", "67108864");
        conf.set("mapreduce.input.fileinputformat.split.minsize", "67108864");
    }
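For reference, here is a rough sketch of how many splits I would expect per
file if the 64 MB cap were honoured. It assumes "1 GB" means 2^30 bytes and
that the text files are splittable (uncompressed). SplitEstimate and
expectedSplits are hypothetical names used only for this illustration, not
part of Crunch or Hadoop.

```java
public class SplitEstimate {
    // Hypothetical helper: how many splits one file needs when each split
    // is capped at maxSplitBytes (ceiling division).
    static long expectedSplits(long fileBytes, long maxSplitBytes) {
        return (fileBytes + maxSplitBytes - 1) / maxSplitBytes;
    }

    public static void main(String[] args) {
        long fileBytes = 1L << 30;       // assumption: 1 GB = 2^30 bytes
        long maxSplitBytes = 67108864L;  // 64 MB, the value set in configure()
        System.out.println(expectedSplits(fileBytes, maxSplitBytes)); // prints 16
    }
}
```

So I would expect roughly 16 mappers per file rather than one.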
Any suggestions?
Thanks,
Ben