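Ben,

Are the text files you are processing compressed? If so, that data may not be splittable: TextInputFormat only splits a compressed file when its codec supports splitting (bzip2 does, gzip doesn't), so a gzipped file always goes to a single mapper, whatever split.maxsize is set to.[1]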
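A minimal standalone sketch of that decision, assuming Hadoop's default codecs are chosen by file extension. The class name `SplitCheck` and the extension table are hypothetical, and this is a model rather than Hadoop's code. The real check in TextInputFormat asks CompressionCodecFactory for the file's codec and allows a split only if there is no codec or the codec implements SplittableCompressionCodec.

```java
import java.util.Locale;
import java.util.Map;

/**
 * Hypothetical model of TextInputFormat's splittability check.
 * Hadoop's own logic is roughly:
 *   codec == null || codec instanceof SplittableCompressionCodec
 * Here the codec lookup is approximated by file extension (an assumption).
 */
public class SplitCheck {
    // Assumed mapping of default codec extensions to whether that codec is splittable.
    private static final Map<String, Boolean> CODEC_SPLITTABLE = Map.of(
        ".gz", false,       // GzipCodec
        ".deflate", false,  // DefaultCodec
        ".snappy", false,   // SnappyCodec
        ".lz4", false,      // Lz4Codec
        ".bz2", true        // BZip2Codec implements SplittableCompressionCodec
    );

    /** Returns true if a file with this name could be cut into several input splits. */
    public static boolean isSplittable(String fileName) {
        String lower = fileName.toLowerCase(Locale.ROOT);
        for (Map.Entry<String, Boolean> e : CODEC_SPLITTABLE.entrySet()) {
            if (lower.endsWith(e.getKey())) {
                return e.getValue(); // compressed: depends on whether the codec can split
            }
        }
        return true; // no codec matched: plain text is splittable
    }

    public static void main(String[] args) {
        for (String f : new String[] {"events.txt", "events.txt.gz", "events.txt.bz2"}) {
            System.out.println(f + " splittable? " + isSplittable(f));
        }
    }
}
```

Running `main` reports the plain-text and bzip2 files as splittable and the gzip file as not, which matches the one-mapper-per-file symptom if the inputs are gzipped.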
[1] - http://www.grepcode.com/file/repo1.maven.org/maven2/org.apache.hadoop/hadoop-mapreduce-client-core/2.6.0/org/apache/hadoop/mapreduce/lib/input/TextInputFormat.java#57

On Thu, Feb 25, 2016 at 7:15 PM, Ben Juhn <[email protected]> wrote:
> Hello there,
>
> I haven’t been able to get crunch to split inputs into multiple mappers.
> Currently it’s giving me one mapper per text file, even though they’re 1GB
> each. I’ve tried supplying split.maxsize on the command line and in the
> DoFn implementation:
>
> @Override
> public void configure(Configuration conf) {
>   conf.set("crunch.combine.file.size", "67108864");
>   conf.set("mapreduce.input.fileinputformat.split.maxsize", "67108864");
>   conf.set("mapreduce.input.fileinputformat.split.minsize", "67108864");
> }
>
> Any suggestions?
>
> Thanks,
> Ben
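For a splittable file, the number of mappers comes from the split size. A rough sketch of the arithmetic in the new-API FileInputFormat: splitSize = max(minSize, min(maxSize, blockSize)), and splits are cut while more than 1.1 split sizes remain, with the tail becoming the last split. `SplitMath` is a hypothetical name; the 128 MiB block size and reading "1GB" as 1 GiB are assumptions.

```java
/**
 * Hypothetical model of FileInputFormat's split sizing for a single file.
 * Not Hadoop's code; it mirrors the computeSplitSize formula and the
 * SPLIT_SLOP (1.1) loop used when cutting a file into splits.
 */
public class SplitMath {
    private static final double SPLIT_SLOP = 1.1;

    /** Split size from block size and the min/max split settings. */
    public static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    /** Number of splits for one file; a non-splittable or empty file yields 1. */
    public static int countSplits(long fileLength, boolean splittable,
                                  long blockSize, long minSize, long maxSize) {
        if (!splittable || fileLength == 0) {
            return 1; // the whole file becomes one split, so one mapper
        }
        long splitSize = computeSplitSize(blockSize, minSize, maxSize);
        long remaining = fileLength;
        int splits = 0;
        // Keep cutting full-size splits while more than 1.1 split sizes remain.
        while (((double) remaining) / splitSize > SPLIT_SLOP) {
            remaining -= splitSize;
            splits++;
        }
        if (remaining != 0) {
            splits++; // leftover tail (at most 1.1 * splitSize) becomes the final split
        }
        return splits;
    }

    public static void main(String[] args) {
        long oneGiB = 1L << 30;
        long blockSize = 128L << 20; // assumed HDFS block size (128 MiB)
        long target = 67108864L;     // 64 MiB, the value from the quoted configure()
        System.out.println("plain text: " + countSplits(oneGiB, true, blockSize, target, target));
        System.out.println("gzip:       " + countSplits(oneGiB, false, blockSize, target, target));
    }
}
```

With min and max both set to 64 MiB, the sketch gives 16 splits for a 1 GiB plain-text file and 1 split for the same file gzipped.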
