The data isn’t compressed. The parameters aren’t showing up in the job configuration either.
> On Feb 25, 2016, at 5:15 PM, Ben Juhn <[email protected]> wrote:
>
> Hello there,
>
> I haven’t been able to get crunch to split inputs into multiple mappers.
> Currently it’s giving me one mapper per text file, even though they’re 1GB
> each. I’ve tried supplying split.maxsize on the command line and in the DoFn
> implementation:
>
>     @Override
>     public void configure(Configuration conf) {
>         conf.set("crunch.combine.file.size", "67108864");
>         conf.set("mapreduce.input.fileinputformat.split.maxsize", "67108864");
>         conf.set("mapreduce.input.fileinputformat.split.minsize", "67108864");
>     }
>
> Any suggestions?
>
> Thanks,
> Ben
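
For reference, here is a rough sketch of how many mappers one of these files should get if the 67108864-byte settings were actually taking effect. It mirrors the split-size rule in Hadoop's stock FileInputFormat (max of the minimum size and the smaller of maximum size and block size, with a 10% slop on the last split). It is not Crunch's own code, Crunch may wrap the input format differently, and the class name and the 128 MB block size are assumptions made only for illustration. It also assumes a splittable input, which matches the uncompressed text files described here.

```java
// Sketch of Hadoop-style split sizing; SplitEstimate is a hypothetical name,
// not a class from Hadoop or Crunch.
final class SplitEstimate {
    // Hadoop's FileInputFormat lets the final split run up to 10% over target.
    static final double SPLIT_SLOP = 1.1;

    // Split size is clamped between the configured minimum and maximum.
    static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    // Count the splits produced for a single splittable file.
    static long countSplits(long fileLength, long splitSize) {
        long splits = 0;
        long remaining = fileLength;
        while ((double) remaining / splitSize > SPLIT_SLOP) {
            splits++;
            remaining -= splitSize;
        }
        if (remaining > 0) {
            splits++; // the tail, which may be slightly larger than splitSize
        }
        return splits;
    }

    public static void main(String[] args) {
        long fileLength = 1L << 30;   // one 1GB text file
        long blockSize = 128L << 20;  // assumed HDFS block size
        long target = 67108864L;      // value used for split.minsize and split.maxsize
        long splitSize = computeSplitSize(blockSize, target, target);
        System.out.println(splitSize + " bytes per split, "
                + countSplits(fileLength, splitSize) + " splits");
    }
}
```

Under these assumptions each 1GB file works out to 16 splits, so seeing exactly one mapper per file is consistent with the settings never reaching the job configuration.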
