Where are you trying to specify them? Inside a DoFn? Prior to constructing the MRPipeline?
I'd suggest trying either:

1. Setting those values on the initial Configuration object you pass to the MRPipeline
2. Setting them as Source-specific properties[1] on the source itself.

The latter approach might be better if you are reading a lot of different sources into your pipeline and don't want to affect them all.

[1] http://crunch.apache.org/apidocs/0.12.0/org/apache/crunch/Source.html#inputConf(java.lang.String,%20java.lang.String)

On Fri, Feb 26, 2016 at 5:17 PM, Ben Juhn <[email protected]> wrote:

> The data isn’t compressed. The parameters aren’t showing up in the job
> configuration either.
>
> > On Feb 25, 2016, at 5:15 PM, Ben Juhn <[email protected]> wrote:
> >
> > Hello there,
> >
> > I haven’t been able to get crunch to split inputs into multiple
> > mappers. Currently it’s giving me one mapper per text file, even though
> > they’re 1GB each. I’ve tried supplying split.maxsize on the command line
> > and in the DoFn implementation:
> >
> > @Override
> > public void configure(Configuration conf) {
> >   conf.set("crunch.combine.file.size", "67108864");
> >   conf.set("mapreduce.input.fileinputformat.split.maxsize", "67108864");
> >   conf.set("mapreduce.input.fileinputformat.split.minsize", "67108864");
> > }
> >
> > Any suggestions?
> >
> > Thanks,
> > Ben
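To get a feel for how many mappers to expect once the split properties actually reach the job, here is a simplified Java sketch of the split arithmetic in Hadoop's FileInputFormat (new mapreduce API): the split size is max(minSize, min(maxSize, blockSize)), and the last split may be up to 10% larger than that. This is not Crunch code. It assumes a plain, splittable (uncompressed) text input and a 128 MiB HDFS block size, and it ignores block locations. The class and method names (SplitMath, splitLengths) are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: a simplified re-implementation of FileInputFormat's
// split sizing for one splittable file (block locations are ignored).
public class SplitMath {
    // Hadoop lets the final split be up to 10% larger than the split size.
    static final double SPLIT_SLOP = 1.1;

    // Same formula as FileInputFormat.computeSplitSize(blockSize, minSize, maxSize).
    static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    // Returns the length of each split for a file of the given size.
    static List<Long> splitLengths(long fileLength, long blockSize, long minSize, long maxSize) {
        long splitSize = computeSplitSize(blockSize, minSize, maxSize);
        List<Long> lengths = new ArrayList<>();
        long remaining = fileLength;
        // Carve off full-size splits while more than 1.1 splits' worth remains.
        while (((double) remaining) / splitSize > SPLIT_SLOP) {
            lengths.add(splitSize);
            remaining -= splitSize;
        }
        // Whatever is left (at most 1.1 x splitSize) becomes the last split.
        if (remaining != 0) {
            lengths.add(remaining);
        }
        return lengths;
    }

    public static void main(String[] args) {
        long oneGiB = 1L << 30;
        long blockSize = 128L << 20; // assumption: 128 MiB HDFS block size
        long target = 67108864L;     // 64 MiB, the value used in the thread

        // split.minsize = split.maxsize = 64 MiB
        System.out.println(splitLengths(oneGiB, blockSize, target, target).size()); // prints 16
        // Hadoop defaults (minsize 1, maxsize Long.MAX_VALUE): split size = block size
        System.out.println(splitLengths(oneGiB, blockSize, 1L, Long.MAX_VALUE).size()); // prints 8
    }
}
```

Under these assumptions, even the default settings should give several mappers for a 1GB uncompressed text file, so seeing exactly one mapper per file suggests the input is not being split the way plain FileInputFormat would split it, for example because the properties never reach the job configuration, as Ben observed.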
