Re: Processing splittable inputs

Micah Whitacre Thu, 25 Feb 2016 19:21:07 -0800

Ben,
  Are the text files you are processing compressed?  If so, that data
wouldn't be splittable unless the codec supports splitting (bzip2 does,
gzip does not).[1]


[1] http://www.grepcode.com/file/repo1.maven.org/maven2/org.apache.hadoop/hadoop-mapreduce-client-core/2.6.0/org/apache/hadoop/mapreduce/lib/input/TextInputFormat.java#57
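
As a quick sanity check before digging into the split settings, here is a
rough sketch that guesses splittability from the file name. Hadoop picks the
codec from the file suffix and TextInputFormat then asks whether that codec
is splittable; this sketch only approximates that with a hard-coded suffix
list. The class name, method name, and suffix list are my own assumptions,
not part of Hadoop or Crunch.

```java
import java.util.Locale;

// Sketch only: approximates by file suffix what TextInputFormat decides
// by inspecting the compression codec. Not a Hadoop or Crunch API.
final class SplittableCheck {
    // Assumed suffixes of common codecs that do not support splitting.
    private static final String[] NON_SPLITTABLE_SUFFIXES = {
        ".gz", ".deflate", ".snappy", ".lz4"
    };

    // Returns false when the name ends with a known non-splittable suffix.
    // Uncompressed files and .bz2 files fall through and return true.
    static boolean looksSplittable(String fileName) {
        String lower = fileName.toLowerCase(Locale.ROOT);
        for (String suffix : NON_SPLITTABLE_SUFFIXES) {
            if (lower.endsWith(suffix)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Hypothetical file names, for illustration only.
        System.out.println(looksSplittable("part-00000.txt"));
        System.out.println(looksSplittable("part-00000.txt.gz"));
    }
}
```

If the inputs turn out to be gzip files, no split.maxsize value will help,
since each file still has to be read from the start by a single mapper.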

On Thu, Feb 25, 2016 at 7:15 PM, Ben Juhn <[email protected]> wrote:

> Hello there,
>
> I haven’t been able to get crunch to split inputs into multiple mappers.
> Currently it’s giving me one mapper per text file, even though they’re 1GB
> each.  I’ve tried supplying split.maxsize on the command line and in the
> DoFn implementation:
>
> @Override
> public void configure(Configuration conf) {
>     conf.set("crunch.combine.file.size", "67108864");
>     conf.set("mapreduce.input.fileinputformat.split.maxsize", "67108864");
>     conf.set("mapreduce.input.fileinputformat.split.minsize", "67108864");
> }
>
> Any suggestions?
>
> Thanks,
> Ben
>
>
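
For reference, here is a small sketch of the arithmetic a plain
FileInputFormat would do with the settings quoted above, assuming the file is
splittable and exactly 1 GiB. The 128 MB block size is an assumption, the
class and method names are hypothetical, and Crunch's combine-file handling
may group splits differently, so treat this as an estimate only.

```java
// Sketch only: mirrors the split-size rule max(minSize, min(maxSize, blockSize))
// and the 10% slop used when carving a file into splits.
final class SplitEstimate {
    private static final double SPLIT_SLOP = 1.1;

    static long splitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    // Counts full-size splits while more than 1.1 split sizes remain,
    // then adds one final split for whatever is left over.
    static int countSplits(long fileLength, long splitSize) {
        int splits = 0;
        long remaining = fileLength;
        while ((double) remaining / splitSize > SPLIT_SLOP) {
            splits++;
            remaining -= splitSize;
        }
        if (remaining != 0) {
            splits++;
        }
        return splits;
    }

    public static void main(String[] args) {
        long fileLength = 1L << 30;                 // assumed: exactly 1 GiB
        long configured = 67108864L;                // 64 MB, from the quoted settings
        long assumedBlockSize = 128L * 1024 * 1024; // assumption, not from the thread
        long size = splitSize(assumedBlockSize, configured, configured);
        System.out.println(countSplits(fileLength, size)); // prints 16
    }
}
```

So a splittable 1 GB text file should yield roughly 16 mappers with these
values; seeing exactly one mapper per file suggests the input format is
treating the files as non-splittable.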
