If you run it as a pure map job, it will do it per split. If you run it as a
single reducer job, it will do it overall. However, one starts to suspect
that by the time you've paid that extra cost, you might as well reconsider
your downstream process and the reason for this subdivision.
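To make the rolling-output idea from the earlier reply concrete, here is a minimal sketch in plain Java (no Hadoop dependencies; the class name RollingWriter and the maxBytes parameter are invented for illustration). In a real job this logic would sit inside a custom RecordWriter/OutputFormat, and as the quoted reply notes, you still get a small leftover file at the end:

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

// Sketch of a size-rolling writer: close the current part file and open a
// new one whenever the byte count exceeds a threshold. In a real Hadoop job
// this logic would live inside a custom RecordWriter; the class name and
// the maxBytes parameter here are invented for illustration.
public class RollingWriter implements AutoCloseable {
    private final Path dir;
    private final long maxBytes;
    private OutputStream out;
    private long written; // bytes written to the current part file
    private int part;     // index of the current part file

    public RollingWriter(Path dir, long maxBytes) throws IOException {
        this.dir = dir;
        this.maxBytes = maxBytes;
        openNext();
    }

    private void openNext() throws IOException {
        if (out != null) out.close();
        out = Files.newOutputStream(dir.resolve(String.format("part-%05d", part++)));
        written = 0;
    }

    public void write(byte[] record) throws IOException {
        // Roll before writing once the current part is full; parts end up
        // slightly over maxBytes, and the last part is the leftover tail.
        if (written >= maxBytes) openNext();
        out.write(record);
        written += record.length;
    }

    @Override
    public void close() throws IOException {
        out.close();
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempDirectory("rolling");
        try (RollingWriter w = new RollingWriter(tmp, 100)) { // 100-byte parts
            for (int i = 0; i < 7; i++) w.write(new byte[40]); // 280 bytes total
        }
        try (Stream<Path> files = Files.list(tmp)) {
            files.sorted().forEach(p -> {
                try {
                    System.out.println(p.getFileName() + " " + Files.size(p));
                } catch (IOException e) {
                    throw new RuntimeException(e);
                }
            });
        }
        // Parts come out as 120, 120, and 40 bytes -- the 40-byte file is
        // the "small bits left over".
    }
}
```

Since each map or reduce task runs its own writer, running this per task is exactly the per-split case: every mapper leaves its own undersized tail file.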

S.

On 27 October 2011 23:07, Mapred Learn <[email protected]> wrote:

> Hi Shevek,
> Thanks for the explanation !
>
> Can you point me to some documentation for specifying the size in the output
> format?
>
> If I set the size to 200 MB, would it do this per split or overall?
> I mean, would I end up with 200 MB and 50 MB from the 1st mapper and then,
> say, 200 MB and 10 MB from the 2nd mapper, and so on? Or would I get 200 MB
> files only?
>
>
>
> On Wed, Oct 26, 2011 at 10:48 AM, Shevek <[email protected]> wrote:
>
> > You can control the input to a computer program, but not (arbitrarily)
> how
> > much output it generates. The only way to generate output files of a
> fixed
> > size is to write a custom output format which shifts to a new filename
> > every
> > time that size is exceeded, but you will still get some small bits left
> > over. The plumbing in this is pretty ugly, and I would not recommend it
> > casually.
> >
> > You may be able to write a second map-only job which reprocesses the
> output
> > from the first job in chunks of X bytes, and just writes them out. Use an
> > IdentityMapper and set the split size. I have not tried this at home.
> >
> > S.
> >
> > On 26 October 2011 07:03, Mapred Learn <[email protected]> wrote:
> >
> > > > Hi,
> > > > I am trying to create output files of fixed size by using:
> > > > -Dmapred.max.split.size=6442450812 (6 GB)
> > > >
> > > > But the problem is that the input data size and metadata vary, and I
> > > > have to adjust the above value manually to achieve a fixed size.
> > > >
> > > > Is there a way I can programmatically determine a split size that
> > > > would yield fixed-size output files, e.g. 200 MB each?
> > > >
> > > > Thanks,
> > > > JJ
> > >
> >
>