Hi Shevek,
Thanks for the explanation!

Can you point me to some documentation on specifying a size limit in an
output format?

If I set the size to 200 MB, would the rollover at 200 MB apply per split or
across the whole job?
I mean, would I end up with a 200 MB file and a 50 MB file from the 1st
mapper, then, say, a 200 MB file and a 10 MB file from the 2nd mapper, and so
on? Or would I get only 200 MB files?
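
To make the per-mapper case concrete, here is a small Python sketch of what I
am picturing. It assumes each mapper writes its own files and starts a new one
as soon as the current file reaches the limit. The helper name
rolled_file_sizes and the 250 MB / 210 MB outputs are made up for
illustration, and it ignores record boundaries, so real files could overshoot
the limit slightly.

```python
def rolled_file_sizes(task_output_mb, limit_mb):
    """Sizes (in MB) of the files one task would write if it starts a new
    file each time the current one reaches limit_mb. Hypothetical helper."""
    if limit_mb <= 0:
        raise ValueError("limit_mb must be positive")
    full_files, leftover_mb = divmod(task_output_mb, limit_mb)
    sizes = [limit_mb] * full_files
    if leftover_mb:
        # Whatever does not fill a whole file ends up in a smaller last file.
        sizes.append(leftover_mb)
    return sizes


# Hypothetical per-mapper output sizes in MB, taken from my example above.
mapper_outputs_mb = [250, 210]
files_per_mapper = [rolled_file_sizes(mb, 200) for mb in mapper_outputs_mb]
print(files_per_mapper)  # [[200, 50], [200, 10]]
```

If the limit works per mapper like this, every mapper leaves its own small
leftover file, which is the case I would like to avoid.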


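On your second suggestion (a map-only pass with a fixed split size), this is
how I understand the splits would come out. It is a rough sketch only, not
tested against a real cluster. It assumes splits are computed per file and
never span two files, that the last split of a file may run up to about 10%
over the split size (my understanding of the slop in FileInputFormat), and
that each map task writes one output file about as large as its split. Block
size is ignored. splits_for_file is a hypothetical helper, and sizes are in MB.

```python
# Assumption: models the ~10% slack I believe FileInputFormat allows
# before it cuts another split from the same file.
SPLIT_SLOP = 1.1


def splits_for_file(file_mb, split_mb):
    """Split sizes (in MB) for a single input file. Hypothetical helper."""
    if split_mb <= 0:
        raise ValueError("split_mb must be positive")
    splits = []
    remaining = file_mb
    # Cut full-size splits while clearly more than one split's worth remains.
    while remaining / split_mb > SPLIT_SLOP:
        splits.append(split_mb)
        remaining -= split_mb
    # The tail becomes the last split, even if it is small or slightly large.
    if remaining > 0:
        splits.append(remaining)
    return splits


# Hypothetical output files from the first job, sizes in MB.
first_job_files_mb = [250, 210]
print([splits_for_file(mb, 200) for mb in first_job_files_mb])
# [[200, 50], [210]]
```

If that is right, the 250 MB file would still leave a 50 MB piece, while the
210 MB file would stay whole.
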

On Wed, Oct 26, 2011 at 10:48 AM, Shevek <[email protected]> wrote:

> You can control the input to a computer program, but not (arbitrarily) how
> much output it generates. The only way to generate output files of a fixed
> size is to write a custom output format which shifts to a new filename every
> time that size is exceeded, but you will still get some small bits left
> over. The plumbing in this is pretty ugly, and I would not recommend it
> casually.
>
> You may be able to write a second map-only job which reprocesses the output
> from the first job in chunks of X bytes, and just writes them out. Use an
> IdentityMapper and set the split size. I have not tried this at home.
>
> S.
>
> On 26 October 2011 07:03, Mapred Learn <[email protected]> wrote:
>
> > > Hi,
> > > I am trying to create output files of a fixed size by using:
> > > -Dmapred.max.split.size=6442450812 (about 6 GB)
> > >
> > > But the problem is that the input data size and metadata vary, and I
> > > have to adjust the above value manually to achieve a fixed size.
> > >
> > > Is there a way I can programmatically determine a split size that would
> > > yield fixed-size output files, e.g. 200 MB each?
> > >
> > > Thanks,
> > > JJ
> >
>
