Hi Shevek,

Thanks for the explanation! Can you point me to some documentation on specifying the size in an output format?
If I set the size to 200 MB, does the limit apply per split or overall? I mean, would I end up with a 200 MB file and a 50 MB file from the first mapper, then, say, a 200 MB file and a 10 MB file from the second mapper, and so on? Or will I get only 200 MB files?

On Wed, Oct 26, 2011 at 10:48 AM, Shevek <[email protected]> wrote:
> You can control the input to a computer program, but not (arbitrarily) how
> much output it generates. The only way to generate output files of a fixed
> size is to write a custom output format which shifts to a new filename
> every time that size is exceeded, but you will still get some small bits
> left over. The plumbing in this is pretty ugly, and I would not recommend
> it casually.
>
> You may be able to write a second map-only job which reprocesses the output
> from the first job in chunks of X bytes, and just writes them out. Use an
> IdentityMapper and set the split size. I have not tried this at home.
>
> S.
>
> On 26 October 2011 07:03, Mapred Learn <[email protected]> wrote:
>
> > > Hi,
> > > I am trying to create output files of fixed size by using:
> > > -Dmapred.max.split.size=6442450812 (6 GB)
> > >
> > > But the problem is that the input data size and metadata vary, and I
> > > have to adjust the above value manually to achieve a fixed size.
> > >
> > > Is there a way I can programmatically determine the split size that
> > > would yield fixed-size output files, for example 200 MB each?
> > >
> > > Thanks,
> > > JJ
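On the per-split question: in Hadoop each task writes through its own RecordWriter, so a size-capped output format like the one Shevek describes would cap files per task, not across the whole job. Each task would therefore end with its own leftover file. Here is a minimal plain-Java sketch of just the rollover decision (no Hadoop dependency, no real file I/O); `RollingSizeDemo`, `rollFiles`, and the record sizes are hypothetical, chosen to match the 250 MB / 210 MB example above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical illustration, not a Hadoop API: simulates a writer that rolls
 * to a new part file whenever the next record would push the current file
 * past a byte limit. Each task runs its own independent writer.
 */
public class RollingSizeDemo {

    /** Returns the byte size of each part file a single task would produce. */
    static List<Long> rollFiles(long[] recordSizes, long limitBytes) {
        List<Long> files = new ArrayList<>();
        long current = 0;
        for (long size : recordSizes) {
            // Roll before writing if this record would exceed the limit
            // and the current file already holds some data.
            if (current > 0 && current + size > limitBytes) {
                files.add(current);
                current = 0;
            }
            current += size;
        }
        if (current > 0) {
            files.add(current); // the leftover "small bit" for this task
        }
        return files;
    }

    /** Builds `count` records that are each `size` bytes long. */
    static long[] uniformRecords(int count, long size) {
        long[] records = new long[count];
        Arrays.fill(records, size);
        return records;
    }

    /** Converts byte sizes to whole megabytes for printing. */
    static List<Long> toMb(List<Long> sizes, long mb) {
        List<Long> out = new ArrayList<>();
        for (long s : sizes) {
            out.add(s / mb);
        }
        return out;
    }

    public static void main(String[] args) {
        final long MB = 1024L * 1024L;
        long limit = 200 * MB;
        // Task 1 emits 250 MB and task 2 emits 210 MB, in 1 MB records.
        List<Long> task1 = rollFiles(uniformRecords(250, MB), limit);
        List<Long> task2 = rollFiles(uniformRecords(210, MB), limit);
        System.out.println("task 1 files (MB): " + toMb(task1, MB)); // [200, 50]
        System.out.println("task 2 files (MB): " + toMb(task2, MB)); // [200, 10]
    }
}
```

Because this sketch rolls before a record that would overflow, files land at or just under the limit rather than over it; with uneven record sizes the full files would be slightly below 200 MB.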

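JJ's original question, choosing the split size programmatically, reduces to a ratio if you assume that, for a map-only job, output size grows roughly linearly with input size. Under that assumption, split size is approximately the target output size divided by the measured output-to-input ratio. Here is a minimal sketch. `SplitSizeEstimate` and `estimateSplitSize` are hypothetical, and the sample byte counts are illustrative only; in practice they might come from a previous run or a sample. It prints a value for the `mapred.max.split.size` property used above.

```java
/** Hypothetical helper for estimating a max split size; not a Hadoop API. */
public class SplitSizeEstimate {

    /**
     * Estimates the max split size (bytes) that should yield roughly
     * targetOutputBytes per map task, assuming output scales linearly
     * with input at the ratio observed in a sample run.
     */
    static long estimateSplitSize(long sampleInputBytes, long sampleOutputBytes,
                                  long targetOutputBytes) {
        if (sampleInputBytes <= 0 || sampleOutputBytes <= 0 || targetOutputBytes <= 0) {
            throw new IllegalArgumentException("all sizes must be positive");
        }
        // Output bytes produced per input byte in the sample run.
        double ratio = (double) sampleOutputBytes / sampleInputBytes;
        return (long) Math.floor(targetOutputBytes / ratio);
    }

    public static void main(String[] args) {
        final long MB = 1024L * 1024L;
        // Illustrative numbers: a sample run read 6144 MB and wrote 1536 MB.
        long split = estimateSplitSize(6144 * MB, 1536 * MB, 200 * MB);
        System.out.println("-Dmapred.max.split.size=" + split); // 838860800 (800 MB)
    }
}
```

The estimate is only as stable as the ratio, which is exactly what varied in JJ's case, so it narrows the range rather than guaranteeing 200 MB files. Input splits also do not span input files, so each input file can still end in a smaller remainder.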