Mapred,

This should be doable if you are using TextInputFormat (or other FileInputFormat derivatives that do not override the getSplits() behavior).

Try this:

jobConf.setLong("mapred.min.split.size", <the byte size you want each mapper's split to contain, e.g. 1 GB expressed in bytes, as a long>);

This should get you splits of the size you mention (1 GB, or whatever you choose), and your outputs should come out fairly near 1 GB once you do the sequence file conversion (lower at times, due to serialization and compression being applied). You can play around with the parameter until the results are satisfactory.
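In case it helps to see it in context, here is a minimal, untested sketch of such a conversion job using the old mapred API (the class name, paths, and key/value types below are placeholders, not taken from your job):

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.SequenceFileOutputFormat;
import org.apache.hadoop.mapred.TextInputFormat;

public class SeqFileConvert {
  public static void main(String[] args) throws Exception {
    JobConf jobConf = new JobConf(SeqFileConvert.class);
    jobConf.setJobName("seqfile-convert");

    // Read plain text, write sequence files. The default
    // IdentityMapper emits <LongWritable offset, Text line> pairs.
    jobConf.setInputFormat(TextInputFormat.class);
    jobConf.setOutputFormat(SequenceFileOutputFormat.class);
    jobConf.setOutputKeyClass(LongWritable.class);
    jobConf.setOutputValueClass(Text.class);

    // Map-only job: each mapper writes one output file of
    // roughly its split's size.
    jobConf.setNumReduceTasks(0);

    // Ask for ~1 GB splits (1024^3 bytes).
    jobConf.setLong("mapred.min.split.size", 1024L * 1024 * 1024);

    FileInputFormat.setInputPaths(jobConf, new Path(args[0]));
    FileOutputFormat.setOutputPath(jobConf, new Path(args[1]));

    JobClient.runJob(jobConf);
  }
}

With zero reducers, the number of output files equals the number of map tasks, so the split size directly controls the per-file output size.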
Note: tasks would no longer be perfectly data-local, since you would likely be requesting a split size much larger than the block size.

On Wed, Jun 22, 2011 at 10:52 PM, Mapred Learn <mapred.le...@gmail.com> wrote:
> I have a use case where I want to process data and generate seq file output
> of fixed size, say 1 GB, i.e. each map-reduce job output should be 1 GB.
>
> Does anybody know of any -D option or any other way to achieve this?
>
> -Thanks
> JJ

--
Harsh J