Lance Amundsen wrote:
Thanks, I'll give that a try.  It seems to me that a method to tell Hadoop to
split a file every "n" key/value pairs would be logical.  Or maybe a
createSplitBoundary call when appending key/value records?

Splits should not require examining the data: that's not scalable. So instead they fall on arbitrary byte boundaries.
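
For concreteness, a minimal sketch of what byte-boundary splitting means (simplified standalone Java, not the actual Hadoop source; the real logic lives in FileInputFormat.getSplits):

  public class ByteSplitSketch {
      public static void main(String[] args) {
          long fileLength = 1L << 30;  // e.g. a 1 GB file
          int numSplits = 4;           // desired number of maps
          long splitSize = Math.max(1, fileLength / numSplits);
          // Each split is just a (start, length) byte range; no record is
          // ever examined, so the cost is O(1) per split regardless of size.
          for (long offset = 0; offset < fileLength; offset += splitSize) {
              long length = Math.min(splitSize, fileLength - offset);
              System.out.println("split: start=" + offset + " length=" + length);
          }
      }
  }

The record reader then skips forward from its start offset to the next record boundary (a newline for text, a sync marker for SequenceFiles), which is why records need not align with splits.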

I just want a way, and not an overly complex one, of directing the number of
maps and the breakdown of records going to them.  Creating a separate file per
record group is too slow for my purposes.

Just set the number of map tasks. That should mostly do what you want in this case. If you want finer-grained control, implement your own InputFormat.
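
For example, a rough sketch of both suggestions using the classic org.apache.hadoop.mapred API (FixedSplitInputFormat is a hypothetical name, and Text key/value types are assumed for illustration; disabling splitting is the simplest form of finer-grained control, giving one map per input file):

  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;
  import org.apache.hadoop.io.Text;
  import org.apache.hadoop.mapred.JobConf;
  import org.apache.hadoop.mapred.SequenceFileInputFormat;

  public class FixedSplitInputFormat extends SequenceFileInputFormat<Text, Text> {
      @Override
      protected boolean isSplitable(FileSystem fs, Path file) {
          return false;  // never split mid-file: each input file becomes one map
      }

      public static void main(String[] args) {
          JobConf conf = new JobConf(FixedSplitInputFormat.class);
          conf.setNumMapTasks(16);  // a hint: the framework treats it as a goal, not a guarantee
          conf.setInputFormat(FixedSplitInputFormat.class);
          // ... set mapper, input/output paths, etc., then submit with JobClient.runJob(conf)
      }
  }

For full control over how records are grouped into maps, override getSplits() as well, rather than just turning splitting off.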

Doug
