Lance Amundsen wrote:
Thanks, I'll give that a try.  It seems to me that a method to tell Hadoop to
split a file every "n" key/value pairs would be logical.  Or maybe a
createSplitBoundary call when appending key/value records?

Splits should not require examining the data: that's not scalable. So instead they fall on arbitrary byte boundaries.
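
For concreteness, a minimal sketch of what byte-boundary splitting means (simplified standalone Java, not the actual Hadoop source; the real logic lives in FileInputFormat.getSplits):

  public class ByteSplitSketch {
      public static void main(String[] args) {
          long fileLength = 1L << 30;  // e.g. a 1 GB file
          int numSplits = 4;           // desired number of maps
          long splitSize = Math.max(1, fileLength / numSplits);
          // Each split is just a (start, length) byte range; no record is
          // ever examined, so the cost is O(1) per split regardless of size.
          for (long offset = 0; offset < fileLength; offset += splitSize) {
              long length = Math.min(splitSize, fileLength - offset);
              System.out.println("split: start=" + offset + " length=" + length);
          }
      }
  }

The record reader then skips forward from its start offset to the next record boundary (a newline for text, a sync marker for SequenceFiles), which is why records need not align with splits.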

I just want a way, and not an overly complex one, of directing the number of
maps and the breakdown of records going to them.  Creating a separate file per
record group is too slow for my purposes.

Just set the number of map tasks. That should mostly do what you want in this case. If you want finer-grained control, implement your own InputFormat.
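
For example, a rough sketch of both suggestions using the classic org.apache.hadoop.mapred API (FixedSplitInputFormat is a hypothetical name, and Text key/value types are assumed for illustration; disabling splitting is the simplest form of finer-grained control, giving one map per input file):

  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;
  import org.apache.hadoop.io.Text;
  import org.apache.hadoop.mapred.JobConf;
  import org.apache.hadoop.mapred.SequenceFileInputFormat;

  public class FixedSplitInputFormat extends SequenceFileInputFormat<Text, Text> {
      @Override
      protected boolean isSplitable(FileSystem fs, Path file) {
          return false;  // never split mid-file: each input file becomes one map
      }

      public static void main(String[] args) {
          JobConf conf = new JobConf(FixedSplitInputFormat.class);
          conf.setNumMapTasks(16);  // a hint: the framework treats it as a goal, not a guarantee
          conf.setInputFormat(FixedSplitInputFormat.class);
          // ... set mapper, input/output paths, etc., then submit with JobClient.runJob(conf)
      }
  }

For full control over how records are grouped into maps, override getSplits() as well, rather than just turning splitting off.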

Doug
