Thanks, I'll give that a try. It seems to me that a method to tell Hadoop to
split a file every "n" key/value pairs would be logical. Or maybe a
createSplitBoundary() call when appending key/value records (roughly
sketched below)?
I just want a way, and not an overly complex one, of directing the number of
maps and the breakdown of records going to each. Creating a separate file
per record group is too slow for my purposes.
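Something like this is what I'm picturing (purely hypothetical API;
createSplitBoundary() does not exist in Hadoop today):

SequenceFile.Writer writer = SequenceFile.createWriter(
    fs, conf, new Path("input/part-0"), LongWritable.class, Text.class);
for (long i = 1; i <= numRecords; i++) {
  writer.append(new LongWritable(i), value);
  if (i % n == 0) {
    writer.createSplitBoundary();  // hypothetical: mark a map split every n records
  }
}
writer.close();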
Lance
IBM Software Group - Strategy
Performance Architect
High-Performance On Demand Solutions (HiPODS)
650-678-8425 cell
From: Doug Cutting <[EMAIL PROTECTED]>
To: [email protected]
Date: 10/18/2007 03:21 PM
Subject: Re: InputFiles, Splits, Maps, Tasks
Reply-To: [EMAIL PROTECTED]
Lance Amundsen wrote:
> There are lots of references to decreasing DFS block size to increase the
> maps-to-records ratio. What is the easiest way to do this? Is it possible
> with the standard SequenceFile class?
You could specify the block size in the Configuration argument to
SequenceFile#createWriter(), using the dfs.block.size property. But if
you simply want to create sub-block-size splits, then increasing the
number of map tasks should do that.
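For instance (an untested sketch; the class name, path, block size, and
record loop are just examples):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;

public class SmallBlockWriter {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Write with 1 MB blocks instead of the default 64 MB, so the
    // default splitter produces many more splits per file. The value
    // must remain a multiple of io.bytes.per.checksum (512 by default).
    conf.setLong("dfs.block.size", 1024 * 1024);
    FileSystem fs = FileSystem.get(conf);
    SequenceFile.Writer writer = SequenceFile.createWriter(
        fs, conf, new Path("input/part-0"),
        LongWritable.class, Text.class);
    for (long i = 0; i < 100000; i++) {
      writer.append(new LongWritable(i), new Text("record " + i));
    }
    writer.close();

    // Alternatively, keep the default block size and just hint more
    // maps; the framework then generates sub-block-size splits.
    JobConf job = new JobConf(conf);
    job.setNumMapTasks(100);
  }
}

Keep in mind that setNumMapTasks() is only a hint to the framework, but
it is the simplest knob to turn.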
Doug