Thanks, I'll give that a try. It seems to me that a method telling Hadoop to split a file every n key/value pairs would be logical. Or maybe a createSplitBoundary() call to use while appending key/value records?
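Something like the following is what I'm imagining (createSplitBoundary() is hypothetical, not an existing Hadoop call; fs, conf, out, and n are assumed to be set up already):

  // Hypothetical sketch -- createSplitBoundary() does not exist in Hadoop today.
  SequenceFile.Writer writer = SequenceFile.createWriter(
      fs, conf, out, LongWritable.class, Text.class);
  for (long i = 1; i <= numRecords; i++) {
    writer.append(new LongWritable(i), new Text("record-" + i));
    if (i % n == 0)
      writer.createSplitBoundary();  // hypothetical: force a split point every n records
  }
  writer.close();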
I just want a way, and not a real complex way, of directing the # of maps and the breakdown of records going to them. Creating a separate file per record group is too slow for my purposes.

Lance
IBM Software Group - Strategy
Performance Architect, High-Performance On Demand Solutions (HiPODS)
650-678-8425 cell

Doug Cutting <[EMAIL PROTECTED]> wrote on 10/18/2007 03:21 PM:

> Lance Amundsen wrote:
> > There's lots of references on decreasing DFS block size to increase
> > map-to-record ratios. What is the easiest way to do this? Is it
> > possible with the standard SequenceFile class?
>
> You could specify the block size in the Configuration parameter to
> SequenceFile#createWriter() using the dfs.block.size parameter.
>
> But if you simply want to create sub-block-size splits, then increasing
> the number of map tasks should do that.
>
> Doug
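For reference, here is roughly how I read your first suggestion (the path, key/value types, 1 MB block size, and record count are just placeholders I made up):

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;
  import org.apache.hadoop.io.LongWritable;
  import org.apache.hadoop.io.SequenceFile;
  import org.apache.hadoop.io.Text;

  public class SmallBlockSeqFile {
    public static void main(String[] args) throws Exception {
      Configuration conf = new Configuration();
      // Ask DFS for 1 MB blocks on this file instead of the default 64 MB,
      // so the default splitter yields many more splits (and hence maps).
      conf.setLong("dfs.block.size", 1024 * 1024);

      FileSystem fs = FileSystem.get(conf);
      Path out = new Path("/tmp/records.seq");  // placeholder path

      SequenceFile.Writer writer = SequenceFile.createWriter(
          fs, conf, out, LongWritable.class, Text.class);
      for (long i = 0; i < 100000; i++) {
        writer.append(new LongWritable(i), new Text("record-" + i));
      }
      writer.close();
    }
  }

And the alternative, if I understand it, is just JobConf#setNumMapTasks(), which the InputFormat treats as a hint when computing splits.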