Hi,

On Sun, Dec 26, 2010 at 6:29 PM, Black, Michael (IS)
<[email protected]> wrote:
> I assume there's a way to make a specific # of splits and add each document 
> to the separate splits...but I'll be darned if I can find the docs or an 
> example to show this.

Would CombineFileInputFormat and CombineFileSplit be what you're looking for?

Doc links: 
http://hadoop.apache.org/common/docs/r0.20.2/api/org/apache/hadoop/mapred/lib/CombineFileInputFormat.html
& 
http://hadoop.apache.org/common/docs/r0.20.2/api/org/apache/hadoop/mapred/lib/CombineFileSplit.html
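
To make the idea concrete, here is a minimal plain-Java sketch (no Hadoop
dependency) of what "a fixed number of splits, each holding several
documents" amounts to. A CombineFileSplit similarly carries several file
paths in one split. The class name FixedSplitSketch, the method assign, and
the round-robin assignment are my own assumptions for illustration; they are
not Hadoop APIs, and CombineFileInputFormat makes its own grouping decisions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not part of Hadoop.
final class FixedSplitSketch {

    // Deal the documents round-robin into exactly numSplits groups.
    // Each inner list stands in for the set of paths one combined split would hold.
    static List<List<String>> assign(List<String> docs, int numSplits) {
        if (numSplits <= 0) {
            throw new IllegalArgumentException("numSplits must be positive");
        }
        List<List<String>> splits = new ArrayList<>();
        for (int i = 0; i < numSplits; i++) {
            splits.add(new ArrayList<>());
        }
        for (int i = 0; i < docs.size(); i++) {
            splits.get(i % numSplits).add(docs.get(i));
        }
        return splits;
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList(
                "doc1.txt", "doc2.txt", "doc3.txt", "doc4.txt", "doc5.txt");
        List<List<String>> splits = assign(docs, 2);
        for (int i = 0; i < splits.size(); i++) {
            System.out.println("split " + i + ": " + splits.get(i));
        }
        // prints:
        // split 0: [doc1.txt, doc3.txt, doc5.txt]
        // split 1: [doc2.txt, doc4.txt]
    }
}
```

In a real job the grouping would happen inside your CombineFileInputFormat
subclass, and the record reader would then walk the paths of each split.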

> As I said I'm using hadoop-0.20.2 which I know makes a difference as so many 
> things get deprecated on each release.  Old references don't seem to work.

The API marked deprecated in 0.20.{0,1,2} has been un-deprecated in
the 0.21.0 release and is considered the "stable" API. You can
continue using it, as it is still supported.

(Maybe 0.20.3 will un-deprecate it too; I'm not sure of the status
of that, although doing so would surely help avoid beginner
confusion.)
-- 
Harsh J
www.harshj.com
