Custom input split

Black, Michael (IS) Fri, 24 Dec 2010 08:35:28 -0800

Using hadoop-0.20


I'm doing custom input splits from a Lucene index.

I want to split the document IDs across N mappers (I'm testing the
scalability of the problem across 4 nodes and 8 cores).

So the key is the document number, and the numbers are not sequential.

At this point I'm calling splits.add once per document, but that sets up
one task for every document, which of course is not what I want.

How can I add a group of documents to each split?  I found a scant reference
to PrimeInputSplit but that doesn't seem to resolve on hadoop-0.20.
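
To make the goal concrete, here is a rough sketch of the grouping I have in
mind: partition the list of document IDs into at most N chunks up front, so
that splits.add would be called once per chunk rather than once per document.
The names DocIdPartitioner and partition are hypothetical, and the sample IDs
are made up. The sketch is plain Java with no Hadoop dependency; carrying each
chunk to a mapper would still need a custom InputSplit that serializes its
list of IDs, which is not shown here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: groups non-sequential document IDs into chunks,
// one chunk per intended input split.
final class DocIdPartitioner {

    // Returns at most numSplits contiguous groups whose sizes differ by at most one.
    static List<List<Integer>> partition(List<Integer> docIds, int numSplits) {
        if (numSplits <= 0) {
            throw new IllegalArgumentException("numSplits must be positive");
        }
        List<List<Integer>> groups = new ArrayList<>();
        int total = docIds.size();
        int base = total / numSplits;   // minimum size of each group
        int extra = total % numSplits;  // the first 'extra' groups get one more ID
        int start = 0;
        for (int i = 0; i < numSplits && start < total; i++) {
            int size = base + (i < extra ? 1 : 0);
            // Copy the sublist so each group is independent of the input list.
            groups.add(new ArrayList<>(docIds.subList(start, start + size)));
            start += size;
        }
        return groups;
    }

    public static void main(String[] args) {
        // Made-up, non-sequential document IDs for illustration only.
        List<Integer> docIds = Arrays.asList(3, 7, 12, 15, 21, 40, 41, 58, 63, 90);
        List<List<Integer>> groups = partition(docIds, 4);
        for (int i = 0; i < groups.size(); i++) {
            System.out.println("split " + i + " -> " + groups.get(i));
        }
    }
}
```

With something like this, getSplits would loop over the returned groups and
add one split per group, so the number of map tasks follows N instead of the
number of documents.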


Michael D. Black
Senior Scientist
Northrop Grumman Information Systems
Advanced Analytics Directorate
