Using hadoop-0.20

I'm doing custom input splits from a Lucene index.

I want to split the document ID's across N mappers (I'm testing the
scalabilty of the problem across 4 nodes and 8 cores).

So the key is the document# and they are not sequential.

At this point I'm using splits.add to add each document...but that sets up
one task for every document...not something I want to do of course.

How can I add a group of documents to each split?  I found a scant reference
to PrimeInputSplit but that doesn't seem to resolve on hadoop-0.20.


Michael D. Black
Senior Scientist
Nothrop Grumman Information Systems
Advanced Analytics Directorate



Attachment: smime.p7s
Description: S/MIME cryptographic signature

Reply via email to