Using hadoop-0.20
I'm building custom input splits from a Lucene index. I want to split the document IDs across N mappers (I'm testing the scalability of the problem across 4 nodes and 8 cores). The key is the document number, and the numbers are not sequential. At this point I'm calling splits.add once per document, but that sets up one task for every document, which is of course not what I want. How can I add a group of documents to each split? I found a scant reference to PrimeInputSplit, but that doesn't seem to resolve on hadoop-0.20.

Michael D. Black
Senior Scientist
Northrop Grumman Information Systems
Advanced Analytics Directorate
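
To make the grouping I'm after concrete, here is a rough sketch in plain Java of dividing a non-sequential list of document IDs into at most N contiguous groups of near-equal size. The class name DocIdPartitioner and the method partition are made up for illustration; nothing here is a Hadoop or Lucene API, and it assumes the full ID list fits in memory when the splits are computed.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: not part of Hadoop or Lucene.
public class DocIdPartitioner {

    // Divides docIds into at most numSplits contiguous groups whose sizes
    // differ by no more than one. The IDs do not need to be sequential.
    public static List<List<Integer>> partition(List<Integer> docIds, int numSplits) {
        if (numSplits <= 0) {
            throw new IllegalArgumentException("numSplits must be positive");
        }
        List<List<Integer>> groups = new ArrayList<>();
        int total = docIds.size();
        int base = total / numSplits;   // minimum number of IDs per group
        int extra = total % numSplits;  // the first 'extra' groups get one more ID
        int start = 0;
        for (int i = 0; i < numSplits && start < total; i++) {
            int size = base + (i < extra ? 1 : 0);
            // Copy the slice so each group is independent of the input list.
            groups.add(new ArrayList<>(docIds.subList(start, start + size)));
            start += size;
        }
        return groups;
    }
}
```

The idea would be to call splits.add once per group rather than once per document, with each group wrapped in a custom split class that carries its list of IDs. That wrapper class is the part I don't know how to write for hadoop-0.20.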
