You mean the file is "not trusted". I was using Outlook and my company
automatically puts a digital certificate on all emails. I'm using webmail
right now which doesn't. That certificate is installed by default on all
company computers so it looks trusted to us without having to explicitly trust
the certificate.
I don't think my split problem has anything to do with the Lucene index...that
was just informational.
Here's my getSplits()...it calls other functions which aren't important to the
problem at hand...
public List<InputSplit> getSplits(JobContext context)
        throws IOException, InterruptedException {
    Configuration conf = context.getConfiguration();
    List<InputSplit> splits = new ArrayList<InputSplit>();
    Indexer indexer = new Indexer(conf.get(Config.Index), true);
    Iterator<Document> iDocument = indexer.iterator();
    int ndocs = 20; // limit the # of docs for testing -- got over 100,000 of these
    int i = 0;      // number of docs added so far
    while (iDocument.hasNext() && i < ndocs) {
        Document document = iDocument.next();
        String docid = document.getId();
        System.out.println("Adding ID " + docid);
        splits.add(new PInputSplit(docid)); // one split per document
        i++;
    }
    indexer.close();
    return splits;
}
I assume there's a way to make a specific # of splits and add each document to
the separate splits...but I'll be darned if I can find the docs or an example
to show this.
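For what it's worth, here is a rough sketch of the grouping step I have in mind, in plain Java with no Hadoop classes so it can be tried standalone. The class name SplitGrouping and the method group() are made up for illustration. It deals the doc IDs round-robin into at most numSplits lists; getSplits() would then wrap each list in one split, which assumes PInputSplit is changed to hold a list of IDs rather than a single one (and that its serialization and the RecordReader handle the list too).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical helper, not part of Hadoop: groups document IDs into a
// fixed number of buckets, one bucket per intended InputSplit.
public class SplitGrouping {

    // Deals the IDs round-robin into at most numSplits groups.
    // Fewer groups are returned when there are fewer IDs than numSplits,
    // so no empty group (and hence no empty split) is ever produced.
    public static List<List<String>> group(Iterator<String> docIds, int numSplits) {
        if (numSplits < 1) {
            throw new IllegalArgumentException("numSplits must be >= 1");
        }
        List<List<String>> groups = new ArrayList<List<String>>();
        int i = 0;
        while (docIds.hasNext()) {
            if (groups.size() < numSplits) {
                groups.add(new ArrayList<String>()); // create buckets lazily
            }
            groups.get(i % numSplits).add(docIds.next());
            i++;
        }
        return groups;
    }

    public static void main(String[] args) {
        List<String> ids = Arrays.asList("d1", "d2", "d3", "d4", "d5");
        // prints [[d1, d3, d5], [d2, d4]]
        System.out.println(group(ids.iterator(), 2));
    }
}
```

Round-robin keeps the groups within one document of each other in size without needing to know the total count up front, which matters here since the index is only walked through an iterator.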
As I said I'm using hadoop-0.20.2 which I know makes a difference as so many
things get deprecated on each release. Old references don't seem to work.
Michael D. Black
Senior Scientist
Advanced Analytics Directorate
Northrop Grumman Information Systems
________________________________
From: ?? [mailto:[email protected]]
Sent: Sat 12/25/2010 10:32 AM
To: [email protected]
Subject: EXTERNAL:Re: Custom input split
What is the file you have attached? It is not safe.
I don't know the format of lucene index, would you please give an example?