Re: Custom input split

Black, Michael (IS) Sun, 26 Dec 2010 05:00:06 -0800

You mean the file is "not trusted". I was using Outlook, and my company
automatically attaches a digital certificate to all emails. I'm using webmail
right now, which doesn't. That certificate is installed by default on all
company computers, so it looks trusted to us without anyone having to explicitly
trust it.
 
I don't think my split problem has anything to do with the Lucene index; that
was just background information.
 
Here's my getSplits(). It calls other functions that aren't relevant to the
problem at hand.
 
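import java.io.IOException;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.JobContext;

        @Override
        public List<InputSplit> getSplits(JobContext context)
                        throws IOException, InterruptedException {
                Configuration conf = context.getConfiguration();
                List<InputSplit> splits = new ArrayList<InputSplit>();
                Indexer indexer = new Indexer(conf.get(Config.Index), true);
                try {
                        Iterator<Document> iDocument = indexer.iterator();
                        // Limit the # of docs for testing -- got over 100,000 of these
                        int ndocs = 20;
                        int i = 0;
                        while (iDocument.hasNext() && i < ndocs) {
                                Document document = iDocument.next();
                                String docid = document.getId();
                                System.out.println("Adding ID  " + docid);
                                // One split (and therefore one map task) per document
                                splits.add(new PInputSplit(docid));
                                i++;
                        }
                } finally {
                        // Close the index even if iteration throws
                        indexer.close();
                }
                return splits;
        }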


I assume there's a way to create a specific number of splits and distribute
the documents among them, but I'll be darned if I can find documentation or an
example showing how.
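For reference, a minimal sketch of the grouping step, assuming the goal is a fixed number of splits with the documents spread across them. Each InputSplit returned by getSplits() becomes one map task, so the number of groups is the number of map tasks. `DocPartitioner` and `partition()` are hypothetical names for illustration, not Hadoop APIs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical helper: spreads document IDs across a fixed number of
 * groups, one group per InputSplit. Round-robin assignment keeps the
 * groups within one document of each other in size.
 */
public class DocPartitioner {

    public static List<List<String>> partition(List<String> docIds, int numSplits) {
        if (numSplits <= 0) {
            throw new IllegalArgumentException("numSplits must be positive");
        }
        // Never create more groups than documents, so no split is empty.
        int groups = Math.min(numSplits, docIds.size());
        List<List<String>> result = new ArrayList<List<String>>(groups);
        for (int g = 0; g < groups; g++) {
            result.add(new ArrayList<String>());
        }
        // Round-robin: document i goes to group i % groups.
        for (int i = 0; i < docIds.size(); i++) {
            result.get(i % groups).add(docIds.get(i));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> ids = Arrays.asList("d1", "d2", "d3", "d4", "d5");
        // prints [d1, d3, d5] then [d2, d4]
        for (List<String> group : partition(ids, 2)) {
            System.out.println(group);
        }
    }
}
```

In getSplits(), the loop would collect the IDs first and then add one split per group, using a split class that holds a list of IDs instead of the single-ID PInputSplit. Round-robin is just one choice; contiguous chunks would work equally well if neighboring documents should stay together.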
 
As I said, I'm using hadoop-0.20.2, which I know makes a difference since so
many things get deprecated with each release. Older references don't seem to work.
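In 0.20.2's new API (org.apache.hadoop.mapreduce), a custom split extends the abstract InputSplit class, implementing getLength() and getLocations(), and typically also implements org.apache.hadoop.io.Writable so the framework can serialize it out to the map tasks. The sketch below shows such a split carrying a list of document IDs. `DocIdListSplit` is a hypothetical name, and the Hadoop supertypes are left out so the code compiles without Hadoop on the classpath.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical split holding several document IDs. In a real job it would be
 * declared "extends org.apache.hadoop.mapreduce.InputSplit
 * implements org.apache.hadoop.io.Writable"; omitted here so it compiles alone.
 */
public class DocIdListSplit {
    private List<String> docIds = new ArrayList<String>();

    // No-arg constructor: the framework instantiates the split, then calls readFields().
    public DocIdListSplit() { }

    public DocIdListSplit(List<String> docIds) {
        this.docIds = new ArrayList<String>(docIds);
    }

    public List<String> getDocIds() { return docIds; }

    // InputSplit.getLength(): a size hint; here simply the number of documents.
    public long getLength() { return docIds.size(); }

    // InputSplit.getLocations(): no data-locality hints for an external index.
    public String[] getLocations() { return new String[0]; }

    // Writable.write(): the count first, then each ID.
    public void write(DataOutput out) throws IOException {
        out.writeInt(docIds.size());
        for (String id : docIds) {
            out.writeUTF(id);
        }
    }

    // Writable.readFields(): reads back exactly what write() wrote, in order.
    public void readFields(DataInput in) throws IOException {
        int n = in.readInt();
        docIds = new ArrayList<String>(n);
        for (int i = 0; i < n; i++) {
            docIds.add(in.readUTF());
        }
    }

    public static void main(String[] args) throws IOException {
        // Round-trip through bytes, as happens when the split is shipped to a task.
        DocIdListSplit original = new DocIdListSplit(Arrays.asList("d1", "d3", "d5"));
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        original.write(new DataOutputStream(bytes));

        DocIdListSplit copy = new DocIdListSplit();
        copy.readFields(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
        System.out.println(copy.getDocIds() + " length=" + copy.getLength());
    }
}
```

The important constraint is that write() and readFields() stay symmetric; a mismatch there usually shows up only when tasks try to deserialize their splits.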
 
Michael D. Black
Senior Scientist
Advanced Analytics Directorate
Northrop Grumman Information Systems
 

________________________________

From: ?? [mailto:[email protected]]
Sent: Sat 12/25/2010 10:32 AM
To: [email protected]
Subject: EXTERNAL:Re: Custom input split



What is the file you have attached? It is not safe.

I don't know the format of a Lucene index. Would you please give an example?
