Thanks Lohit. I am using only the default reader, and I am very new to Hadoop.
This is my map method:

      public void map(LongWritable key, Text value,
                      OutputCollector<Text, Text> output, Reporter reporter)
          throws IOException {
        String line = value.toString();
        StringTokenizer tokenizer = new StringTokenizer(line);
        while (tokenizer.hasMoreTokens()) {
          String val = tokenizer.nextToken();
          try {
            // For each token containing "the", emit the whole line
            // together with the name of the file it came from.
            if (val != null && val.contains("the")) {
              word.set(line);
              FileSplit spl = (FileSplit) reporter.getInputSplit();
              output.collect(word, new Text(spl.getPath().getName()));
            }
          } catch (Exception e) {
            System.out.println(e);
          }
        }
      }
    }

I have a PDF file in my DFS input folder. Can you tell me what I have to do
to read PDF files?

Thanks
Ganesh.G
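
A note on why the lines read from the PDF look unformatted: the default reader
hands the mapper the file's raw bytes split into lines, and many PDF producers
store page content in Flate (zlib/deflate) compressed streams. The sketch below
is not a PDF parser. It only uses java.util.zip to show that a word cannot be
found in deflated bytes until they are inflated again. The class name
CompressedStreamDemo and its helper methods are made up for this illustration.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical demo class: illustrates compressed content, not real PDF parsing.
public class CompressedStreamDemo {

    // Deflate-compress the input, similar in spirit to a Flate-encoded PDF stream.
    static byte[] deflate(byte[] input) {
        Deflater deflater = new Deflater();
        deflater.setInput(input);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[256];
        while (!deflater.finished()) {
            int n = deflater.deflate(buf);
            out.write(buf, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    // Reverse of deflate(); wraps the checked exception to keep callers simple.
    static byte[] inflate(byte[] input) {
        Inflater inflater = new Inflater();
        inflater.setInput(input);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[256];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    break; // truncated or unsupported input: stop instead of looping
                }
                out.write(buf, 0, n);
            }
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("not valid deflate data", e);
        } finally {
            inflater.end();
        }
        return out.toByteArray();
    }

    // Naive search that treats raw bytes as text, much like a line-based
    // reader followed by String.contains() in the mapper.
    static boolean containsText(byte[] raw, String needle) {
        return new String(raw, StandardCharsets.ISO_8859_1).contains(needle);
    }

    public static void main(String[] args) {
        byte[] original = "Text search on a PDF file using hadoop"
                .getBytes(StandardCharsets.UTF_8);
        byte[] packed = deflate(original);

        System.out.println("found in compressed bytes: "
                + containsText(packed, "hadoop"));
        System.out.println("found after inflating:     "
                + containsText(inflate(packed), "hadoop"));
    }
}
```

Real PDFs add more layers on top of this (object structure, font encodings,
text-drawing operators), so in practice the text would have to be extracted by
a PDF library inside a custom record reader, or converted to plain text before
the job runs.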


lohit-2 wrote:
> 
> Can you provide more information? How are you passing your input? Are you
> passing raw PDF files? If so, are you using your own record reader?
> The default record reader won't read PDF files, and you won't get the text
> out of them as is.
> Thanks,
> Lohit
> 
> 
> 
> ----- Original Message ----
> From: GaneshG <[EMAIL PROTECTED]>
> To: [email protected]
> Sent: Wednesday, July 23, 2008 1:51:52 AM
> Subject: Text search on a PDF file using hadoop
> 
> 
> When I search for text in a PDF file using Hadoop, the results do not
> come out properly. I tried to debug my program, and I could see that the
> lines read from the PDF file are not formatted. Please help me resolve
> this.
> -- 
> View this message in context:
> http://www.nabble.com/Text-search-on-a-PDF-file-using-hadoop-tp18606475p18606475.html
> Sent from the Hadoop core-user mailing list archive at Nabble.com.
> 
> 

-- 
View this message in context: 
http://www.nabble.com/Re%3A-Text-search-on-a-PDF-file-using-hadoop-tp18606558p18606703.html
Sent from the Hadoop core-user mailing list archive at Nabble.com.
