I put your code inside a main method and imported the required libraries, but it gives me errors. First error:
parser.parse();
Syntax error on token "parse", Identifier expected after this token

Second error:
cd.close();
Syntax error on token "close", Identifier expected after this token

Third error:
doc.add(new Field("content", text, Field.Store.NO, Field.Index.TOKENIZED));
Multiple markers at this line
- Syntax error on tokens, delete these tokens
- Syntax error on token "add", = expected after this token
- Syntax error on token ",", delete this token
- Syntax error on token(s), misplaced construct(s)

> Date: Thu, 4 Dec 2008 19:17:01 +0530
> From: [EMAIL PROTECTED]
> To: java-user@lucene.apache.org
> Subject: Re: Pdf in Lucene?
>
> Hi Tiziano,
>
> What is the error you got? I think you can get the text easily using the
> code shown below.
>
> FileInputStream fi = new FileInputStream(new File("sample.pdf"));
> PDFParser parser = new PDFParser(fi);
> parser.parse();
> COSDocument cd = parser.getDocument();
> PDFTextStripper stripper = new PDFTextStripper();
> String text = stripper.getText(new PDDocument(cd));
> cd.close();
>
> After getting the value for text you can simply create the Lucene document.
>
> Document doc = new Document();
> doc.add(new Field("content", text, Field.Store.NO, Field.Index.TOKENIZED));
>
> On Thu, Dec 4, 2008 at 6:20 PM, tiziano bernardi <[EMAIL PROTECTED]> wrote:
>
>> Thanks very kind ...
>> But I've tried that code but I do not work ...
>> You could send me a simple working class that uses it please?
>> Thanks
>>
>>> Date: Thu, 4 Dec 2008 15:19:26 +0530
>>> From: [EMAIL PROTECTED]
>>> To: java-user@lucene.apache.org
>>> Subject: Re: Pdf in Lucene?
>>>
>>> Hi,
>>>
>>> In my case I used PDFBox, just to extract the text from PDF document and then
>>> I created the Lucene document giving the extracted text. (I didn't use the
>>> PDFBox built in Lucene search engine). So I didn't get any incompatibility
>>> problems.
>>>
>>> This blog post shows the way.
>>> http://kalanir.blogspot.com/2008/08/indexing-pdf-documents-with-lucene.html
>>>
>>> It worked perfect for me.
>>>
>>> Thanks.
>
> --
> Kalani Ruwanpathirana
> Department of Computer Science & Engineering
> University of Moratuwa