Re: Pdf in Lucene?

Grant Ingersoll Sat, 06 Dec 2008 18:33:59 -0800


On Dec 4, 2008, at 9:50 AM, tiziano bernardi wrote:


I put your code inside a main method.
I imported the required libraries, but I still get errors.
First error:

parser.parse();

Syntax error on token "parse", Identifier expected after this token
Second error:
cd.close();
Syntax error on token "close", Identifier expected after this token
Third error:

doc.add(new Field("content", text,Field.Store.NO <http:\field.store.no>,

Field.Index.TOKENIZED));

What is the <http:\field.store.no> stuff? Seems like you've got a lot of junk in your file that isn't proper Java.

Multiple markers at this line
- Syntax error on tokens, delete these tokens
- Syntax error on token "add", = expected after this token
- Syntax error on token ",", delete this token
- Syntax error on token(s), misplaced construct(s)
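The <http:\field.store.no> fragment looks like something a mail client or web page added when it turned "Field.Store.NO" into a link; it was never part of the Java statement. As a rough sketch only (the class name StripMailLinks and the strip helper are made up for illustration and belong to neither Lucene nor PDFBox), pasted code could be cleaned like this before compiling:

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of Lucene or PDFBox: removes the
// angle-bracketed pseudo-links that some mail clients insert into code.
public class StripMailLinks {
    // Optional whitespace, then "<http:" or "<https:", then everything up
    // to the closing ">". Java generics such as List<String> do not match
    // because the pattern requires the "http:" prefix right after "<".
    private static final Pattern AUTO_LINK =
            Pattern.compile("\\s*<https?:[^>\\s]*>");

    public static String strip(String line) {
        return AUTO_LINK.matcher(line).replaceAll("");
    }

    public static void main(String[] args) {
        // The offending line from the third error, as it was pasted.
        String pasted =
                "doc.add(new Field(\"content\", text,Field.Store.NO <http:\\field.store.no>,";
        // prints: doc.add(new Field("content", text,Field.Store.NO,
        System.out.println(strip(pasted));
    }
}
```

Deleting the fragment by hand works just as well for a single line; the only point is that the statement should read Field.Store.NO followed directly by the comma.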
> Date: Thu, 4 Dec 2008 19:17:01 +0530
> From: [EMAIL PROTECTED]
> To: [EMAIL PROTECTED]
> Subject: Re: Pdf in Lucene?
>
> Hi Tiziano,
>
> What is the error you got? I think you can get the text easily using the
> code shown below.
>
> FileInputStream fi = new FileInputStream(new File("sample.pdf"));
> PDFParser parser = new PDFParser(fi);
> parser.parse();
> COSDocument cd = parser.getDocument();
> PDFTextStripper stripper = new PDFTextStripper();
> String text = stripper.getText(new PDDocument(cd));
> cd.close();
>
> After getting the value for text you can simply create the Lucene document.
>
> Document doc = new Document();
> doc.add(new Field("content", text, Field.Store.NO <http://field.store.no/>,
> Field.Index.TOKENIZED));
>
> On Thu, Dec 4, 2008 at 6:20 PM, tiziano bernardi <[EMAIL PROTECTED]> wrote:
>
>> Thanks very kind ...
>> But I've tried that code but I do not work ...
>> You could send me a simple working class that uses it please?
>> Thanks
>>
>>> Date: Thu, 4 Dec 2008 15:19:26 +0530
>>> From: [EMAIL PROTECTED]
>>> To: [EMAIL PROTECTED]
>>> Subject: Re: Pdf in Lucene?
>>>
>>> Hi,
>>>
>>> In my case I used PDFBox, just to extract the text from PDF document and
>>> then I created the Lucene document giving the extracted text. (I didn't
>>> use the PDFBox built in Lucene search engine). So I didn't get any
>>> incompatibility problems.
>>>
>>> This blog post shows the way.
>>>
>>> http://kalanir.blogspot.com/2008/08/indexing-pdf-documents-with-lucene.html
>>>
>>> It worked perfect for me.
>>>
>>> Thanks.
>
> --
> Kalani Ruwanpathirana
> Department of Computer Science & Engineering
> University of Moratuwa


--------------------------
Grant Ingersoll

Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ










