I certainly don't either, since you haven't said what the actual exception is. If I had to guess, though, I would say it comes from the line

 Document document = LucenePDFDocument.getDocument(pdfFile);

and that the Lucene library PDFBox expects is not the same version of Lucene you are using. I would suggest not relying on PDFBox to create your Document. Instead, work out which PDFBox calls you need to extract the text, and then build the Lucene Document yourself.
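
One way to check the version-mismatch guess is to print which jar each class is actually loaded from at run time. The following is only a minimal sketch using standard JDK calls; the class name WhichJar and its locate method are hypothetical helpers made up for this illustration, and the two default class names are taken from the imports in the posted code.

```java
import java.security.CodeSource;

// Hypothetical diagnostic helper (not part of Lucene or PDFBox).
final class WhichJar
{
    /** Returns the location a class was loaded from, or a short note if it cannot be determined. */
    static String locate(String className)
    {
        try
        {
            // Load without running static initializers; we only want the origin.
            Class<?> clazz = Class.forName(className, false, WhichJar.class.getClassLoader());
            CodeSource source = clazz.getProtectionDomain().getCodeSource();

            // Classes that ship with the JDK have no code source.
            if (source == null || source.getLocation() == null)
            {
                return "(JDK / bootstrap class path)";
            }

            return source.getLocation().toString();
        }
        catch (ClassNotFoundException e)
        {
            return "(not on the class path)";
        }
        catch (LinkageError e)
        {
            // For example, a class that was compiled against a missing dependency.
            return "(found but could not be linked: " + e + ")";
        }
    }

    public static void main(String[] args)
    {
        // Defaults are the two classes used by the posted SimplePdfSearch.
        String[] names = (args.length > 0) ? args : new String[] {
            "org.apache.lucene.document.Document",
            "org.pdfbox.searchengine.lucene.LucenePDFDocument"
        };

        for (String name : names)
        {
            System.out.println(name + " -> " + locate(name));
        }
    }
}
```

Run it with the same class path as the failing program. If the Lucene jar it reports is not the one you expected, or more than one Lucene jar is on the class path, that would support the mismatch theory. The actual exception and stack trace are still the most useful thing to post.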


On Dec 1, 2008, at 9:18 AM, tiziano bernardi wrote:



This is my class. I use Eclipse and it shows no errors. I do not understand where the problem is.


import java.io.File;
import java.io.IOException;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;
import org.pdfbox.searchengine.lucene.LucenePDFDocument;

public final class SimplePdfSearch
{
    private static final String PDF_FILE_PATH = "C:\\Users\\Tiziano\\Desktop\\doc_di_prova\\prova.pdf";
    private static final String SEARCH_TERM = "prova";

    public static final void main(String[] args) throws IOException
    {
        Directory directory = null;

        try
        {
            // Let PDFBox build the Lucene Document from the PDF file.
            File pdfFile = new File(PDF_FILE_PATH);
            Document document = LucenePDFDocument.getDocument(pdfFile);

            // Index the document in an in-memory directory.
            directory = new RAMDirectory();

            IndexWriter indexWriter = null;

            try
            {
                Analyzer analyzer = new StandardAnalyzer();
                indexWriter = new IndexWriter(directory, analyzer, true);

                indexWriter.addDocument(document);
            }
            finally
            {
                if (indexWriter != null)
                {
                    try
                    {
                        indexWriter.close();
                    }
                    catch (IOException ignore)
                    {
                        // Ignore
                    }

                    indexWriter = null;
                }
            }

            // Search the "contents" field for the term.
            IndexSearcher indexSearcher = null;

            try
            {
                indexSearcher = new IndexSearcher(directory);

                Term term = new Term("contents", SEARCH_TERM);
                Query query = new TermQuery(term);

                Hits hits = indexSearcher.search(query);

                System.out.println((hits.length() != 0) ? "Found" : "Not Found");
            }
            finally
            {
                if (indexSearcher != null)
                {
                    try
                    {
                        indexSearcher.close();
                    }
                    catch (IOException ignore)
                    {
                        // Ignore
                    }

                    indexSearcher = null;
                }
            }
        }
        finally
        {
            if (directory != null)
            {
                try
                {
                    directory.close();
                }
                catch (IOException ignore)
                {
                    // Ignore
                }

                directory = null;
            }
        }
    }
}

> From: [EMAIL PROTECTED]
> To: java-user@lucene.apache.org
> Subject: Re: Pdf in Lucene?
> Date: Mon, 1 Dec 2008 08:22:58 -0500
>
> On Dec 1, 2008, at 8:01 AM, tiziano bernardi wrote:
>
> > I tried to use pdfbox but gives me an error.
> > That the version of lucene and the pdfbox are incompatible.
>
> Lucene knows nothing about PDFBox, so I don't see how they could be
> incompatible, unless you are referring to PDFBox's Lucene Document
> creator, in which case, you should ask on the PDFBox mailing list. I
> think, however, that it's pretty straightforward to create a Lucene
> document from PDFBox, so you shouldn't need to rely on their version.
>
> Personally, I'd have a look at Tika (http://lucene.apache.org/tika),
> which wraps PDFBox (and other extraction libraries) and gives you back
> SAX-like events via a ContentHandler, which you can then use to create
> Lucene documents. Else, I've been working on SOLR-284, which
> integrates Tika into Solr, see https://issues.apache.org/jira/browse/SOLR-284
>
> -Grant
>
> > I use pdf box 0.7.3 and lucene 2.1.0
> >
> > > Date: Mon, 1 Dec 2008 11:43:00 +0000
> > > From: [EMAIL PROTECTED]
> > > To: java-user@lucene.apache.org
> > > Subject: Re: Pdf in Lucene?
> > >
> > > Hi
> > >
> > > Lucene only indexes text so you'll have to get the text out of the PDF
> > > and feed it to lucene.
> > >
> > > Google for lucene pdf, or go straight to http://www.pdfbox.org/
> > >
> > > --
> > > Ian.
> > >
> > > 2008/12/1 tiziano bernardi <[EMAIL PROTECTED]>:
> > > >
> > > > Hi,
> > > > I want to index PDF files with lucene is possible?
> > > > What like?
> > > > Thanks Tiziano Bernardi

--------------------------
Grant Ingersoll

Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
