I certainly don't either, since you haven't said what the actual
exception is. If I had to guess, though, I would say it is the line
Document document = LucenePDFDocument.getDocument
And that the Lucene library expected by PDFBox is not the same version
of Lucene you are using. I would suggest not relying on PDFBox to
create your document, and instead look at the PDFBox calls that you
need to make to then create your Document.
On Dec 1, 2008, at 9:18 AM, tiziano bernardi wrote:
this is my class, I use eclipse and I haven't any errors.Do not
understand where the problem ....
import java.io.File;
import java.io.IOException;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;
import org.pdfbox.searchengine.lucene.LucenePDFDocument;
public final class SimplePdfSearch
{
private static final String PDF_FILE_PATH = "C:\\Users\\Tiziano\
\Desktop\\doc_di_prova\\prova.pdf";
private static final String SEARCH_TERM = "prova";
public static final void main(String[] args) throws IOException
{
Directory directory = null;
try
{
File pdfFile = new File(PDF_FILE_PATH);
Document document = LucenePDFDocument.getDocument(pdfFile);
directory = new RAMDirectory();
IndexWriter indexWriter = null;
try
{
Analyzer analyzer = new StandardAnalyzer();
indexWriter = new IndexWriter(directory, analyzer, true);
indexWriter.addDocument(document);
}
finally
{
if (indexWriter != null)
{
try
{
indexWriter.close();
}
catch (IOException ignore)
{
// Ignore
}
indexWriter = null;
}
}
IndexSearcher indexSearcher = null;
try
{
indexSearcher = new IndexSearcher(directory);
Term term = new Term("contents", SEARCH_TERM);
Query query = new TermQuery(term);
Hits hits = indexSearcher.search(query);
System.out.println((hits.length() != 0) ? "Found" : "Not Found");
}
finally
{
if (indexSearcher != null)
{
try
{
indexSearcher.close();
}
catch (IOException ignore)
{
// Ignore
}
indexSearcher = null;
}
}
}
finally
{
if (directory != null)
{
try
{
directory.close();
}
catch (IOException ignore)
{
// Ignore
}
directory = null;
}
}
}
}> From: [EMAIL PROTECTED]> To: java-user@lucene.apache.org>
Subject: Re: Pdf in Lucene?> Date: Mon, 1 Dec 2008 08:22:58 -0500> >
> On Dec 1, 2008, at 8:01 AM, tiziano bernardi wrote:> > >> > I
tried to use pdfbox but gives me an error.> > That the version of
lucene and the pdfbox are incompatible.> > Lucene knows nothing
about PDFBox, so I don't see how they could be > incompatible,
unless your are referring to PDFBox's Lucene Document > creator, in
which case, you should ask on the PDFBox mailing list. I > think,
however, that it's pretty straightforward to create a Lucene >
document from PDFBox, so you shouldn't need to rely on their
version.> > Personally, I'd have a look at Tika (http://lucene.apache.org/tika
), > which wraps PDFBox (and other extraction libraries) and gives
you back > SAX-like events via a ContentHandler, which you can then
use to create > Lucene documents. Else, I've been working on
SOLR-284, which > integrates Tika into Solr, see https://issues.apache.org/jira/browse/SOLR-284
> > -Grant> > >> > I use pdf box 0.7.3 and lucene 2.1.0> Date: Mon,
1 Dec 2008 11:43:00 > > +0000> From: [EMAIL PROTECTED]> To: java-user@lucene.apache.org
> > > Subject: Re: Pdf in Lucene?> > Hi> > > Lucene only indexes
text so > > you'll have to get the text out of the PDF> and feed it
to lucene.> > > > Google for lucene pdf, or go straight to http://www.pdfbox.org/
> > > > > --> Ian.> > > > 2008/12/1 tiziano bernardi <[EMAIL PROTECTED]
>:> > > >> >> > Hi,> > I want to index PDF files with lucene is
possible?> > > > What like?> > Thanks Tiziano Bernardi> > > >
_________________________________________________________________> >
> > Fanne di tutti i colori, personalizza la tua Hotmail!> > http://imagine-windowslive.com/Hotmail/#0
> > > > > >
--------------------------------------------------------------------- >
> > To unsubscribe, e-mail: java-user-
[EMAIL PROTECTED]> > > For additional commands, e-mail: [EMAIL PROTECTED]
>> >
_________________________________________________________________> >
50 nuovi schemi per giocare su CrossWire! Accetta la sfida!> > http://livesearch.games.msn.com/crosswire/play_it/
> > --------------------------> Grant Ingersoll> > Lucene Helpful
Hints:> http://wiki.apache.org/lucene-java/BasicsOfPerformance> http://wiki.apache.org/lucene-java/LuceneFAQ
> > > > > > > > > > > >
---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]>
For additional commands, e-mail: [EMAIL PROTECTED]>
_________________________________________________________________
Vai oltre le parole, scarica il nuovo Messenger!
http://download.live.com/?mkt=it-it
--------------------------
Grant Ingersoll
Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]