Please note that "Lucene" is a Java library for building applications.
The examples you refer to below are two applications built with the Lucene
library. Those applications are really just demonstrations of the
types of things that are possible using the Lucene library (and the PDFBox
library).
If you want to do more complicated things, you either need to write your own
application (you can base it on the sample code you are currently
running) or you need to look into existing applications.
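
For the first option, here is a rough sketch of the shape such an
application could take. It only shows the routing: one directory walk that
sends both HTML and PDF files to a single shared sink, instead of two
separate programs each building its own index. Every name in it
(SingleIndexSketch, IndexSink, typeOf, indexTree) is hypothetical and not
part of Lucene or PDFBox. In a real program the sink would presumably wrap
one Lucene IndexWriter, and the per-type conversion would come from whatever
the two demo programs already use; that part is an assumption and is left as
a comment.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class SingleIndexSketch {

    /** Hypothetical stand-in for the one shared index. */
    interface IndexSink {
        void add(String type, File file);
    }

    /** Pick a document type from the file extension; null means "skip this file". */
    static String typeOf(File f) {
        String name = f.getName().toLowerCase(Locale.ROOT);
        if (name.endsWith(".html") || name.endsWith(".htm")) return "html";
        if (name.endsWith(".pdf")) return "pdf";
        return null;
    }

    /** Walk a directory tree and hand every HTML and PDF file to the same sink. */
    static void indexTree(File root, IndexSink sink) {
        if (root.isDirectory()) {
            File[] children = root.listFiles();
            if (children == null) return; // unreadable directory
            for (File child : children) indexTree(child, sink);
        } else {
            String type = typeOf(root);
            if (type != null) sink.add(type, root);
        }
    }

    public static void main(String[] args) {
        final List<String> seen = new ArrayList<String>();
        IndexSink sink = new IndexSink() {
            public void add(String type, File file) {
                // Assumption: real code would convert the file to a Lucene
                // Document here and add it to the single shared index writer.
                seen.add(type + ": " + file.getPath());
            }
        };
        indexTree(new File(args.length > 0 ? args[0] : "."), sink);
        for (String line : seen) System.out.println(line);
    }
}
```

Run as written, it just lists the HTML and PDF files it would have indexed
under the given directory, which is a cheap way to check the routing before
wiring in any real indexing code.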
In the first case, please consult the [EMAIL PROTECTED] mailing list if you
need assistance.
In the second case, it may help to review this list of applications...
http://wiki.apache.org/lucene-java/PoweredBy
...based on the situation you describe, however, I would think that Nutch
may be the best place for you to start...
http://lucene.apache.org/nutch/
: I am working with lucene and i am new
:
: I want to index documents HTML for this I do
:
: java org.w3c.tidy.Tidy -m *.html
:
: java org.apache.lucene.demo.IndexHTML -create -index index .\
:
: all of this generates an index for me, and when I do my search in the Web
: app it shows me the documents and the summary.
:
: after that I index pdf
:
: org.pdfbox.searchengine.lucene.IndexFiles -create -index pdf \
:
: this also generates index to me
:
: but the PDF index replaces the HTML index
:
: how can I have a single index so that when I do my search in the Web app
: it shows me both HTML and PDF documents?
:
: thanks
-Hoss