You could write a "dummy" Analyzer that simply passes through the tokens
produced by your external process instead of tokenizing the text itself.
As for statistics, what kind are you interested in? You could store them
in a field alongside the document, or set boost values on the field or
document, but that may be too simple for your needs.
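
To make the "dummy" Analyzer idea concrete, here is a rough sketch in
plain Java. It deliberately uses hypothetical stand-in classes
(PrecomputedTokenStream, PassThroughAnalyzer) rather than Lucene's own
types, and it assumes your external indexer emits one token per line;
both are my assumptions, not anything Lucene prescribes. In Lucene
itself the same idea would live in an Analyzer subclass whose
tokenStream method returns such a stream.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for a token stream; this is NOT a Lucene class.
final class PrecomputedTokenStream implements Iterator<String> {
    private final Iterator<String> tokens;

    PrecomputedTokenStream(List<String> tokens) {
        // Copy defensively so later changes to the caller's list are not seen.
        this.tokens = new ArrayList<>(tokens).iterator();
    }

    @Override
    public boolean hasNext() {
        return tokens.hasNext();
    }

    @Override
    public String next() {
        return tokens.next();
    }
}

// Hypothetical "dummy" analyzer: no lowercasing, stemming, or stop-word
// removal. It only hands back what the external process already produced.
final class PassThroughAnalyzer {
    // Assumed wire format: one precomputed token per line; blank lines
    // are skipped and surrounding whitespace is trimmed.
    static PrecomputedTokenStream tokenStream(String externalOutput) {
        List<String> tokens = new ArrayList<>();
        for (String line : externalOutput.split("\n")) {
            String token = line.trim();
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        return new PrecomputedTokenStream(tokens);
    }
}
```

The point of the design is that the analyzer does no linguistic work of
its own: the tokens reach the index exactly as your external indexer
computed them, in the same order.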
Ralf Bierig wrote:
Hi,
in the context of a distributed information retrieval project, we
would like to use Lucene for its indexing capabilities but not for
retrieval. In particular, we would like to populate a Lucene index
with the tokens and statistics already computed by an external
indexer, thereby bypassing the document-based parsing, analysis, and
ingestion into the index which characterises Lucene's standard
workflow. Is this possible? That is, is it possible to feed
precomputed statistics into a Lucene index? And is it possible to
have control over which statistics are associated with each document
(as we will not use Lucene for retrieval, we are not interested in
complying with the statistics it needs to perform a search)?
Any help greatly appreciated, many thanks.
Cheers,
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
--
Grant Ingersoll
Sr. Software Engineer
Center for Natural Language Processing
Syracuse University
School of Information Studies
335 Hinds Hall
Syracuse, NY 13244
http://www.cnlp.org
Voice: 315-443-5484
Fax: 315-443-6886