On 4/18/07, William Mee <[EMAIL PROTECTED]> wrote:
I'd like to add metadata which I get *after* indexing a document's contents to
the index. To be more specific: I'm implementing shingling (detection of
near-duplicate documents) and want to add the document fingerprint (which is
based on the sequence of tokens) to the index.
There doesn't seem to be an easy way to do this in the Lucene API - in
particular, I can't easily update a document which is already indexed. The only
way I could get this information *before* adding a document to an index is to
create a token stream manually (and then have this happen all over again when
the document is indexed). This isn't a satisfying solution. Does anyone have
any suggestions of how I could get the fingerprint information into the index?
I'd appreciate any input. Thanks!
You could write a custom TokenFilter that passes the field's tokens
through and then emits the fingerprint as the last token in the field,
e.g. !fingerprint:123abfe2d3c23df23 or similar.
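
A rough sketch of the idea, in plain Java so it runs without Lucene on
the classpath. SimpleTokenStream, FingerprintFilter and FingerprintDemo
are made-up names for illustration, not Lucene classes; the next()
method that returns null at the end of the stream only imitates the
shape of a token stream. The MD5 digest over the token sequence is a
placeholder assumption, so swap in your real shingle-based fingerprint.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

/** Hypothetical stand-in for a token stream: next() returns null when exhausted. */
interface SimpleTokenStream {
    String next();
}

/** Passes tokens through unchanged, then emits one extra fingerprint token. */
class FingerprintFilter implements SimpleTokenStream {
    private final SimpleTokenStream input;
    private final MessageDigest digest;
    private boolean fingerprintEmitted = false;

    FingerprintFilter(SimpleTokenStream input) {
        this.input = input;
        try {
            // Placeholder hash; a real shingling fingerprint would go here.
            this.digest = MessageDigest.getInstance("MD5");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    @Override
    public String next() {
        String token = input.next();
        if (token != null) {
            // Fold each token into the running digest as it streams past.
            digest.update(token.getBytes(StandardCharsets.UTF_8));
            digest.update((byte) 0); // separator so ["ab","c"] differs from ["a","bc"]
            return token;
        }
        if (fingerprintEmitted) {
            return null; // the fingerprint token was already returned
        }
        fingerprintEmitted = true;
        StringBuilder sb = new StringBuilder("!fingerprint:");
        for (byte b : digest.digest()) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }
}

class FingerprintDemo {
    /** Wraps a list of tokens as a stream, standing in for an analyzer's output. */
    static SimpleTokenStream fromList(List<String> tokens) {
        Iterator<String> it = tokens.iterator();
        return () -> it.hasNext() ? it.next() : null;
    }

    /** Drains the filtered stream, as the indexer would when it consumes the field. */
    static List<String> run(List<String> tokens) {
        SimpleTokenStream stream = new FingerprintFilter(fromList(tokens));
        List<String> out = new ArrayList<>();
        for (String t = stream.next(); t != null; t = stream.next()) {
            out.add(t);
        }
        return out;
    }

    public static void main(String[] args) {
        // Prints the original tokens followed by one "!fingerprint:..." token.
        System.out.println(run(Arrays.asList("the", "quick", "brown", "fox")));
    }
}
```

The point is that the fingerprint is computed during the single pass the
indexer already makes over the tokens, so the document is not tokenized
twice. The "!" prefix only helps if your analyzer never produces tokens
that start with it.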
-Mike