On 4/18/07, William Mee <[EMAIL PROTECTED]> wrote:
I'd like to add metadata which I get *after* indexing a document's contents to 
the index. To be more specific: I'm implementing shingling (detection of 
near-duplicate documents) and want to add the document fingerprint (which is 
based on the sequence of tokens) to the index.

There doesn't seem to be an easy way to do this in the Lucene API - in 
particular, I can't easily update a document which is already indexed. The only 
way I could get this information *before* adding a document to an index is to 
create a token stream manually (and then have this happen all over again when 
the document is indexed). This isn't a satisfying solution. Does anyone have 
any suggestions of how I could get the fingerprint information into the index? 
I'd appreciate any input. Thanks!

You could write a custom TokenFilter that passes the field's tokens
through unchanged and then emits the fingerprint as the last token in
the field, e.g. !fingerprint:123abfe2d3c23df23 or similar.
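
A rough sketch of that idea in plain Java, deliberately not written
against the real Lucene classes: the class name FingerprintAppendingFilter
is made up, a java.util.Iterator<String> stands in for the token stream,
and an MD5 digest over the token sequence stands in for a real shingle
fingerprint. All of these are assumptions for illustration only; the
point is just that the fingerprint is computed while the tokens pass
through once, so nothing has to be tokenized twice.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Iterator;
import java.util.NoSuchElementException;

/**
 * Hypothetical stand-in for a custom token filter: forwards every token
 * from the wrapped stream and then appends one extra fingerprint token.
 */
final class FingerprintAppendingFilter implements Iterator<String> {
    static final String PREFIX = "!fingerprint:";

    private final Iterator<String> input;
    private final MessageDigest digest;   // placeholder for a shingle fingerprint
    private boolean fingerprintEmitted = false;

    FingerprintAppendingFilter(Iterator<String> input) {
        this.input = input;
        try {
            this.digest = MessageDigest.getInstance("MD5");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    @Override
    public boolean hasNext() {
        // One more token (the fingerprint) follows the wrapped stream.
        return input.hasNext() || !fingerprintEmitted;
    }

    @Override
    public String next() {
        if (input.hasNext()) {
            String token = input.next();
            // Feed the token into the running digest as it passes through.
            digest.update(token.getBytes(StandardCharsets.UTF_8));
            digest.update((byte) 0);      // separator so "ab","c" != "a","bc"
            return token;
        }
        if (fingerprintEmitted) {
            throw new NoSuchElementException();
        }
        fingerprintEmitted = true;
        StringBuilder sb = new StringBuilder(PREFIX);
        for (byte b : digest.digest()) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }
}
```

In a real filter the same logic would sit in the method that returns
the next token, and the "!" prefix is only there to keep the fingerprint
term from colliding with ordinary terms in the field.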

-Mike
