It might be a bit harsh to drop the document if it has a very long token in it. I can imagine documents with embedded binary data, where the text around the binary data is still useful for search.

My feeling is that long tokens (longer than 128 or 256 bytes) are not useful for search, and should be truncated or dropped.
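
For illustration, a minimal sketch of such a filter, assuming the 2.x-era TokenStream API where next() returns the next Token or null (the DropLongTokensFilter name is made up; Lucene's built-in LengthFilter already offers similar min/max length filtering by character count):

import java.io.IOException;

import org.apache.lucene.analysis.Token;
import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;

// Hypothetical filter that silently drops tokens longer than maxLength
// characters instead of failing or bloating the whole document.
public final class DropLongTokensFilter extends TokenFilter {
  private final int maxLength;

  public DropLongTokensFilter(TokenStream in, int maxLength) {
    super(in);
    this.maxLength = maxLength;
  }

  public Token next() throws IOException {
    Token t;
    // Skip over-long tokens; return the next acceptable one, or null at end.
    while ((t = input.next()) != null) {
      if (t.termText().length() <= maxLength) {
        return t;
      }
    }
    return null;
  }
}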

Gabi.

Yonik Seeley wrote:
On Dec 20, 2007 11:15 AM, Michael McCandless <[EMAIL PROTECTED]> wrote:
Though ... we could simply delete the document immediately when any
exception occurs during its processing.  So if we decide that whenever a
doc hits an exception it should be deleted, it's not so hard to
implement that policy...

It does seem like you only want documents in the index that didn't
generate exceptions... otherwise it doesn't seem like you would know
exactly what got indexed.
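
A minimal sketch of that policy applied on the application side, assuming each document carries a unique "id" field (SafeIndexer and addOrDrop are hypothetical names, not Lucene API; IndexWriter.addDocument and deleteDocuments(Term) are the real calls):

import java.io.IOException;

import org.apache.lucene.document.Document;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;

// Hypothetical helper: only documents that indexed cleanly stay in the index.
public class SafeIndexer {
  private final IndexWriter writer;

  public SafeIndexer(IndexWriter writer) {
    this.writer = writer;
  }

  // Returns true if the document was indexed, false if it hit an
  // exception and was deleted again.
  public boolean addOrDrop(Document doc, String id) throws IOException {
    try {
      writer.addDocument(doc);
      return true;
    } catch (Exception e) {
      // The doc may have been partially indexed before the failure;
      // deleting by its unique id keeps the index consistent.
      writer.deleteDocuments(new Term("id", id));
      return false;
    }
  }
}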

-Yonik
