Not entire document being indexed?

[EMAIL PROTECTED] Thu, 24 Feb 2005 11:08:28 -0800

Hi everyone

I'm having a bizarre problem with a few of the documents here: they do not seem to get indexed in their entirety.

I use textmining WordExtractor to convert M$ Word to plain text and then index that text. For example, one document is about 230KB in size once converted to plain text. After it is indexed, searching for a phrase from the last 2-3 paragraphs returns no hits, yet searching for anything above those paragraphs works just fine. WordExtractor does convert the entire document to text; I've checked that.

I've tried increasing the number of terms per field from the default 10,000 to 20,000 with writer.maxFieldLength, but that didn't make any difference: I still can't find phrases from the last 2-3 paragraphs.
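
One check that might narrow this down is to count how many tokens the extracted text actually contains and compare that with the limit. As a rough guess, 230KB of plain text at around six characters per word would be well over 20,000 tokens, so the tail could still be cut off. The sketch below is only an approximation: it splits on whitespace, which is not what an analyzer does, and the class and method names (TokenLimitCheck, countTokens, exceedsLimit) are made up for illustration and are not part of any library.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.StringTokenizer;

// Hypothetical helper, not part of any indexing library.
class TokenLimitCheck {

    // Counts whitespace-separated tokens. This only approximates the
    // number of terms a real analyzer would produce for the same text.
    static int countTokens(String text) {
        return new StringTokenizer(text).countTokens();
    }

    // Returns true if the text has more tokens than the per-field limit,
    // meaning anything past the limit would likely not be indexed.
    static boolean exceedsLimit(String text, int maxFieldLength) {
        return countTokens(text) > maxFieldLength;
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 2) {
            System.err.println("usage: TokenLimitCheck <text-file> <limit>");
            return;
        }
        // Assumes the extracted text was saved as UTF-8.
        String text = new String(
                Files.readAllBytes(Paths.get(args[0])), StandardCharsets.UTF_8);
        int limit = Integer.parseInt(args[1]);

        System.out.println("tokens: " + countTokens(text));
        System.out.println("exceeds limit: " + exceedsLimit(text, limit));
    }
}
```

If the count comes out above 20,000, that would be consistent with only the end of the document being unsearchable.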

Any ideas as to why this could be happening and how I could rectify it?


thanks,

-pedja

