Hello, I am planning to use the inverted file db.worddump together with db.docs, both generated by HtDig with the -t option. However, the db.docs file contains incorrect lines. For example, it contains the following line:
    231 u:http://www.mit.edu/people/asundqui/home.html t:IOA Programs a:0 m:1000759788 s:1218 H: PAPERS * A description of four algorithms written in IOA with accompanying graph data types dvi , ps > * A description of IOA code for a modified version of a spanning tree algorithm ...

This line is incorrect because the excerpt after the "H:" field belongs to a different page. (The text actually comes from http://www.mit.edu/people/cluhrs/, which HtDig also indexed.)

Has anyone faced a similar problem? Could it be caused by excerpts of binary files that cannot be parsed? How should I configure HtDig to avoid such lines?

Thanks for your help,
Daniel

