Hello,

I am planning to use the inverted file db.worddump together with the
db.docs file that HtDig produces with the -t option. However, db.docs
contains incorrect lines. For example, it contains the following line:

231     u:http://www.mit.edu/people/asundqui/home.html  t:IOA Programs
a:0     m:1000759788    s:1218  H: PAPERS * A description of four
algorithms written in IOA with accompanying graph data types dvi , ps > *
A description of IOA code for a modified version of a spanning tree
algorithm ...

This line is incorrect because the excerpt after the "H:" flag belongs
to a different page. (The page that actually contains this text is
http://www.mit.edu/people/cluhrs/, which HtDig also crawled.)
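
For reference, here is a minimal Python sketch of how such a record can
be split into its fields, which makes it easier to compare the "u:" URL
with the "H:" excerpt. It assumes that the fields are separated by tabs
and that each field after the document ID starts with a one-letter flag
followed by a colon, as the sample line suggests. The function name
parse_docs_line is hypothetical and not part of HtDig.

```python
def parse_docs_line(line):
    """Split one db.docs text-dump record into (doc_id, fields).

    Assumption: fields are tab-separated, and every field after the
    document ID looks like "<flag>:<value>".
    """
    parts = line.rstrip("\n").split("\t")
    doc_id = int(parts[0])
    fields = {}
    for part in parts[1:]:
        # Split on the first colon only, so URLs keep their own colons.
        flag, sep, value = part.partition(":")
        if sep:
            # If a flag occurs more than once, the last value wins.
            fields[flag] = value.strip()
    return doc_id, fields


# Shortened version of the record quoted above, with explicit tabs.
sample = (
    "231\tu:http://www.mit.edu/people/asundqui/home.html"
    "\tt:IOA Programs\ta:0\tm:1000759788\ts:1218"
    "\tH: PAPERS * A description of four algorithms written in IOA"
)

doc_id, fields = parse_docs_line(sample)
print(doc_id, fields["u"])
print(fields["H"])
```

Printing the URL next to the excerpt for every record is a quick way to
spot other lines where the two do not match.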

Has anyone faced a similar problem? Could it be caused by excerpts from
binary files that cannot be parsed? How should I configure HtDig to
avoid such lines?

Thanks for your help,
Daniel




_______________________________________________
htdig-general mailing list <[EMAIL PROTECTED]>
To unsubscribe, send a message to <[EMAIL PROTECTED]> with a 
subject of unsubscribe
FAQ: http://htdig.sourceforge.net/FAQ.html
