Looked for this in the FAQ and archives and didn't see it anywhere. I'll
gladly send a Perforce t-shirt to the first person who points me in
the right direction here...
A particular set of pages on our site is being only partially indexed
by htdig. Only the first 22 words or so of each page are indexed, and
then nothing.
This does NOT seem to be related to the config file values for
noindex_start & noindex_end. Those values are "<!--htdig-noindex>"
and "<!--/htdig-noindex-->", which don't appear anywhere in the
html source of the files that are affected here.
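For anyone who wants to repeat that check on their own pages, here is a
minimal sketch in Python. The helper name find_markers is hypothetical,
and the marker strings are copied from the config values quoted above.
It only does a plain substring search on text you hand it; it does not
read the htdig config or try to model htdig's own parsing.

```python
# Hypothetical helper: report where the noindex markers occur in an HTML string.
# The two marker values are the ones quoted from our config file above.
NOINDEX_START = "<!--htdig-noindex>"
NOINDEX_END = "<!--/htdig-noindex-->"


def find_markers(html, markers=(NOINDEX_START, NOINDEX_END)):
    """Return {marker: [offsets]} for each marker, matched case-insensitively."""
    lowered = html.lower()
    found = {}
    for marker in markers:
        needle = marker.lower()
        offsets = []
        pos = lowered.find(needle)
        while pos != -1:
            offsets.append(pos)
            pos = lowered.find(needle, pos + 1)
        found[marker] = offsets
    return found
```

Calling find_markers() on the contents of an affected page and getting
empty lists for both markers is the result described above.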
To see the problem, go to http://www.perforce.com/htdig/search.html ,
choose "Scope of Search: Perforce Manuals Only", and enter the following
values into the search field:
delete counters depot chapter
One page comes up, "p4 delete". Go to that page. Note that all those words
appear right at the top of the page, above the horizontal rule. Choose
any of the words below the horizontal rule (for example, "synopsis" or
"workspace"). Go back to the search page and add that word to the
search you've already done. The "p4 delete" page isn't found in the
search anymore.
Here is some additional info that may be helpful:
* I believe that the version of htdig we're using is 3.1.5, but I'm not
positive... this is a system I inherited and I don't know how to find
the htdig version number. (Any hints?)
* Note that the "p4 delete" page excerpt shown in the search results
is cut off at the end, exactly where the indexing fails.
* The pages that aren't being fully indexed all have the same structure
as the example above: they're all part of our command reference, and
are generated from FrameMaker files via WebWorks Publisher. The
resulting HTML has structural problems, but I made sure that the
example above ("p4 delete") is valid HTML through line 105 and reran
htdig after making that change. The problem still occurs, and it occurs
well before the point where the HTML goes wrong. (Does htdig even care
whether the HTML is well formed, or does it just treat anything outside
of a <...> as content?)
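To make that last question concrete, here is a rough Python sketch of
the naive model I have in mind: delete every <...> span and treat
whatever is left as words. This is NOT htdig's parser, just an
assumption about the simplest thing an indexer could do. The function
names and the default cutoff of 22 are hypothetical, picked to match
the symptom described above.

```python
import re

# Naive model: a "tag" is anything from "<" up to the next ">".
# Real parsers treat comments, scripts and quoted attributes specially;
# this sketch deliberately does not.
TAG_RE = re.compile(r"<[^>]*>")
WORD_RE = re.compile(r"[A-Za-z0-9']+")


def words_outside_tags(html):
    """Return the words that remain after deleting every <...> span."""
    text = TAG_RE.sub(" ", html)
    return WORD_RE.findall(text)


def cutoff_context(html, n=22, window=5):
    """Return (words just before position n, words just after position n)."""
    words = words_outside_tags(html)
    return words[max(0, n - window):n], words[n:n + window]
```

Running cutoff_context() on the source of an affected page shows which
words sit on either side of the 22-word mark, which should make it
easier to see what markup falls at the point where indexing stops.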
Any help would be appreciated. Thank you!
Robert Orenstein
Perforce Software
_______________________________________________
htdig-general mailing list <[EMAIL PROTECTED]>
To unsubscribe, send a message to <[EMAIL PROTECTED]> with a
subject of unsubscribe
FAQ: http://htdig.sourceforge.net/FAQ.html