According to Robert Orenstein:
> Hi, Geoff. Answers and queries follow.
>
> > htdig -? (with any program)
> > gives the help message and a version number.
>
> I'm not seeing the expected result:
>
> % htdig -?
> htdig: No match.
> % htmerge -?
> htmerge: No match.
> % htfuzzy -?
> htfuzzy: No match.
>
> Am I missing something here?

Yes, either "set nonomatch" in your shell, or quote/escape the question
mark so the shell doesn't try to use it as a wildcard, e.g. "htdig -\?"

> > I guess the next question I'd ask is what your max_head_length and
> > max_document_length variables are--excerpts are trimmed by the former and
> > all documents are trimmed by the latter before indexing.
>
> max_head_length: 10000
> max_doc_size is unset, so it's the default (100,000).
>
> The example document ("p4 delete") is exactly 9993 characters, including
> the html, so the above values don't seem to be influencing this.
>
> > I'd also ask how your database is updated--has it been updated
> > recently? Have you tried rebuilding it from scratch?
>
> I ran htdig -i last night right before I wrote this letter.
>
> Geoff, how would the comment tags affect the results? And does htdig
> follow the simple algorithm of ignoring anything between < and >, and
> ignoring anything between noindex_start and noindex_end, and counting
> everything else as content?

Your comments don't follow the HTML standard. There should only be
two dashes at the start and end of each comment, no more and no less.
htdig versions 3.1.0 and 3.1.1 had big problems with non-standard comments
like this, and ended up not being able to find the end of the comment in
some cases (especially when there was an odd number of dashes). Later
versions fixed this, so I think 3.1.5 should correctly handle this
document.
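
If you want to check a page for comments of this kind before re-indexing, a small script along these lines might help. This is only a sketch and not part of htdig: the function name find_nonstandard_comments is made up for illustration, and it assumes "non-standard" simply means extra dashes right after the opening "<!--" or right before the closing "-->".

```python
import re
import sys

# Matches an HTML comment lazily, from "<!--" up to the first "-->".
COMMENT_RE = re.compile(r"<!--(.*?)-->", re.DOTALL)


def find_nonstandard_comments(html):
    """Return (line_number, comment_text) pairs for comments with extra dashes.

    Assumption: a comment is flagged when its body starts or ends with "-",
    i.e. there are more than two dashes at the start or end of the comment.
    """
    hits = []
    for m in COMMENT_RE.finditer(html):
        body = m.group(1)
        if body.startswith("-") or body.endswith("-"):
            # Line numbers are 1-based, counted from newlines before the match.
            line_no = html.count("\n", 0, m.start()) + 1
            hits.append((line_no, m.group(0)))
    return hits


if __name__ == "__main__":
    # Usage: python check_comments.py page1.html page2.html ...
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8", errors="replace") as f:
            for line_no, text in find_nonstandard_comments(f.read()):
                print(f"{path}:{line_no}: {text[:60]!r}")
```

Any line it reports is a candidate for trimming back to exactly two dashes at each end of the comment.
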
You could try indexing with -vvvv and seeing exactly what the parser is
doing right up to the point it stops scanning the document. (This will
generate lots of output, unless you set your start_url to point to only
this document, for the sake of testing this problem.)
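
For what it's worth, here is one way such a single-document test run could be scripted. Treat it as a sketch under assumptions: the URL and file names are placeholders, the helper names are invented, and it assumes that a config containing only database_dir and start_url is enough for a test dig on your installation (you may need to copy other attributes from your real config). Pointing database_dir at a scratch directory keeps "htdig -i" from touching your real database.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path


def write_test_config(doc_url, work_dir):
    """Write a minimal config whose start_url is just the one document.

    Assumption: other attributes can be left at their defaults for this test.
    """
    work = Path(work_dir)
    conf = work / "single_doc.conf"  # hypothetical file name
    conf.write_text(
        f"database_dir: {work}\n"
        f"start_url: {doc_url}\n"
    )
    return conf


def htdig_command(conf_path):
    """Build the htdig command line for a verbose dig from scratch."""
    # -i: initial (rebuild) dig, -vvvv: very verbose, -c: alternate config file
    return ["htdig", "-i", "-vvvv", "-c", str(conf_path)]


if __name__ == "__main__":
    work_dir = tempfile.mkdtemp(prefix="htdig-test-")
    # Placeholder URL; substitute the address of the problem document.
    conf = write_test_config("http://www.example.com/p4_delete.html", work_dir)
    if shutil.which("htdig") is None:
        print("htdig not found on PATH; config written to", conf)
    else:
        log_path = Path(work_dir) / "htdig.log"
        with open(log_path, "w") as log:
            subprocess.run(htdig_command(conf), stdout=log,
                           stderr=subprocess.STDOUT)
        print("parser output saved to", log_path)
```

The end of the saved log should show the last thing the parser did before it stopped scanning the document.
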
--
Gilles R. Detillieux E-mail: <[EMAIL PROTECTED]>
Spinal Cord Research Centre WWW: http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba Phone: (204)789-3766
Winnipeg, MB R3E 3J7 (Canada) Fax: (204)789-3930