According to Owen Boyle:
> I am indexing a site which contains a lot of complex hrefs like:
> 
> <a href="/index2.html?wh&top/whatsnew_en.html">
> 
> When I run with -vvv and grep "push" to see what it is indexing:
> 
>       with 3.1.5, I get:
> 
>       pushing http://author82/index.html
>       pushing http://author82/index2.html?wh&top/whatsnew_en.html
>       etc...
>       and the whole site is successfully indexed.
> 
>       with 3.1.6, I get:
> 
>       pushing http://author82/index.html
>       end of story...

Actually, grep only gives you part of the story.  There's a lot of relevant
information that you're missing out on.

> only the top (plain) documents are indexed and htdig does not push any
> of the complex URLs onto its stack.

OK, so the next logical step is to look deeper into the -vvv output to
see if htdig even sees these URLs at all, and if so, why these URLs are
now being rejected.  See http://www.htdig.org/FAQ.html#q5.27
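As a rough illustration of "looking deeper", here is a minimal sketch. It assumes the `htdig -vvv` output has been redirected to a file; the script name `logctx.py` and the log name `htdig.log` are hypothetical. Instead of grepping only for "push", it prints every line that mentions a distinctive piece of the URL (e.g. `whatsnew_en`) together with a few surrounding lines. That surrounding context shows whether htdig parsed the link at all and what it did with it next. It does roughly what `grep -C 3` does.

```python
import sys
from typing import List


def context_matches(lines: List[str], needle: str, radius: int = 3) -> List[List[str]]:
    """For each line containing `needle`, return that line plus up to
    `radius` lines of context before and after it."""
    hits = []
    for i, line in enumerate(lines):
        if needle in line:
            start = max(0, i - radius)
            hits.append(lines[start:i + radius + 1])
    return hits


def main(argv: List[str]) -> int:
    # Hypothetical usage: python logctx.py htdig.log whatsnew_en
    if len(argv) != 3:
        print("usage: logctx.py LOGFILE SUBSTRING", file=sys.stderr)
        return 2
    path, needle = argv[1], argv[2]
    # The verbose log may contain odd bytes from fetched pages, so don't choke on them.
    with open(path, encoding="utf-8", errors="replace") as f:
        lines = f.read().splitlines()
    for block in context_matches(lines, needle):
        print("\n".join(block))
        print("-" * 40)
    return 0


if __name__ == "__main__":
    main(sys.argv)
```

If the substring never shows up in the log, htdig probably never saw the link. If it does show up but is never followed by a "pushing" line, the nearby lines should show why.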

> I tried setting "max_description_length: 256", but it had no effect. I suspect
> there is something in 3.1.6 which causes htdig not to recognise the
> complex URLs which contain "?" and "&" but I can't find any directive
> like "allow_in_url".
> 
> Note that I do not see any "rejected" comments in the trace.
> 
> Does anyone know how I can activate these URLs in 3.1.6?

They're not "deactivated" by default.  Either htdig is rejecting them
because of some of your attribute settings, or it's not seeing them at
all because of some side-effect of changes to the HTML parser in 3.1.6.
Do you have any settings of exclude_urls or url_rewrite_rules in your
config file?  Does the document containing these links also contain
inline JavaScript code that's not enclosed in HTML comments?
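To check the first possibility by hand, here is a minimal sketch. It assumes the documented behaviour of exclude_urls, where a URL is rejected if it contains any of the space-separated patterns as a plain substring. It approximates that test and is not htdig's own code. The two attribute values tried below are hypothetical examples, not anyone's actual config.

```python
from typing import Optional


def excluding_pattern(url: str, exclude_urls: str) -> Optional[str]:
    """Return the first space-separated pattern in an exclude_urls-style
    value that occurs as a substring of `url`, or None if none does.
    (Approximation of htdig's substring test, for illustration only.)"""
    for pattern in exclude_urls.split():
        if pattern in url:
            return pattern
    return None


if __name__ == "__main__":
    url = "http://author82/index2.html?wh&top/whatsnew_en.html"
    # Hypothetical exclude_urls values: the second one would catch any URL with "?".
    for value in ["/cgi-bin/ .cgi", "/cgi-bin/ .cgi ?"]:
        hit = excluding_pattern(url, value)
        verdict = f"rejected by {hit!r}" if hit else "allowed"
        print(f"exclude_urls: {value!r} -> {verdict}")
```

A stray pattern such as "?" or "&" in exclude_urls would be enough to drop every one of these links. url_rewrite_rules would need to be checked separately, since it changes the URL before any such test.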

-- 
Gilles R. Detillieux              E-mail: <[EMAIL PROTECTED]>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930

_______________________________________________
htdig-general mailing list <[EMAIL PROTECTED]>
To unsubscribe, send a message to <[EMAIL PROTECTED]> with a 
subject of unsubscribe
FAQ: http://htdig.sourceforge.net/FAQ.html
