According to Owen Boyle:
> I am indexing a site which contains a lot of complex hrefs like:
>
> <a href="/index2.html?wh&top/whatsnew_en.html">
>
> When I run with -vvv and grep "push" to see what it is indexing:
>
> with 3.1.5, I get:
>
> pushing http://author82/index.html
> pushing http://author82/index2.html?wh&top/whatsnew_en.html
> etc...
> and the whole site is successfully indexed.
>
> with 3.1.6, I get:
>
> pushing http://author82/index.html
> end of story...

Actually, grep only gives you part of the story. There's a lot of relevant
information that you're missing out on.

> only the top (plain) documents are indexed and htdig does not push any
> of the complex URLs onto its stack.

OK, so the next logical step is to look deeper into the -vvv output to see
if htdig even sees these URLs at all, and if so, why these URLs are now
being rejected. See http://www.htdig.org/FAQ.html#q5.27

> I tried setting "max_description_length: 256" but no effect. I suspect
> there is something in 3.1.6 which causes htdig not to recognise the
> complex URLs which contain "?" and "&" but I can't find any directive
> like "allow_in_url".
>
> Note that I do not see any "rejected" comments in the trace.
>
> Does anyone know how I can activate these URLs in 3.1.6?

They're not "deactivated" by default. Either htdig is rejecting them because
of some of your attribute settings, or it's not seeing them at all because
of some side-effect of changes to the HTML parser in 3.1.6. Do you have
any settings of exclude_urls or url_rewrite_rules in your config file?
Does the document containing these links also contain inline JavaScript
code that's not enclosed in HTML comments?

-- 
Gilles R. Detillieux              E-mail: <[EMAIL PROTECTED]>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930
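
As a quick way to answer the last question in the reply above (inline JavaScript not enclosed in HTML comments), a small script along these lines might help. It is only a rough sketch: the function name `unprotected_inline_scripts` is made up for illustration, the regex is a crude approximation of HTML parsing, and it says nothing about how htdig itself parses the page. It just reports `<script>` blocks whose body does not start with `<!--`.

```python
import re
import sys

# Matches <script ...>body</script>, case-insensitively and across newlines.
# A regex is a rough approximation of HTML parsing, adequate for a spot check.
SCRIPT_RE = re.compile(r"<script\b[^>]*>(.*?)</script\s*>",
                       re.IGNORECASE | re.DOTALL)


def unprotected_inline_scripts(html):
    """Return the bodies of inline <script> blocks not wrapped in <!-- ... -->."""
    found = []
    for body in SCRIPT_RE.findall(html):
        text = body.strip()
        if not text:
            continue  # empty body, typically an external script (src=...)
        if not text.startswith("<!--"):
            found.append(text)
    return found


if __name__ == "__main__":
    # Usage (hypothetical file names): python check_scripts.py index.html ...
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8", errors="replace") as fh:
            hits = unprotected_inline_scripts(fh.read())
        print(f"{path}: {len(hits)} inline script block(s) without comment wrapping")
```

If this reports any blocks for the page that holds the complex links, wrapping that JavaScript in `<!-- ... -->` and re-running htdig with -vvv would show whether the links are then picked up.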

