Greetings,

Having solved my problem with the noindex meta-tag, I've now run into another problem with JavaScript-activated hrefs. When indexing with 3.1.6, htdig cannot parse them. Running with -vvvv, I get:
> *Tag: <a href=en_eexw.html OnClick="parent.t.location.href='en_t_eexw.html'>, matched 2
> Tag: </a>, matched 3
> href: http://author84/content/en_eexw.html ()
> resolving 'http://author84/content/en_eexw.html'
> pushing http://author84/content/en_eexw.html

...and on to the next tag. The href to 'en_t_eexw.html' never gets parsed.

Checking the FAQ and searching the archives, I find that this is to be expected (e.g. http://www.htdig.org/FAQ.html#q5.18).

However, I noticed that it did "seem" to work before with 3.1.5. When I check the trace for a 3.1.5 run, I find that I get a lot of:

> Terminating previous <a href=...> tag,
> which didn't have a closing </a> tag.

messages. This is despite the fact that the HTML is actually OK (well, apart from all that JS :-).

My interpretation (i.e. what I am going to tell the customer) is that the thing used to work because the complex tag broke up in the HTML parser, and htdig saved the day by picking out the hrefs by brute force. Now that the new version has a more standards-robust HTML parser, it reads the full tag (it doesn't break up), but htdig can no longer process the JS-activated hrefs.

Have I got this right? Does anyone know how I can follow JS hrefs?

Rgds,

Owen Boyle.
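
P.S. To make the "brute-force" idea concrete, here is a minimal sketch in Python of picking every href-style assignment out of a raw tag, including the one buried in the OnClick handler. This is not htdig's code and not a claim about how 3.1.5 worked internally; the names HREF_RE and extract_hrefs are hypothetical, and the regex is an assumption that only covers simple `href=...` and `location.href='...'` forms. The base URL is the one from the trace above.

```python
import re
from urllib.parse import urljoin

# Hypothetical pattern (an assumption, not htdig's parser): match "href="
# followed by a double-quoted, single-quoted, or unquoted value. Because it
# looks only for the text "href", it also catches "location.href='...'".
HREF_RE = re.compile(
    r"""href\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s>"']+))""",
    re.IGNORECASE,
)


def extract_hrefs(tag, base_url):
    """Return absolute URLs for every href-like assignment found in tag."""
    urls = []
    for dq, sq, bare in HREF_RE.findall(tag):
        target = dq or sq or bare
        if target:
            # Resolve relative targets against the page's base URL.
            urls.append(urljoin(base_url, target))
    return urls


if __name__ == "__main__":
    # The tag exactly as it appears in the -vvvv trace.
    tag = ("<a href=en_eexw.html "
           "OnClick=\"parent.t.location.href='en_t_eexw.html'>")
    for url in extract_hrefs(tag, "http://author84/content/"):
        print(url)
```

Something like this could be run over the pages separately to produce a plain list of the JS-only targets, which could then be put on an ordinary HTML page of links for htdig to crawl. It ignores any URL that is built up by script at run time.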

