Greetings,

Having solved my problem with the noindex meta tag, I've now run into
another problem, this time with JavaScript-activated hrefs. When indexing
with 3.1.6, htdig does not pick them up. Running with -vvvv, I get:

> *Tag: <a href=en_eexw.html OnClick="parent.t.location.href='en_t_eexw.html'>, 
>matched 2
> Tag: </a>, matched 3
> href: http://author84/content/en_eexw.html ()
> resolving 'http://author84/content/en_eexw.html'
>   pushing http://author84/content/en_eexw.html

...and on to the next tag. The href to 'en_t_eexw.html' never gets parsed.

Checking the FAQ and searching the archives, I find that this is to be
expected (e.g. http://www.htdig.org/FAQ.html#q5.18). However, I noticed
that it did "seem" to work before with 3.1.5. When I check the trace for
a 3.1.5 run, I find that I get a lot of:

> Terminating previous <a href=...> tag, 
> which didn't have a closing </a> tag.

messages. This is despite the fact that the HTML is actually OK (well,
apart from all that JS :-). 

My interpretation (i.e. what I am going to tell the customer) is that
it used to work because the complex tag broke up in the 3.1.5 HTML
parser, and htdig saved the day by picking the hrefs out of the pieces
by brute force. Now that 3.1.6 has a more robust HTML parser, it reads
the full tag in one piece (it no longer breaks up), but it only follows
the plain href and ignores the JS-activated one inside OnClick.
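
To make clear what I mean by "brute force", here is a minimal sketch in
Python. It is not htdig code and not how either version's parser
actually works; extract_links, the two patterns and the base URL are
names and values I made up for illustration. It simply scans a tag for a
plain href= attribute and for a location.href='...' assignment, and
resolves both against an assumed base URL:

```python
import re
from urllib.parse import urljoin

# Plain href=... attribute, quoted or unquoted. The lookbehind skips
# "location.href", which is handled by JS_HREF below.
PLAIN_HREF = re.compile(
    r"""(?<![.\w])href\s*=\s*(?:(['"])(.*?)\1|([^\s>'"]+))""",
    re.IGNORECASE,
)

# location.href='...' (or "...") inside a JavaScript event handler.
JS_HREF = re.compile(
    r"""location\.href\s*=\s*(['"])(.*?)\1""",
    re.IGNORECASE,
)


def extract_links(html, base_url):
    """Return absolute URLs for plain and JS-assigned hrefs found in html."""
    links = []
    for m in PLAIN_HREF.finditer(html):
        # Group 2 holds a quoted value, group 3 an unquoted one.
        target = m.group(2) if m.group(1) else m.group(3)
        links.append(urljoin(base_url, target))
    for m in JS_HREF.finditer(html):
        links.append(urljoin(base_url, m.group(2)))
    return links


# The tag from the trace above; the base URL is an assumption.
tag = """<a href=en_eexw.html OnClick="parent.t.location.href='en_t_eexw.html'>"""
for url in extract_links(tag, "http://author84/content/"):
    print(url)
```

A scan like this knows nothing about tag boundaries or what the script
really does, so it would only catch the simple location.href='...' case
shown in my trace.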

Have I got this right? Does anyone know how I can follow JS hrefs?

Rgds,

Owen Boyle.
