It's generally a good idea to keep messages on-list. For one, I 
usually check my filtered mailboxes (including one for htdig-general) 
before my inbox. For another, there are plenty of people who can respond 
to your messages besides me.
<http://www.htdig.org/FAQ.html#q1.16>

You're not correct about the HTML parser in 3.1.5. It most certainly 
*doesn't* look for "anything that resembles a URL." That would be 
largely impossible, since many HTML pages use relative URLs! In your case 
the previous parser was just "lucky": it roughly figured "oh, here's 
another 'href' bit, but I'm already supposed to be in an href?" The 
3.1.6 parser instead takes each HTML tag it finds and shakes out the 
attributes separately, e.g. the src="..." and alt="..." portions of an 
<img> tag.
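To illustrate the difference, here is a rough sketch in Python of attribute-by-attribute parsing using the standard library's html.parser. It is not ht://Dig's actual C++ parser, and the LinkCollector class and its URL_ATTRS table are hypothetical, for illustration only. Because OnClick is just another attribute, the JavaScript inside it is never scanned for URLs:

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect URLs only from attributes defined to hold URLs (hypothetical sketch)."""

    # Hypothetical tag -> URL-bearing attribute table, for illustration only.
    URL_ATTRS = {"a": "href", "img": "src", "link": "href"}

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs, one per attribute.
        # Names are lowercased and surrounding quotes are stripped from values.
        wanted = self.URL_ATTRS.get(tag)
        for name, value in attrs:
            if name == wanted and value:
                self.urls.append(value)


page = """<a href=en_eexw.html
OnClick="parent.t.location.href='en_t_eexw.html';">Example</a>"""

parser = LinkCollector()
parser.feed(page)
parser.close()
print(parser.urls)  # ['en_eexw.html']
```

The onclick value arrives intact as one attribute, so en_t_eexw.html is only reachable by running the JavaScript, which an indexer doesn't do.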

As I said, it's very easy to add links to a page for ht://Dig. Add in 
<LINK> tags!
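If there are a lot of pages, something along these lines could generate the <LINK> tags. This is a hypothetical Python sketch, not part of ht://Dig: it assumes the JavaScript-only targets appear as quoted .html strings inside on* event handlers such as OnClick, and it emits bare <link href="..."> tags to paste into each page's <head>. Add whatever rel value suits the site.

```python
import re
from html import escape
from html.parser import HTMLParser

# Hypothetical heuristic: pick out single- or double-quoted *.htm / *.html
# strings from JavaScript event-handler attributes.
JS_TARGET = re.compile(r"""['"]([^'"]+\.html?)['"]""")


class OnClickTargets(HTMLParser):
    """Gather URLs that are only reachable through JavaScript handlers."""

    def __init__(self):
        super().__init__()
        self.targets = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            # Attribute names arrive lowercased, so OnClick becomes onclick.
            if name.startswith("on") and value:
                for url in JS_TARGET.findall(value):
                    if url not in self.targets:
                        self.targets.append(url)


def link_tags(page):
    """Return <link> lines exposing JS-only targets to a non-JS crawler."""
    finder = OnClickTargets()
    finder.feed(page)
    finder.close()
    return [f'<link href="{escape(url)}">' for url in finder.targets]


page = """<a href=en_eexw.html
OnClick="parent.t.location.href='en_t_eexw.html';">Example</a>"""

for line in link_tags(page):
    print(line)  # <link href="en_t_eexw.html">
```

The regex is deliberately crude; if the site builds URLs in JavaScript from pieces, it will miss them, and listing the targets by hand is safer.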

On Tuesday, April 9, 2002, at 02:43  AM, Owen Boyle wrote:

> Geoff Hutchison wrote:
>>
>> On Monday, April 8, 2002, at 10:54  AM, Owen Boyle wrote:
>>
>>>> *Tag: <a href=en_eexw.html
>>>> OnClick="parent.t.location.href='en_t_eexw.html'>, matched
>>
>> Don't use them. If I browse without JavaScript (which I do), use lynx,
>> curl, or other text-browser, use a browser without a JavaScript
>> implementation, etc., I will not be able to follow the links as you
>> suppose.
>
> Thanks Geoff, you confirmed my fears. I appreciate all you say about JS
> and I hate those "this-site-best-viewed-with.." messages too, but the
> sad fact is that I am not responsible for the content of the site and
> the customer (or rather, the customer's flashy start-up web-design
> contractor) has decided to use JS extensively and even to dictate
> browser type and version. They can do this because it is an
> information-resource site for a homogeneous group of users and they can
> be pretty sure they are all in corporate offices with IE.
>
> The irritating thing is that with 3.1.5, the JS-activated hrefs *did*
> get indexed. When I run 3.1.5 against the site, I get messages like:
>
> **A tag: pos = 2, position = =en_eexw.html
> OnClick="parent.t.location.href='en_t_eexw.html';">
> Terminating previous <a href=...> tag, which didn't have a closing </a>
> tag.
>    pushing http://author84/content/en_eexw.html
> +A tag: pos = 28, position = ='en_n_eexw.html';"
> Terminating previous <a href=...> tag, which didn't have a closing </a>
> tag.
>    pushing http://author84/content/en_t_eexw.html
>
> So it looks to me like the tag breaks up in the HTML parser. However,
> htdig then pores over the debris and picks out anything that looks like
> a URL and pushes it. This serendipity is lost in 3.1.6 because the
> parser is more robust and doesn't lose track while parsing the tag. So
> it indexes only the first, valid, href.
>
> It's going to be a bit tricky to explain all this to the customer...
>
> Rgds,
>
> Owen Boyle.

--
-Geoff Hutchison
Williams Students Online
http://wso.williams.edu/


_______________________________________________
htdig-general mailing list <[EMAIL PROTECTED]>
To unsubscribe, send a message to <[EMAIL PROTECTED]> with a 
subject of unsubscribe
FAQ: http://htdig.sourceforge.net/FAQ.html
