On Mon, May 02, 2005 at 01:51:16PM -0700, Dan Quinlan wrote:
> message.  For instance, the same URL shows up multiple times with the
> current API.  I'd like to be able to do things like:

How so?  Also, keep in mind there's a difference between the text-parsed and
HTML-parsed URIs.  If we wanted to merge those together, it'd be pretty easy.

>   - how many URLs were there originally and what were they?

keys %array in scalar context (the count) and in list context (the URIs).
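
A rough sketch of that idea, written in Python rather than Perl purely for
illustration.  It assumes the parsed URIs live in a hash keyed by the raw URI,
with each value holding the list of canonicalized forms; the variable names and
sample URIs are made up and are not part of the SpamAssassin API.

```python
# Hypothetical data: raw URI -> list of canonicalized forms.
uris = {
    "http://www.example.com/%70age": [
        "http://www.example.com/%70age",
        "http://www.example.com/page",
    ],
    "http://example.org/": ["http://example.org/"],
}

# Equivalent of `scalar keys %array`: how many distinct raw URIs there were.
uri_count = len(uris)

# Equivalent of `keys %array` in list context: what they were.
uri_list = list(uris)
```

Because the raw URI is the key, a URL that shows up several times in the
message is only counted once here.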

>   - what are the list of sites users could likely go to?
>     (so, canonicalized anchor destinations (ignore the stuff
>     between <a> and </a> in those cases) plus non-hyperlink "cut and
>     paste or expect MUA to hyperlinkize" ones in text where there wasn't
>     a real anchor)

I guess it depends on what you mean by "sites".  If you mean hostname or
domain, that's pretty trivial; see the URIBL code.
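
For the hostname case, something along these lines would do.  This is a
minimal Python sketch using the standard urllib.parse module, not the URIBL
code itself; the function name is hypothetical, and reducing a hostname to its
registered domain is left out because that needs a TLD list.

```python
from urllib.parse import urlsplit

def hostnames(uri_list):
    """Return the set of distinct hostnames found in a list of URIs."""
    hosts = set()
    for uri in uri_list:
        # .hostname is lowercased and is None when the URI has no
        # network location (e.g. mailto: URIs).
        host = urlsplit(uri).hostname
        if host:
            hosts.add(host)
    return hosts
```

Feeding it both the anchor destinations and the bare URIs found in the text
would give one combined set of places a user could end up.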

>   - which URLs don't match their text?

Easily checked.  See EvalTests::check_https_ip_mismatch().

>   - which URLs were over-encoded and where do *they* ultimately go?

You could compare the key and the list of canonicals, but it depends what your
algorithm would be.
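
One possible algorithm, as a Python sketch only: treat a URI as suspect when
its list of canonicals contains anything other than the raw key, and report
those other forms as where it ultimately goes.  The hash layout (raw URI ->
list of canonical forms) and the function name are assumptions, and a
canonical form can differ from the key for reasons other than over-encoding,
so this is a starting point rather than a finished test.

```python
def differing_canonicals(uris):
    """Map each raw URI to the canonical forms that differ from it.

    `uris` is assumed to be a dict of raw URI -> list of canonical forms.
    URIs whose canonicals all equal the raw key are left out of the result.
    """
    result = {}
    for raw, canonicals in uris.items():
        others = [c for c in canonicals if c != raw]
        if others:
            result[raw] = others
    return result
```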

-- 
Randomly Generated Tagline:
 "Hurry up! I wanna see the moon." -Fry 
  "Relax. It's open 'till nine." -Leela 
