Hello,

One of the users of my Checkbot tool <http://degraaff.org/checkbot/> had some trouble with the proxy functionality in LWP. In Checkbot I use the proxy and noproxy functionality of LWP without changes, so I guess that these issues should be addressed in LWP.

The first issue is a problem with the noproxy feature in relation to domain-less hostnames. The report mentions FQDNs, but I think this should be read as canonical URIs. The second issue is a feature request.

Kind regards,
Hans

- checkbot will ask the proxy if the URL contains a non-FQDN hostname and --noproxy contains the local domain. E.g. one intranet server is foo.de.marconicomms.com, and I run

      checkbot --proxy bar --noproxy de.marconicomms.com

  which unwantedly asks the proxy bar for http://foo/index.html. A direct connection is used for http://foo.de.marconicomms.com/index.html, as expected.

  This is probably because the noproxy args are matched against the hostname as found in the URL, and not against the FQDN. Thus, "foo" does not match "de.marconicomms.com" and the proxy is used.

  This could be fixed if the matching followed the same mechanism as the resolver, e.g. looking at the "search" line in /etc/resolv.conf for possible domains. Alternatively, a non-FQDN could be canonicalized by a name service lookup before being matched against the noproxy list. What do you think?

- The common web browsers (IE, Mozilla et al.) configure their proxy/noproxy via a proxy.pac file. This file is normally centrally maintained. It is referenced in RFC 3040, quote:

      6.2 Proxy Auto Configuration (PAC)

      Best known reference:
         "Navigator Proxy Auto-Config File Format" [12]

      Description:
         A JavaScript script retrieved from a web server is executed for
         each URL accessed to determine the appropriate proxy (if any) to
         be used to access the resource. User agents must be configured
         to request this script upon startup. There is no bootstrap
         mechanism, manual configuration is necessary.

         Despite manual configuration, the process of proxy configuration
         is simplified by centralizing it within a script at a single
         location.

      Security:
         Common policy per organization possible but still requires
         initial manual configuration. PAC is better than "manual proxy
         configuration" since PAC administrators may update the proxy
         configuration without further user intervention.

         Interoperability of PAC files is not high, since different
         browsers have slightly different interpretations of the same
         script, possibly leading to undesired effects.

      Deployment:
         Implemented in Netscape Navigator and Microsoft Internet
         Explorer.

      Submitter:
         Document editors.

  [12] refers to http://wp.netscape.com/eng/mozilla/2.0/relnotes/demo/proxy-live.html

  Now, in an ideal world, checkbot would just use the same mechanism for proxy configuration that the web browsers use. In fact, our proxy.pac is almost a hundred lines long, and turning that into --[no]proxy args is repetitive and error-prone.

  I realize that parsing a proxy.pac (= extremely restricted JavaScript) may not be a one-liner in Perl. Anyway, maybe you consider the idea of using a standardized and flexible [no]proxy determination a Good Thing(TM). And if you don't get right to it, adding it to the shadow todo list is a good idea. :-)
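
To make the first issue concrete: the second suggestion in the report (canonicalize a non-FQDN hostname through a name service lookup before matching it against the noproxy list) could look roughly like the sketch below. It is written in Python purely to illustrate the matching logic and is not LWP's code. The names canonical_host and bypass_proxy and the injectable resolve parameter are hypothetical; socket.getfqdn is only a stand-in for whatever lookup LWP would actually use, and its result depends on the local resolver configuration.

```python
import socket


def canonical_host(host, resolve=socket.getfqdn):
    """Return a fully qualified, lower-case name for `host`.

    Hosts that already contain a dot are taken as given; domain-less
    hosts such as "foo" are passed to `resolve` (assumption: the
    resolver applies the local search domains).
    """
    host = host.rstrip(".").lower()
    if "." in host:
        return host
    return resolve(host).rstrip(".").lower()


def bypass_proxy(host, no_proxy_domains, resolve=socket.getfqdn):
    """Return True if `host` falls under one of the no-proxy domains."""
    fqdn = canonical_host(host, resolve)
    for domain in no_proxy_domains:
        domain = domain.strip(".").lower()
        # Match the domain itself or any host below it.
        if fqdn == domain or fqdn.endswith("." + domain):
            return True
    return False


if __name__ == "__main__":
    # A fake resolver stands in for the intranet name service.
    names = {"foo": "foo.de.marconicomms.com"}
    lookup = lambda h: names.get(h, h)
    print(bypass_proxy("foo", ["de.marconicomms.com"], lookup))              # True
    print(bypass_proxy("www.example.org", ["de.marconicomms.com"], lookup))  # False
```

With this approach http://foo/index.html and http://foo.de.marconicomms.com/index.html would be treated the same way, at the cost of one extra lookup for each domain-less hostname.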