Re: Does HTTP allow this?

2003-11-10 Thread Daniel Stenberg
On Sun, 9 Nov 2003, Hrvoje Niksic wrote:

 One thing that might break (but that Wget doesn't yet support anyway) is
 NTLM, which seems to authorize individual *connections*.

Yes it does. It certainly makes things more complicated, as you would have to
exclude such a connection from the checks (at least I think you want that, I
don't think you'll be forced to do so). And you also need to exclude
HTTPS-connections from this logic (since name-based virtual hosting over SSL
isn't really possible).

curl doesn't do such advanced IP-checking to detect existing connections to
re-use, it only uses host-name based checking for connection re-use for
persistent connections.
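
As a rough illustration of the two kinds of check contrasted above, here is
a minimal sketch in Python. It is not taken from Wget or curl; the
PersistentConn record, its fields, and can_reuse() are hypothetical names
made up for this example, and the exclusion rules are only the ones
mentioned in this thread (HTTPS and NTLM-authenticated connections).

```python
from dataclasses import dataclass


@dataclass
class PersistentConn:
    """Hypothetical record of a kept-alive connection (not Wget's real struct)."""
    host: str          # host name the connection was opened for
    ip: str            # address the socket is actually connected to
    port: int
    ssl: bool          # True for HTTPS connections
    ntlm_authed: bool  # True if NTLM has authenticated this connection


def can_reuse(conn, host, ip, port, ssl):
    """Return True if `conn` may carry a request for host:port."""
    if conn.port != port or conn.ssl != ssl:
        return False   # SSL and non-SSL connections are never compatible
    if conn.host.lower() == host.lower():
        return True    # host-name match: the simple check curl is said to use
    if conn.ssl or conn.ntlm_authed:
        return False   # do not share these across different host names
    return conn.ip == ip  # same address: presumably a name-based virtual host
```

With this sketch, a plain HTTP connection opened for www.apache.org would be
reused for httpd.apache.org only if both names resolved to the same address.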

 Does curl handle NTLM?

Yes it does since a while back. I am willing to donate NTLM code to the wget
project, if you want it. I'm not very familiar with the wget internals so they
wouldn't be a fully working patch, but a set of (proved working) functions to
be integrated by someone with more wget insights. (It depends on crypto
functions provided by OpenSSL.)

Otherwise, I can recommend Eric Glass' superb web page for all bits and
details on the NTLM protocol:

http://davenport.sourceforge.net/ntlm.html

--
 -=- Daniel Stenberg -=- http://daniel.haxx.se -=-
  ech`echo xiun|tr nu oc|sed 'sx\([sx]\)\([xoi]\)xo un\2\1 is xg'`ol


Re: Does HTTP allow this?

2003-11-10 Thread Hrvoje Niksic
Daniel Stenberg writes:

 Yes it does. It certainly makes things more complicated, as you
 would have to exclude such a connection from the checks (at least I
 think you want that, I don't think you'll be forced to do so). And
 you also need to exclude HTTPS-connections from this logic (since
 name-based virtual hosting over SSL isn't really possible).

I'm already treating SSL and non-SSL connections as incompatible.  But
I'm curious as to why you say name-based virtual hosting isn't
possible over SSL?

 Does curl handle NTLM?

 Yes it does since a while back. I am willing to donate NTLM code to
 the wget project, if you want it. I'm not very familiar with the
 wget internals so they wouldn't be a fully working patch, but a set
 of (proved working) functions to be integrated by someone with more
 wget insights. (It depends on crypto functions provided by
 OpenSSL.)

That's very generous, thanks!  I planned to improve Wget's HTTP
internals (which are very raw right now) anyway, so don't worry about
that.  A set of callable functions would be perfect.


Re: Does HTTP allow this?

2003-11-10 Thread Daniel Stenberg
On Mon, 10 Nov 2003, Hrvoje Niksic wrote:

 I'm already treating SSL and non-SSL connections as incompatible.  But I'm
 curious as to why you say name-based virtual hosting isn't possible over
 SSL?

To quote the Apache docs: "Name-based virtual hosting cannot be used with SSL
secure servers because of the nature of the SSL protocol."

Since you connect to the site in a secure manner, you can't select which host
to get data from after a successful connection, as the connection will not be
successful unless you have all the proper credentials already.

 That's very generous, thanks!

I'll prepare a C file and header and post them in a separate mail. They will
need a little attention, but not much. Mainly to setup pointers to user name,
password, etc.

-- 
 -=- Daniel Stenberg -=- http://daniel.haxx.se -=-
  ech`echo xiun|tr nu oc|sed 'sx\([sx]\)\([xoi]\)xo un\2\1 is xg'`ol


Re: Does HTTP allow this?

2003-11-10 Thread Tony Lewis
Hrvoje Niksic wrote:

 Assume that Wget has retrieved a document from the host A, which
 hasn't closed the connection in accordance with Wget's keep-alive
 request.

 Then Wget needs to connect to host B, which is really the same as A
 because the provider uses DNS-based virtual hosts.  Is it OK to reuse
 the connection to A to talk to B?
<snip>
 FWIW, it works fine with Apache.

There is a fairly high probability that it will work with most hosts
(regardless of the server software). If an IP address has been registered
with multiple hosts, then the address alone is not sufficient to retrieve a
resource so you have to add a Host header.

It's possible that the server responding to the IP address forwards
connections to multiple backend servers. These backend servers may or may
not know about all the resources that the gateway server knows about.

Since it will work most of the time, I think it's a reasonable optimization
to use, however you might want to add a --one-host-per-connection flag for
the rare cases where the current behavior won't work.

Tony



Re: Does HTTP allow this?

2003-11-10 Thread Hrvoje Niksic
Tony Lewis writes:

 It's possible that the server responding to the IP address forwards
 connections to multiple backend servers. These backend servers may
 or may not know about all the resources that the gateway server knows
 about.

That is precisely the case I'm worried about.  I can't point to
anything in rfc2616 that would forbid this kind of server-side
optimization.

 Since it will work most of the time, I think it's a reasonable
 optimization to use, however you might want to add a
 --one-host-per-connection flag for the rare cases where the current
 behavior won't work.

The thing is, I don't want to bloat Wget with obscure options to turn
off even more obscure (and *very* rarely needed) optimizations.  Wget
has enough command-line options as it is.  If there are cases where
the optimization doesn't work, I'd rather omit it completely.



Re: Does HTTP allow this?

2003-11-10 Thread Tony Lewis
Hrvoje Niksic wrote:

 The thing is, I don't want to bloat Wget with obscure options to turn
 off even more obscure (and *very* rarely needed) optimizations.  Wget
 has enough command-line options as it is.  If there are cases where
 the optimization doesn't work, I'd rather omit it completely.

It's probably safest to turn off that optimization even if it does eliminate
a few opens now and then.

Tony



Re: Does HTTP allow this?

2003-11-10 Thread Hrvoje Niksic
Tony Lewis writes:

 Hrvoje Niksic wrote:

 The thing is, I don't want to bloat Wget with obscure options to turn
 off even more obscure (and *very* rarely needed) optimizations.  Wget
 has enough command-line options as it is.  If there are cases where
 the optimization doesn't work, I'd rather omit it completely.

 It's probably safest to turn off that optimization even if it does
 eliminate a few opens now and then.

Yup.  If we get a report of a case where it doesn't work, it goes
away.

(NB the optimization is already there since at least 1.8.x, and no one
has reported a problem.  For example, try `wget www.apache.org
httpd.apache.org' with 1.8.2.)
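
For anyone who wants to check whether two host names are candidates for
this kind of reuse, here is a small sketch in Python. It only compares the
first IPv4 address each name resolves to; same_endpoint() and its `resolve`
parameter are hypothetical names for this example, and the real check in
Wget may well differ.

```python
import socket


def same_endpoint(host_a, host_b, resolve=socket.gethostbyname):
    """Return True if both host names resolve to the same address.

    `resolve` is injectable so the function can be tried without
    network access; by default it performs a real DNS lookup.
    """
    try:
        return resolve(host_a) == resolve(host_b)
    except OSError:
        # Resolution failed (socket.gaierror is an OSError subclass);
        # treat the hosts as different endpoints.
        return False
```

Calling same_endpoint("www.apache.org", "httpd.apache.org") does a live
lookup, so the answer depends on how those names resolve today.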



Re: Does HTTP allow this?

2003-11-09 Thread Daniel Stenberg
On Sat, 8 Nov 2003, Hrvoje Niksic wrote:

 So if I have the connection to the endpoint, I should be able to reuse it.
 But on the other hand, a server might decide to connect a file descriptor to
 a handler for a specific virtual host, which would be unable to serve
 anything else.  FWIW, it works fine with Apache.

I would say that your described approach would work nicely, and it would not
contradict anything in the HTTP standards. Each single request is stand-alone
and may indeed have its own Host: header, even when the connection is kept
alive.

At least this is how I interpret these things.
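
To make the point about stand-alone requests concrete, here is a minimal
sketch in Python of two requests that differ only in their Host: header and
could be written, one after the other, to the same kept-alive socket. The
build_get() helper is a hypothetical name for this example, and the host
names are simply the ones used elsewhere in this thread.

```python
def build_get(host, path="/"):
    """Format a minimal HTTP/1.1 GET request for `host` as bytes."""
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",            # selects the virtual host per request
        "Connection: keep-alive",   # ask the server to keep the socket open
        "",                         # blank line ends the header block
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


# The second request would be sent only after the first response
# has been read in full from the shared connection.
first = build_get("www.apache.org")
second = build_get("httpd.apache.org")
```

Whether the server honors the second request on the same connection is
exactly the question under discussion; the sketch only shows that nothing
in the request format ties a connection to one host name.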

-- 
 -=- Daniel Stenberg -=- http://daniel.haxx.se -=-
  ech`echo xiun|tr nu oc|sed 'sx\([sx]\)\([xoi]\)xo un\2\1 is xg'`ol


Re: Does HTTP allow this?

2003-11-09 Thread Hrvoje Niksic
Daniel Stenberg writes:

 On Sat, 8 Nov 2003, Hrvoje Niksic wrote:

 So if I have the connection to the endpoint, I should be able to
 reuse it.  But on the other hand, a server might decide to connect
 a file descriptor to a handler for a specific virtual host, which
 would be unable to serve anything else.  FWIW, it works fine with
 Apache.

 I would say that your described approach would work nicely, and it
 would not contradict anything in the HTTP standards. Each single
 request is stand-alone and may indeed have its own Host: header,
 even when the connection is kept alive.

Hmm, OK.  I guess I needed an independent confirmation, thanks.  I
would feel safer if section 19.6.1.1 of rfc2616 were explicit about
persistent connections, but I guess it could be inferred.

One thing that might break (but that Wget doesn't yet support anyway)
is NTLM, which seems to authorize individual *connections*.  Does curl
handle NTLM?