RE: Design issue
Herold Heiko [EMAIL PROTECTED] writes:

Yes, the Windows and DOS versions (OS/2 too?) can't use ':', so if we need to choose a separator we could as well choose something which creates as few problems as possible on most platforms. Even if this does mean a slightly different syntax than the classic protocol:port URI form... since those directories are a sort of reminder of where that data came from, there's no _strict_ need to mirror the original URI, imho.

Yes, and there's no _strict_ need to pick a lowest-common-denominator notation just to make Windows happy either. Like I said, the Windows version of Wget already has to change ':'s into something else. I don't see this ':' as any different.

This made me really think about this issue... what about people who access the same filesystem from multiple operating systems? Maybe wget should have a (default disabled!) option to use only common-denominator characters, which are available on every possible filesystem... Unix really doesn't have any problems except '/', everything else is just cosmetic; Windows... we all know that :( ... what about OS/2 HPFS (same as Windows?) What about BeOS, MacOS, all those where wget could theoretically be ported one day or where a filesystem mirrored by wget could be accessed remotely? It's no good thinking "well the file system sharing protocol can correct those filenames", because the links wouldn't work any more. Implementing this should be easier than gathering a (complete) list of problematic characters.

Heiko
--
-- PREVINET S.p.A.            [EMAIL PROTECTED]
-- Via Ferretto, 1            ph  x39-041-5907073
-- I-31021 Mogliano V.to (TV) fax x39-041-5907087
-- ITALY
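A minimal sketch of the common-denominator idea discussed above, written in Python rather than Wget's own C, and not taken from Wget: instead of collecting the problematic characters of every platform, keep a small allow-list and replace everything else. The allow-list (the POSIX portable filename character set), the function name `portable_component`, and the `_` replacement are all assumptions made for illustration.

```python
import string

# Assumption: only letters, digits, '.', '_' and '-' are considered safe
# everywhere (the POSIX portable filename character set).
PORTABLE_CHARS = frozenset(string.ascii_letters + string.digits + "._-")


def portable_component(name: str, replacement: str = "_") -> str:
    """Replace every character outside PORTABLE_CHARS in one path component.

    Hypothetical helper: it handles a single directory or file name,
    not a whole path, so '/' is replaced as well.
    """
    return "".join(c if c in PORTABLE_CHARS else replacement for c in name)


if __name__ == "__main__":
    print(portable_component("site.com:8080"))
```

Distinct names can map to the same result (`a:b` and `a?b` both become `a_b`), and links inside mirrored pages would need the same mapping applied to keep working, which is the concern raised in the message.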
Re: Design issue
"Dan Harkless" [EMAIL PROTECTED] writes:

I think the most straightforward mapping would also be the most attractive:

ftp/site/dir/file
http/site/dir/file

Don't forget the port, if you aim for completeness.

Wget should certainly have an option to make it behave this way. In fact, I'd prefer it to behave that way by default, for the reasons you mention, and introduce an option to leave off the protocol.

That would suck for people who have come to expect the current behaviour.

I had different thoughts: I would have liked to not include the host name by default, since hosts are not traversed by default anyway. But doing that would violate the previous paragraph, so I didn't.
Re: Design issue
Hrvoje Niksic [EMAIL PROTECTED] writes:

"Dan Harkless" [EMAIL PROTECTED] writes:

I think the most straightforward mapping would also be the most attractive:

ftp/site/dir/file
http/site/dir/file

Don't forget the port, if you aim for completeness.

Yeah, you've probably seen the subsequent messages on this by now where we talk about this. I do think port 80 should be implied, though, just as it almost always is in URLs.

Wget should certainly have an option to make it behave this way. In fact, I'd prefer it to behave that way by default, for the reasons you mention, and introduce an option to leave off the protocol.

That would suck for people who have come to expect the current behaviour.

Well, it wouldn't be very tough for them to adjust their archives by moving the hostname directories into "ftp" or "http" directories. Or else use the option that makes it leave off the protocol. As long as we document the change clearly, it doesn't seem that bad to me. To me, being destructive to files when FTP and WWW servers are run on the same machine is a worse evil than temporarily confusing some people who don't read the documentation when upgrading to Wget 1.7 from an older version.

I had different thoughts: I would have liked to not include the host name by default, since hosts are not traversed by default anyway.

And only create the hostname directories when -H is specified? Yuck.

---
Dan Harkless            | To help prevent SPAM contamination,
GNU Wget co-maintainer  | please do not mention this email
http://sunsite.dk/wget/ | address in Usenet posts -- thank you.
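The mapping proposed in this exchange (protocol directory first, port kept only when it is not the implied one) can be sketched as follows. This is an illustration in Python, not Wget's code: the function name `local_path`, the `with_protocol` switch standing in for the proposed "leave off the protocol" option, and the choice of 21 as the implied FTP port are assumptions; the thread itself only says that port 80 should be implied.

```python
from urllib.parse import urlsplit

# Assumption: ports that are left out of the directory name.
# The thread only mentions port 80; 21 for FTP is added here for symmetry.
DEFAULT_PORTS = {"http": 80, "ftp": 21}


def local_path(url: str, with_protocol: bool = True) -> str:
    """Map a URL to a relative local path such as 'http/site/dir/file'."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    # Keep the port only when it differs from the scheme's implied port.
    if parts.port is not None and parts.port != DEFAULT_PORTS.get(parts.scheme):
        host = f"{host}:{parts.port}"
    pieces = [parts.scheme] if with_protocol else []
    pieces.append(host)
    pieces.append(parts.path.lstrip("/"))
    return "/".join(pieces)


if __name__ == "__main__":
    for u in ("http://site.com/dir/file",
              "ftp://site.com/dir/file",
              "http://site.com:8080/dir/file"):
        print(u, "->", local_path(u))
```

With `with_protocol=False` the sketch falls back to a hostname-first layout, which is roughly what moving the hostname directories out of "ftp" or "http" would undo. The ':' before a non-default port is kept as in the URL; a Windows port would have to substitute another character there.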
Re: Design issue
Hrvoje Niksic [EMAIL PROTECTED] writes:

"Dan Harkless" [EMAIL PROTECTED] writes:

Well, it wouldn't be very tough for them to adjust their archives by moving the hostname directories into "ftp" or "http" directories. Or else use the option that makes it leave off the protocol. As long as we document the change clearly, it doesn't seem that bad to me. To me, being destructive to files when FTP and WWW servers are run on the same machine is a worse evil than temporarily confusing some people who don't read the documentation when upgrading to Wget 1.7 from an older version.

It's not that someone wouldn't be aware of the change, but that they would view the change as gratuitous.

Hard to say. If they considered the file-overwriting-when-ftp-and-www-server-are-on-same-host problem to be serious, as I and others on the list do, they might not consider it to be gratuitous. And again, it won't be tough to get Wget to revert to the old behavior, if they prefer it. Perhaps we should take a vote on whether the new with-protocol local filenames should become the default or not.

I know I would. Wget has been behaving like this since day 1, and we should have a very compelling reason for changing the default.

Hmm. Well, haven't we changed a lot of other things that have been the case since day 1? I think the failure to properly mirror a server that runs both ftp and http is "very compelling", but perhaps most people would disagree. It's true that this isn't a problem all *that* often, since professional sites almost always use ftp://ftp.domainname and http://www.domainname, even if "ftp" and "www" are just aliases for the same machine. Also, FTP stuff is *usually* under a "pub/" directory, and WWW content *usually* doesn't have a directory with that name. I'm not dead-set against retaining the current default behavior, but I'd still personally prefer to have it that way.

If we don't end up turning on the protocol directories by default, what about the non-80 ports in the hostname directories (e.g. site.com/... vs. site.com:8080/...)? It seems very wrong to me to put those both in the same directory by default, and we don't have the "ftp.domainname" and "pub/" saves that we do in the FTP vs. HTTP case.

I had different thoughts: I would have liked to not include the host name by default, since hosts are not traversed by default anyway.

And only create the hostname directories when -H is specified? Yuck.

``Yuck'' is in the eye of the beholder.

Yeah, but I'm sure most people consider the site name to be an important piece of info on the local copy.

---
Dan Harkless            | To help prevent SPAM contamination,
GNU Wget co-maintainer  | please do not mention this email
http://sunsite.dk/wget/ | address in Usenet posts -- thank you.
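To make the overwriting concern concrete, here is a small Python sketch that groups URLs by the local path they would get under a hostname-only layout and reports the groups with more than one member. It is an illustration only, not Wget code: `host_only_path` is a hypothetical stand-in for the layout described in the message (no protocol directory, no port in the hostname directory), and the URLs are made up.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def host_only_path(url: str) -> str:
    """Hypothetical layout: hostname directory only, no protocol or port."""
    parts = urlsplit(url)
    return f"{parts.hostname}{parts.path}"


def find_collisions(urls):
    """Return {local_path: [urls]} for every local path claimed by 2+ URLs."""
    by_path = defaultdict(list)
    for url in urls:
        by_path[host_only_path(url)].append(url)
    return {path: group for path, group in by_path.items() if len(group) > 1}


if __name__ == "__main__":
    urls = [
        "http://site.com/index.html",
        "http://site.com:8080/index.html",  # different server, same local path
        "ftp://site.com/pub/file",          # saved by the usual "pub/" prefix
    ]
    for path, group in find_collisions(urls).items():
        print(path, "<-", group)
```

The two HTTP URLs land on the same local file, while the FTP one escapes only because of its "pub/" directory, which matches the point that the port case has no such accidental protection.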
Re: Design issue
Herold Heiko wrote:

I think the most straightforward mapping would also be the most attractive:

ftp/site/dir/file
http/site/dir/file

Wget should certainly have an option to make it behave this way. In fact, I'd prefer it to behave that way by default, for the reasons you mention, and introduce an option to leave off the protocol.

I agree. What about https?

What about answering on more than one port, like java.sun.com used to do, where :80 had a Java menu and :81 did not? This is a bad example, as it was mostly the same web site.

The files could be either in a separate https directory (logically more correct) or reside in the http directory in order to minimize ../../../../dir/dir/dir/something URL rewriting (since I suppose those pages could share lots of inline pics and other links with the http structure).

Speaking of https, I got exactly one report (in private mail) of successful testing of the Windows SSL-enabled binary, nothing else. Could you commit the patch as http://www.mail-archive.com/wget@sunsite.dk/msg00142.html ? The changes in gen_sslfunc.c could be needed anyway for other operating systems (they are mirrored from similar code in sysdep.h and http.c, although I just noticed an unconditional include of time.h in ftpparse.c), while the changes in the VC makefile are commented out by default.

Heiko
--
-- PREVINET S.p.A.            [EMAIL PROTECTED]
-- Via Ferretto, 1            ph  x39-041-5907073
-- I-31021 Mogliano V.to (TV) fax x39-041-5907087
-- ITALY

Hack 8-)
Re: Design issue
Jan Prikryl [EMAIL PROTECTED] writes:

Quoting Dan Harkless ([EMAIL PROTECTED]):

I don't see why we would use an '_' instead of a ':' on the second version (except on Windows if the ':' character is a no-no there).

The colon is a relic of DOS path notation (C:\) so it cannot appear in a filename.

Fine, but I'm not really up for obfuscating the URLs on UNIX just to make DOS/Windows happy. Already the Windows port has to deal with more characters being non-allowed in filenames than on UNIX. This is just another one.

---
Dan Harkless            | To help prevent SPAM contamination,
GNU Wget co-maintainer  | please do not mention this email
http://sunsite.dk/wget/ | address in Usenet posts -- thank you.
Re: Design issue
On Fri, 9 Feb 2001, Dan Harkless wrote:

Jan Prikryl [EMAIL PROTECTED] writes:

The colon is a relic of DOS path notation (C:\) so it cannot appear in a filename.

Fine, but I'm not really up for obfuscating the URLs on UNIX just to make DOS/Windows happy. Already the Windows port has to deal with more characters being non-allowed in filenames than on UNIX. This is just another one.

Clearly, the non-Unix ports can be modified to deal with incompatible filenames. I think this is more a question of whether you intentionally want to create portability problems when creating new code. There is no question that portability comes at a price. In this case it is loss of the exact URL in the new filepath. From a systems viewpoint, it seems much simpler to avoid problem code rather than assume that someone can create a workaround for those systems where it doesn't work. This just adds to the complexity of maintaining the code.

The created path is an indication of the origin of the files, not necessarily a full URL. As noted in other posts, the protocol name has not previously been included in the path, so this has never been the full URL. Perhaps I am too used to DOS and Windows ports of Unix programs, but the mentioned alternatives to the colon in the filename all looked reasonable and intuitive to me. I didn't perceive any obfuscation.

Doug
__
Doug Kaufman
Internet: [EMAIL PROTECTED]