RE: Design issue

2001-02-12 Thread Herold Heiko

Herold Heiko [EMAIL PROTECTED] writes:
 Yes, the windows and dos version (OS/2 too ?) can't use :, so if we need
 to choose a separator we could as well choose something which does
 create as few as possible problems on most platforms.
 Even if this does mean a slightly different syntax than the classic
 protocol:port URI form... since those directories are a sort of reminder
 where that data came from there's no _strict_ need to mirror the
 original URI imho.

Yes, and there's no _strict_ need to pick a lowest-common-denominator
notation just to make Windows happy either.  Like I said, the Windows
version of Wget already has to change ':'s into something else.  I don't
see this ':' as any different.


This made me really think about this issue... what about people who
access the same filesystem from multiple operating systems?
Maybe wget should have a (default disabled!) option to use only
common-denominator characters, which are available on every possible
filesystem... Unix really doesn't have any problems except /, everything
else is just cosmetic; Windows... we all know that :( ... what about
OS/2 HPFS (same as Windows?) What about BeOS, MacOS, all those to which
wget could theoretically be ported one day, or from which a filesystem
mirrored by wget could be accessed remotely? It's no good thinking
"well, the file system sharing protocol can correct those filenames",
because the links wouldn't work any more.
Implementing this should be easier than gathering a (complete) list
of problematic characters.
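
A minimal Python sketch of such an option, assuming a whitelist rather
than a blacklist: only characters from a small, conservative set are
kept, and everything else is replaced. The set PORTABLE_CHARS, the
function name portable_name and the '_' replacement are illustrative
assumptions, not anything Wget implements.

```python
import string

# Assumed conservative set: ASCII letters, digits, '.', '_' and '-'.
# Which characters are truly safe everywhere is exactly the open question.
PORTABLE_CHARS = frozenset(string.ascii_letters + string.digits + "._-")


def portable_name(component: str, replacement: str = "_") -> str:
    """Replace every character outside PORTABLE_CHARS with `replacement`.

    `component` is a single path component (no directory separators).
    """
    return "".join(c if c in PORTABLE_CHARS else replacement for c in component)


if __name__ == "__main__":
    print(portable_name("site.com:8080"))
    print(portable_name("index.html?x=1"))
```

Two different names can end up identical after replacement, and any
links inside the mirrored pages would have to be rewritten with the
same mapping for them to keep working.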

Heiko

-- 
-- PREVINET S.p.A.[EMAIL PROTECTED]
-- Via Ferretto, 1 ph  x39-041-5907073
-- I-31021 Mogliano V.to (TV) fax x39-041-5907087
-- ITALY



Re: Design issue

2001-02-10 Thread Hrvoje Niksic

"Dan Harkless" [EMAIL PROTECTED] writes:

 I think the most straightforward mapping would also be the most attractive:
 
 ftp/site/dir/file
 http/site/dir/file

Don't forget the port, if you aim for completeness.

 Wget should certainly have an option to make it behave this way.  In
 fact, I'd prefer it to behave that way by default, for the reasons
 you mention, and introduce an option to leave off the protocol.

That would suck for people who have come to expect the current
behaviour.

I had different thoughts: I would have liked to not include the
host name by default, since hosts are not traversed by default anyway.
But doing that would violate the previous paragraph, so I didn't.



Re: Design issue

2001-02-10 Thread Dan Harkless


Hrvoje Niksic [EMAIL PROTECTED] writes:
 "Dan Harkless" [EMAIL PROTECTED] writes:
  I think the most straightforward mapping would also be the most attractive:
  
  ftp/site/dir/file
  http/site/dir/file
 
 Don't forget the port, if you aim for completeness.

Yeah, you've probably seen the subsequent messages on this by now where we
talk about this.  I do think port 80 should be implied, though, just as it
almost always is in URLs.
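
A rough Python sketch of the proposed mapping, with the port left out
whenever it is the default for the protocol. The function name
local_path and the DEFAULT_PORTS table are hypothetical; only port 80
for HTTP is mentioned in this thread, and treating 21 (FTP) and 443
(HTTPS) the same way is an assumption made for the example.

```python
from urllib.parse import urlsplit

# Assumed defaults; only HTTP's port 80 is discussed in the thread.
DEFAULT_PORTS = {"http": 80, "https": 443, "ftp": 21}


def local_path(url: str, port_sep: str = ":") -> str:
    """Map a URL to a protocol/site[:port]/dir/file style relative path."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    port = parts.port
    # Append the port only when it differs from the protocol's default.
    if port is not None and port != DEFAULT_PORTS.get(parts.scheme):
        host = f"{host}{port_sep}{port}"
    return f"{parts.scheme}/{host}{parts.path}"


if __name__ == "__main__":
    for u in ("http://site.com/dir/file",
              "http://site.com:8080/dir/file",
              "ftp://site.com/dir/file"):
        print(u, "->", local_path(u))
```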

  Wget should certainly have an option to make it behave this way.  In
  fact, I'd prefer it to behave that way by default, for the reasons
  you mention, and introduce an option to leave off the protocol.
 
 That would suck for people who have come to expect the current
 behaviour.

Well, it wouldn't be very tough for them to adjust their archives by moving
the hostname directories into "ftp" or "http" directories.  Or else use the
option that makes it leave off the protocol.

As long as we document the change clearly, it doesn't seem that bad to me.
To me being destructive to files when FTP and WWW servers are run on the
same machine is a worse evil than temporarily confusing some people who
don't read the documentation when upgrading to Wget 1.7 from an older
version.

 I had different thoughts: I would have liked to not include the
 host name by default, since hosts are not traversed by default anyway.

And only create the hostname directories when -H is specified?  Yuck.  

---
Dan Harkless            | To help prevent SPAM contamination,
GNU Wget co-maintainer  | please do not mention this email
http://sunsite.dk/wget/ | address in Usenet posts -- thank you.



Re: Design issue

2001-02-10 Thread Dan Harkless


Hrvoje Niksic [EMAIL PROTECTED] writes:
 "Dan Harkless" [EMAIL PROTECTED] writes:
  Well, it wouldn't be very tough for them to adjust their archives by
  moving the hostname directories into "ftp" or "http" directories.
  Or else use the option that makes it leave off the protocol.
  
  As long as we document the change clearly, it doesn't seem that bad
  to me.  To me being destructive to files when FTP and WWW servers
  are run on the same machine is a worse evil than temporarily
  confusing some people who don't read the documentation when
  upgrading to Wget 1.7 from an older version.
 
 It's not that someone wouldn't be aware of the change, but that they
 would view the change as gratuitous.  

Hard to say.  If they considered the
file-overwriting-when-ftp-and-www-server-are-on-same-host problem to be
serious, as I and others on the list do, they might not consider it to be
gratuitous.  And again, it won't be tough to get Wget to revert to the old
behavior, if they prefer it.

Perhaps we should take a vote on whether the new with-protocol local
filenames should become the default or not.

 I know I would.  Wget has been behaving like this since day 1, and we
 should have a very compelling reason for changing the default.

Hmm.  Well, haven't we changed a lot of other things that have been the case
since day 1?

I think the failure to properly mirror a server that runs both ftp and http
is "very compelling", but perhaps most people would disagree.  It's true
that this isn't a problem all *that* often, since professional sites almost
always use ftp://ftp.domainname and http://www.domainname, even if "ftp"
and "www" are just aliases for the same machine.  Also, FTP stuff is
*usually* under a "pub/" directory, and WWW content *usually* doesn't have a
directory with that name.

I'm not dead-set against retaining the current default behavior, but I'd
still personally prefer to have it that way.

If we don't end up turning on the protocol directories by default, what
about the non-80 ports in the hostname directories (e.g. site.com/... vs.
site.com:8080/...)?  It seems very wrong to me to put those both in the same
directory by default, and we don't have the "ftp.domainname" and "pub/"
saves that we do in the FTP vs. HTTP case.
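
To make the clash concrete, here is a small Python sketch that groups
URLs by the local path they would get under a host-only layout
(site/dir/file, with no protocol and no port), which is how the thread
describes the current default. The function names host_only_path and
collisions are made up for this illustration.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def host_only_path(url: str) -> str:
    """Local path under an assumed host-only layout: site/dir/file."""
    parts = urlsplit(url)
    return f"{parts.hostname}{parts.path}"


def collisions(urls):
    """Return {local_path: [urls]} for every path claimed by more than one URL."""
    by_path = defaultdict(list)
    for url in urls:
        by_path[host_only_path(url)].append(url)
    return {path: group for path, group in by_path.items() if len(group) > 1}


if __name__ == "__main__":
    print(collisions([
        "http://site.com/index.html",
        "http://site.com:8080/index.html",
        "ftp://site.com/pub/README",
    ]))
```

Under that assumption, site.com/... and site.com:8080/... land on the
same local file, as do an FTP and an HTTP URL with the same host and
path.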

   I had different thoughts: I would have liked to not include
   the host name by default, since hosts are not traversed by default
   anyway.
  
  And only create the hostname directories when -H is specified?
  Yuck.
 
 ``Yuck'' is in the eye of the beholder.

Yeah, but I'm sure most people consider the site name to be an important
piece of info on the local copy.




Re: Design issue

2001-02-09 Thread Hack Kampbjørn

Herold Heiko wrote:
 
 I think the most straightforward mapping would also be the most attractive:
 
 ftp/site/dir/file
 http/site/dir/file
 
 Wget should certainly have an option to make it behave this way.  In fact,
 I'd prefer it to behave that way by default, for the reasons you mention,
 and introduce an option to leave off the protocol.
 
 
 I agree. What about https ?

What about servers answering on more than one port, like java.sun.com
used to do, where :80 had a Java menu and :81 did not? This is a bad
example, as it was mostly the same web site.

 The files could be either in a separate https directory (logically more
 correct) or reside in the http directory in order to minimize
 ../../../../dir/dir/dir/something url rewriting (since I suppose those
 pages could share lots of inline pics and other links with the http
 structure).
 
 Speaking of https, I got exactly one report (in private mail) of
 successful testing of the windows ssl enabled binary, nothing else.
 
 Could you commit the patch posted at
 http://www.mail-archive.com/wget@sunsite.dk/msg00142.html ?
 The changes in gen_sslfunc.c could be needed anyway for other operating
 systems (they are mirrored from similar code in sysdep.h and http.c,
 although I just noticed an unconditional include of time.h in
 ftpparse.c), while the changes in the VC makefile are commented out
 by default.
 
 Heiko
 


Hack 8-)



Re: Design issue

2001-02-09 Thread Dan Harkless


Jan Prikryl [EMAIL PROTECTED] writes:
 Quoting Dan Harkless ([EMAIL PROTECTED]):
  I don't see why we would use an '_' instead of a ':' on the second version
  (except on Windows if the ':' character is a no-no there).
 
 The colon is a relic of DOS path notation (C:\) so it cannot
 appear in a filename.

Fine, but I'm not really up for obfuscating the URLs on UNIX just to make
DOS/Windows happy.  Already the Windows port has to deal with more
characters being non-allowed in filenames than on UNIX.  This is just
another one.
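
A short Python sketch of that position: keep ':' wherever the
filesystem allows it and substitute another character only on
DOS/Windows. The function names port_separator and host_dir are
hypothetical, and '_' is just the alternative mentioned earlier in the
thread.

```python
import os


def port_separator(os_name: str = os.name) -> str:
    """Separator between host and port in a local directory name."""
    # os.name is "nt" on Windows, where ':' cannot appear in a file name.
    return "_" if os_name == "nt" else ":"


def host_dir(host: str, port: int, os_name: str = os.name) -> str:
    """Directory name for a host with an explicit, non-default port."""
    return f"{host}{port_separator(os_name)}{port}"


if __name__ == "__main__":
    print(host_dir("site.com", 8080, "posix"))
    print(host_dir("site.com", 8080, "nt"))
```

The cost of this choice is that the same mirror gets different
directory names depending on the platform that created it.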




Re: Design issue

2001-02-09 Thread Doug Kaufman

On Fri, 9 Feb 2001, Dan Harkless wrote:

 Jan Prikryl [EMAIL PROTECTED] writes:
  The colon is a relic of DOS path notation (C:\) so it cannot
  appear in a filename.
 
 Fine, but I'm not really up for obfuscating the URLs on UNIX just to make
 DOS/Windows happy.  Already the Windows port has to deal with more
 characters being non-allowed in filenames than on UNIX.  This is just
 another one.

Clearly, the non-unix ports can be modified to deal with incompatible
filenames. I think this is more a question of whether you
intentionally want to create portability problems when creating new
code. There is no question that portability comes at a price. In this
case it is loss of the exact URL in the new filepath. From a systems
viewpoint, it seems much simpler to avoid problem code rather than
assume that someone can create a workaround for those systems where
it doesn't work. This just adds to the complexity of maintaining the
code. The created path is an indication of the origin of the files,
not necessarily a full URL. As noted in other posts, the protocol name
has not previously been included in the path, so this has never been
the full URL. Perhaps I am too used to DOS and Windows ports of unix
programs, but the mentioned alternatives to the colon in the filename
all looked reasonable and intuitive to me. I didn't perceive any
obfuscation.

Doug
__ 
Doug Kaufman
Internet: [EMAIL PROTECTED]