Okay, as my message seems to have bounced from devel, I'll try again.
I believe that hijacking an existing standard protocol (here, http) is a bad idea. Firstly, because turning standard protocols into hacked versions that sometimes behave as per the standard and sometimes don't usually opens a can of worms. Who knows what that is going to break? Who can tell exactly what depends on the correct behaviour of http? Not to mention that there's a whole bunch of things you will need to reimplement yourselves if you want to use http. In practice, from my experience with Mozilla's source code, you'll have to reimplement caching, authentication, redirections, etc. That's quite the opposite of the benefits expected from hijacking.
Secondly, hacking http means that you rely only on http. That's fine if you only want to download books from http servers. But what about other important protocols such as file:? Are you also going to hack file:? What about ftp:?
Thirdly, using a hacked http: (or file:, or ftp:) raises the subtle yet annoying problem of referencing resources (say, particular pages or images) inside a book. In http://mydomain.org/a/b/c/d, what part of a/b/c/d is the directory containing the book? What part is the identifier of the book? What part is the name of the resource? What if resource names involve directories? Etc. Sure, you can solve the problem by using smart conventions on URLs, or by toying with exclamation marks, question marks and hash signs, but I suspect that you'll quickly end up having to hack the very notion of URL away from what's used in http:. And it gets worse if books may be generated or delivered dynamically (hence involving question marks for queries), if some resources inside a book may be generated or delivered dynamically or, even worse, if books may contain books.
By contrast, the library: protocol
* doesn't break anything
* can work together with any delivery protocol (we're mostly using http: and file:, but also jar: for decompression, and we hope to be able to use some peer-to-peer protocol in the future for distributed libraries)
* already takes advantage of Mozilla's caching
* resolves ambiguities between book identifier / resources inside the book / book inside book / etc.
The one downside I see with library: is the possibility of two different books having the same "unique" identifier. I'm confident there are workarounds, though. Perhaps by making the *identifier*, and only the identifier, a nicely encoded URL. Say, something like library:mydomain.org!a!b/c/d being automatically turned into http://mydomain.org/a/b for downloading/authentication purposes.
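To make the contrast concrete, here is a minimal Python sketch (an illustration only, not code from the library: implementation). `possible_http_splits` lists the readings of a plain http path, under the simplifying assumption that a book identifier is exactly one path segment; `resolve_library_url` applies the `!` convention suggested above, with http as the default download scheme. Both function names and the `default_scheme` parameter are hypothetical.

```python
from urllib.parse import urlsplit


def possible_http_splits(url):
    """List every (directory, book id, resource) reading of a plain http URL.

    Simplifying assumption: the book identifier is exactly one path segment.
    Multi-segment identifiers would only add more readings.
    """
    segments = [s for s in urlsplit(url).path.split("/") if s]
    readings = []
    for i, book in enumerate(segments):
        directory = "/".join(segments[:i])
        resource = "/".join(segments[i + 1:])
        readings.append((directory, book, resource))
    return readings


def resolve_library_url(url, default_scheme="http"):
    """Map library:host!seg!seg/resource to (download URL, resource path).

    Hypothetical encoding: "!" separates the pieces of the book identifier,
    and the first "/" starts the resource path inside the book.
    """
    parts = urlsplit(url)
    if parts.scheme != "library":
        raise ValueError(f"not a library: URL: {url!r}")
    identifier, _, resource = parts.path.partition("/")
    host, *segments = identifier.split("!")
    download_url = f"{default_scheme}://{host}/" + "/".join(segments)
    return download_url, resource


# Four equally plausible readings of the same http URL...
for reading in possible_http_splits("http://mydomain.org/a/b/c/d"):
    print(reading)

# ...versus exactly one for the library: form.
print(resolve_library_url("library:mydomain.org!a!b/c/d"))
# prints ('http://mydomain.org/a/b', 'c/d')
```

The library: form stays unambiguous only as long as `!` never occurs inside a piece of the identifier; standard %-encoding of those pieces would lift that restriction at some cost in readability.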
Here, I assume that http is the default downloading mechanism; other, non-default protocols could be specified explicitly. Note that I'm avoiding %-based encodings only for readability. If readability is not a concern, we can use that standard encoding directly.
Cheers,
David
On Mon, 2007-07-09 at 23:42 -0400, Samuel Klein wrote:
> > > >>> I'd rather hijack http: for the same thing you are doing, but I get the
> > > >>> impression creating a new protocol is relatively simple in comparison.
> > > > With library: you are keying books off ids. http: is just keying books off
> > > > the URI, which is a string just like the id is a string. It's okay to just
> > > > treat it as a string.
> >
> > I would also rather see us use http:// as our protocol scheme. Http
> > seems to answer three of the above questions:
> > a) who owns the identifier 'http://cscott.net/ElectronicsTextbook'
> > the people behind cscott.net, of course. this prevents id
> > duplication.
> > b) what happens if I don't actually have the ElectronicsTextbook on my
> > machine
> > the URI gives you a location where you can download it.
> > (although we have a lot of flexibility about what content gets
> > served from that URL --
> > it could just be a redirection or metadata of some kind)
> > c) how do I tell if my ElectronicTextbook is the "real"
> > ElectronicsTextbook
> > I can always compare it to the canonical version, using (for
> > example) http etags.
> >
> > I haven't heard the counter-argument for 'library:', so maybe I'm
> > missing some compelling reason to invent our own protocol, but this
> > seems like another case where we should be reusing rather than
> > reinventing.
> > --scott
>
> Using http:// does have great advantages. I'm not sure ambiguity of
> identifiers is one of our top problems... URIs are good at being
> unique, as noted above; and you can never completely solve the problem
> of people cloning materials and making new copies with new names that
> are identical to the old materials. It is useful to take advantage of
> caching already in place for http:// . And it is useful to be able to
> view with a reader any accessible material with a URL, without special
> preprocessing or database seeding.
>
> SJ
_______________________________________________
Devel mailing list
[email protected]
http://lists.laptop.org/listinfo/devel
