Re: Network transport for rpm depsolving
On Friday, 13 July 2007 at 3:23 pm, [EMAIL PROTECTED] wrote:

> On the feature set of rpm-5.0 is an integrated dependency solver.

I assume this is another install-only dependency solver?

New topic:

> A dependency solver needs information about the package universe.

Which brings up the question of which data format that information uses. At OpenPKG we use an RDF which basically contains a dump of all the RPM metainformation. RDF is XML, which means it is easy to create with print statements, and plenty of parsers are available. Compression is also widely available and useful in XML environments. We also defined and implemented a hyperlink feature which allows our RDFs to link to others; e.g., you will often find a handcrafted toplevel RDF [1] pointing to others generated by index programs [2]. The experience we have collected with this approach over the years is good.

If I had to solve the problem again, I'd consider two things to improve performance. One is the ability for the data format to carry checksum information along with links, enabling clients to safely retrieve valid data from their caches, if available. The other improvement would be a structure that allows partial downloads of index information; XAR comes to mind. Further improvements like selective data retrieval would be possible with online random access methods, at the price of not being supported by all protocols and media, so we opted not to consider them.

> The information is time sensitive, and is usually not resident on the
> local client.

And the information is likely to be used repeatedly, which brings up the topic of caching and cache expiry/validation.

> Which means that an integrated dependency solver must undertake
> transport of the information necessary to perform depsolving.

Or call external helpers to execute the actual transport. Or retrieve from a cache.

> The end-point of the discussion will be a choice of network transport
> or a feature reversion for the rpm-5.0 roadmap. No other choice is
> possible imho.
Neither a fixed choice nor a feature reversal is necessary. An external helper fed by RPM with URIs could make RPM network-transport agnostic.

> The protocol choice is HTTP, duh. No other protocol traverses
> firewalls.

I agree the HTTP choice is generally good today. FTP is awfully broken by design and should be banned from this universe. But "generally" means there are exceptions, and "today" does not necessarily cover the future. My HTTP preference also comes from the fact that the protocol can make use of proxies, caching, authentication, piggybacking out-of-band data on the request and response, etc. But most of those features are hard to use with an embedded solution, which could only ever provide a finite set of features.

My experience is that RPM is often only a (major) part of a deployment solution, and in almost all practical cases other parts of the solution must also perform network transfers. It turned out to be very confusing for users to configure, e.g., the local proxy twice in completely different ways, or to find out that both implementations have noticeable limitations and the lowest common denominator of features is slim. I remember we needed user/password authentication, and after teaching our framework to support it we found out that RPM supports it for HTTP but not for FTP.

> The implementation choice is either internal through rpmio, or
> external through a curl/wget/rsync helper invocation.

I believe in external helper application usage. In addition to what I said before, the advantages are support for any existing and future protocol, plus features like local URI and content rewriting, caching, mirror support for proximity/speed downloads, failover and load balancing, etc. Also, the legal advantage of fork+exec breaking the GPL linking chain must not be underestimated: my understanding is that HTTPS support through OpenSSL is not appropriate for a published GPL binary executable (RPM+neon+OpenSSL). The price is the performance penalty of fork+exec.
Especially with a simple implementation of linear one-by-one fetches. A possible remedy is to specify a simple local protocol between RPM and the external helper. Something like a pipeline with a list of URIs, where simple external helper applications fetch one by one and more complex ones do parallel downloads. OK, not really that simple if I consider all the special cases, but you get the idea.

[1] ftp://ftp.openpkg.org/current/00INDEX.rdf
[2] ftp://ftp.openpkg.org/current/SRC/00INDEX.rdf.bz2

You asked for it ... :-)
--
http://thomas.lotterer.net
______________________________________________________________________
RPM Package Manager                                http://rpm5.org
Developer Communication List                       rpm-devel@rpm5.org
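A minimal sketch of such a pipeline helper, assuming an invented one-request-per-line wire format (URI and destination separated by a tab, one OK/FAIL status line back per request); nothing here is an agreed interface, and the actual transport is left as a pluggable `fetch` callable:

```python
# Sketch of a pipeline protocol between RPM and an external helper:
# RPM writes "URI<TAB>DEST" lines; the helper answers one status line
# per request. The wire format is invented for illustration only.

def serve(inp, out, fetch):
    """Process fetch requests one by one; a smarter helper could read
    ahead and download several URIs in parallel."""
    for line in inp:
        line = line.strip()
        if not line:
            continue
        uri, dest = line.split("\t", 1)
        try:
            fetch(uri, dest)                    # pluggable transport
            out.write("OK\t%s\n" % uri)
        except Exception as exc:
            out.write("FAIL\t%s\t%s\n" % (uri, exc))
        out.flush()                             # RPM reads line by line
```

A real helper would wire `inp`/`out` to stdin/stdout and plug curl, wget, or urllib into `fetch`; the point is that RPM only speaks the local line protocol and never learns which transport did the work.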
Re: Network transport for rpm depsolving
My preference is external helpers. I'm not sure if curl, wget, and/or rsync is the right interface, though. I'd like to see something that goes out and says "I need the following information", and then the external interface can use curl, wget, rsync, neon, or something custom to do what it wants. (The default implementation should be HTTP, of course, with whatever the standard protocols decided on are.) But I'd like this to be modular.

--Mark

Jeff Johnson wrote:

> On the feature set of rpm-5.0 is an integrated dependency solver.
>
> A dependency solver needs information about the package universe. The
> information is time sensitive, and is usually not resident on the
> local client.
>
> Which means that an integrated dependency solver must undertake
> transport of the information necessary to perform depsolving.
>
> Since network transport within rpm (I am not the one who added FTP
> transport to rpm) has always been controversial, it's time to start
> the discussions now.
>
> The end-point of the discussion will be a choice of network transport
> or a feature reversion for the rpm-5.0 roadmap. No other choice is
> possible imho.
>
> The protocol choice is HTTP, duh. No other protocol traverses
> firewalls.
>
> The implementation choice is either internal through rpmio, or
> external through a curl/wget/rsync helper invocation.
>
> If internal rpmio is used (my preference) then neon becomes mandatory
> and the linking decisions move to whether https through
> openssl/gnutls is necessary underneath neon.
>
> If external helper is chosen, then the ufdio layer in rpmio becomes
> optional and external helpers are what will be used for future rpm
> development.
>
> What say ye?
>
> 73 de Jeff
Re: Network transport for rpm depsolving
On Jul 13, 2007, at 11:24 AM, Mark Hatle wrote:

> My preference is external helpers. I'm not sure if curl, wget, and/or
> rsync is the right interface though.

Another external helper vote tallied.

> I'd like to see something that goes out and says I need the following
> information, and then the external interface can then use curl, wget,
> rsync, neon, or something custom to do what it wants. (The default
> implemented should be http of course with whatever the standard
> protocols decided are..) but I'd like this modular.

Better start thinking through how external helpers get executed.

73 de Jeff
Re: Network transport for rpm depsolving
On Jul 13, 2007, at 11:43 AM, Jeff Johnson wrote:

> On Jul 13, 2007, at 11:24 AM, Mark Hatle wrote:
>
>> My preference is external helpers. I'm not sure if curl, wget,
>> and/or rsync is the right interface though.
>
> Another external helper vote tallied.

+1.

>> I'd like to see something that goes out and says I need the
>> following information, and then the external interface can then use
>> curl, wget, rsync, neon, or something custom to do what it wants.
>> (The default implemented should be http of course with whatever the
>> standard protocols decided are..) but I'd like this modular.

+1.

> Better start thinking through how external helpers get executed.

I used yum (oh the horror) as part of a custom RPM deployment solution, but needed it to talk through a custom proxy. Due to licensing concerns, I ended up hacking urlgrabber to fork/exec an external process, passing it a URL as a single argument for each network grab. The external process returns whatever it gets back from the URL on stdout. When the fork/exec for each network grab becomes the bottleneck in this solution, my plan is to just feed the external process URLs via a pipe. So, anyway, something similar using rpm would work out nicely.

j.
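The fork/exec-per-grab approach described here could be sketched roughly like this; `helper_cmd` is a placeholder for whatever external fetch program is chosen (a curl wrapper, a proxy-aware script, etc.), not an actual urlgrabber API:

```python
# Sketch of fork/exec per grab: spawn an external helper with the URL
# as its single argument and take whatever it writes to stdout as the
# fetched content. The helper command is a parameter, so the transport
# itself stays outside this code entirely.
import subprocess

def grab(helper_cmd, url):
    """Return the bytes the helper prints for this URL; raise on failure."""
    proc = subprocess.run(helper_cmd + [url], capture_output=True)
    if proc.returncode != 0:
        raise RuntimeError("grab of %s failed: %s"
                           % (url, proc.stderr.decode(errors="replace")))
    return proc.stdout
```

One fork/exec per URL is simple and keeps the GPL boundary clean, at the cost noted above: when the process spawn becomes the bottleneck, the natural next step is a long-lived helper fed URLs over a pipe.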
Re: Network transport for rpm depsolving
On Jul 13, 2007, at 1:02 PM, Jay Soffian wrote:

> On Jul 13, 2007, at 11:43 AM, Jeff Johnson wrote:
>
>> On Jul 13, 2007, at 11:24 AM, Mark Hatle wrote:
>>
>>> My preference is external helpers. I'm not sure if curl, wget,
>>> and/or rsync is the right interface though.
>>
>> Another external helper vote tallied.
>
> +1.

Yet another tallied. The end-point is perfectly understood: no transport in rpm libraries.

Since I need to commit to a design course for an integrated dependency solver Real Soon Now, I will start replacing the stdio(3) and system call wrappers in rpmio this weekend, replacing them with vectors that can be run-time loaded from a dlopen interface that remains to be written. Likely the most important change is the following typedef:

    #if defined(HAVE_LIBIO_H)
    typedef FILE *  FD_t;
    #else
    typedef FILE ** FD_t;
    #endif

i.e. a fundamental type will have different levels of indirection depending on the platform.

>>> I'd like to see something that goes out and says I need the
>>> following information, and then the external interface can then
>>> use curl, wget, rsync, neon, or something custom to do what it
>>> wants. (The default implemented should be http of course with
>>> whatever the standard protocols decided are..) but I'd like this
>>> modular.
>
> +1.
>
>> Better start thinking through how external helpers get executed.
>
> I used yum (oh the horror) as part of a custom RPM deployment
> solution but needed it to talk thru a custom proxy. Due to licensing
> concerns, I ended up hacking urlgrabber to fork/exec an external
> process, passing it a URL as a single argument for each network
> grab. The external process returns whatever it gets back from the
> URL on stdout. When the fork/exec for each network grab becomes the
> bottleneck in this solution, my plan is to just feed the external
> process URLs via a pipe. So anyway, something similar using rpm
> would work out nicely.

This is rpm-devel, not yum-devel nor apt-devel. Feel free to propose a solution for how rpm will use external transport helpers.
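As a rough analogy to the run-time loaded vector idea (sketched in Python for brevity; the real interface would be C function pointers resolved via dlopen, and every name below is invented): the stdio-style entry points become a table of callables that a transport module can override without relinking rpm.

```python
# Analogy to run-time loaded I/O vectors: the stdio-style operations
# live in a table, and a "transport module" may replace any subset of
# them at run time -- the moral equivalent of dlopen()+dlsym() in the
# planned C interface. All names are invented for illustration.

DEFAULT_VECTORS = {
    "Fopen":  lambda path, mode: open(path, mode),  # local stdio-backed default
    "Fread":  lambda fd, n: fd.read(n),
    "Fclose": lambda fd: fd.close(),
}

def make_io(overrides=None):
    """Build an I/O vector table; callers go through the table, never
    through stdio directly, so transports stay swappable."""
    table = dict(DEFAULT_VECTORS)
    table.update(overrides or {})
    return table
```

The payoff is the one Mark asked for: rpm says "I need these bytes" through the table, and whether a local file, neon, or an external helper answers is decided by whichever vector set was loaded.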
ATM, I plan on adding the external transport helper invocation underneath this existing routine:

    /**
     * Copy data from URL to local file.
     * @param url    url string of source
     * @param dest   file name of destination
     * @return       0 on success, otherwise FTPERR_* code
     */
    int urlGetFile(const char * url, /*@null@*/ const char * dest)
        /*@globals h_errno, fileSystem, internalState @*/
        /*@modifies fileSystem, internalState @*/;

in order to start adding explicit calls to urlGetFile() where needed.

73 de Jeff
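For illustration only, here is a Python sketch of urlGetFile()'s copy-URL-to-local-file contract delegated to an external helper; the default curl invocation and the raw return-code convention are assumptions for the sketch (rpm's real urlGetFile is C inside rpmio and returns FTPERR_* codes):

```python
# Sketch: urlGetFile() semantics (copy URL -> local file) implemented
# by invoking an external helper that writes the fetched bytes to its
# stdout. The default helper command (curl) is an assumption; any
# program with the same stdout contract would do.
import subprocess

def url_get_file(url, dest, helper_cmd=("curl", "-sSfL")):
    """Return 0 on success, the helper's nonzero exit code otherwise
    (loosely mirroring urlGetFile's 0 / error-code convention)."""
    with open(dest, "wb") as out:
        proc = subprocess.run(list(helper_cmd) + [url], stdout=out)
    return proc.returncode
```

Because the helper is just argv plus stdout, swapping curl for wget, rsync, or a site-specific proxy wrapper needs no change on the rpm side.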