Re: Network transport for rpm depsolving

2007-07-13 Thread Thomas Lotterer
 On Friday, 13. July 2007 at 3:23 pm, [EMAIL PROTECTED] wrote:
 On the feature set of rpm-5.0 is an integrated dependency solver.
 
I assume this is another install-only dependency solver? But that is a new topic.

 A dependency solver needs information about the package universe.
 
Which brings up the question which data format the information has.

At OpenPKG we use an RDF file which basically contains a dump of all RPM
metainformation. RDF is XML, which means it is easy to create with print
statements, and a lot of parsers are available. Compression is widely
available and useful in XML environments. We also defined and implemented
a hyperlink feature which allows our RDFs to link to others; e.g., you
often find a handcrafted toplevel RDF [1] pointing to others generated by
index programs [2].

The experiences we have collected with this approach over the years are
good. If I had to solve the problem again, I'd consider two things to
improve performance. One is the ability for the data format to carry
checksum information along with links, enabling clients to safely reuse
valid data from their caches where available. The other improvement would
be a structure that allows partial downloads of index information; XAR
comes to mind. Further improvements like selective data retrieval would
be possible with online random access methods, at the price of not being
supported by all protocols and media, so we opted not to consider them.
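Sketched as a hypothetical index fragment (element and attribute names are illustrative only, not OpenPKG's actual RDF schema, and the checksum and size values are placeholders), both improvements might look like this:

```xml
<!-- Hypothetical toplevel index: each link carries a checksum and a
     size, so a client can validate a cached copy without downloading,
     and can plan partial or ranged retrievals. -->
<index>
  <link href="SRC/00INDEX.rdf.bz2"
        sha1="0000000000000000000000000000000000000000"
        size="482133"/>
  <link href="BIN/00INDEX.rdf.bz2"
        sha1="0000000000000000000000000000000000000000"
        size="917004"/>
</index>
```

A client already holding a copy whose checksum matches could skip the download entirely.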

 The information is time sensitive, and is usually not resident on the
 local client.
 
And the information is likely to be used repeatedly, which brings up the
topic of caching and cache expiry/validation.

 Which means that an integrated dependency solver must undertake
 transport of the information necessary to perform depsolving.
 
Or call external helpers to execute the actual transport. Or retrieve it
from the cache.

 The end-point of the discussion will be a choice of network transport
 or a feature reversion for the rpm-5.0 roadmap. No other choice is
 possible imho.
 
Neither a fixed choice nor a feature reversal is necessary. An external
helper fed by RPM with URIs could make RPM network-transport agnostic.

 The protocol choice is HTTP, duh. No other protocol traverses firewalls.
 
I agree the HTTP choice is generally good today. FTP is awfully broken
by design and should be banned from this universe. But "generally" means
there are exceptions, and "today" does not necessarily cover the future.
My HTTP preference also comes from the fact that the protocol can make
use of proxies, caching, authentication, piggybacking of out-of-band
data in the request and response, etc. But most of those features are
hard to use with an embedded solution, which could only ever provide a
finite set of features.

My experience is that RPM is often only a (major) part of a deployment
solution, and in almost all practical cases other parts of the solution
must also perform network transfers. It turned out to be very confusing
for the user to configure, e.g., the local proxy twice in completely
different ways, or to find out that both implementations have noticeable
limitations and the lowest common denominator of features is slim. I
remember we needed user/password authentication, and after teaching our
framework to support it we found out that RPM supports it with HTTP but
not with FTP.

 The implementation choice is either internal through rpmio, or
 external through a curl/wget/rsync helper invocation.
 
I believe in using external helper applications. In addition to what I
said before, the advantages are support for any existing and future
protocol, plus features like local URI and content rewriting, caching,
mirror support for proximity/speed-based downloads, failover, load
balancing, etc. Also, the legal advantage of fork+exec breaking the GPL
linking chain must not be underestimated: my understanding is that HTTPS
support through OpenSSL is not appropriate for a published GPL binary
executable (RPM+neon+OpenSSL).

The price is the performance penalty of fork+exec, especially with a
simple implementation of linear one-by-one fetches. A possible remedy is
to specify a simple local protocol between RPM and the external helper:
something like a pipeline fed with a list of URIs, where simple external
helper applications fetch one-by-one and more complex ones do parallel
downloads. OK, not really that simple once I consider all the special
cases, but you get the idea.

[1] ftp://ftp.openpkg.org/current/00INDEX.rdf 
[2] ftp://ftp.openpkg.org/current/SRC/00INDEX.rdf.bz2 

You asked for it ... :-) 

-- 
http://thomas.lotterer.net
__
RPM Package Manager             http://rpm5.org
Developer Communication List    rpm-devel@rpm5.org


Re: Network transport for rpm depsolving

2007-07-13 Thread Mark Hatle
My preference is external helpers.  I'm not sure if curl, wget, and/or
rsync is the right interface though.

I'd like to see something that goes out and says "I need the following
information", and then the external interface can use curl, wget,
rsync, neon, or something custom to do what it wants. (The default
implementation should be HTTP of course, with whatever standard
protocols are decided on.) But I'd like this to be modular.

--Mark

Jeff Johnson wrote:
 On the feature set of rpm-5.0 is an integrated dependency solver.
 
 A dependency solver needs information about the package universe.
 
 The information is time sensitive, and is usually not resident on the
 local client.
 
 Which means that an integrated dependency solver must undertake transport
 of the information necessary to perform depsolving.
 
 Since network transport within rpm (I am not the one who added FTP
 transport to rpm) has always been controversial, it's time to start
 the discussions now.
 
 The end-point of the discussion will be a choice of network transport or
 a feature reversion for the rpm-5.0 roadmap. No other choice is possible
 imho.
 
 The protocol choice is HTTP, duh. No other protocol traverses firewalls.
 
 The implementation choice is either internal through rpmio, or external
 through a curl/wget/rsync helper invocation.
 
 If internal rpmio is used (my preference) then neon becomes mandatory
 and the linking decisions move to whether https through openssl/gnutls
 is necessary underneath neon.
 
 If external helper is chosen, then the ufdio layer in rpmio becomes
 optional and external helpers are what will be used for future rpm
 development.
 
 What say ye?
 
 73 de Jeff



Re: Network transport for rpm depsolving

2007-07-13 Thread Jeff Johnson


On Jul 13, 2007, at 11:24 AM, Mark Hatle wrote:


My preference is external helpers.  I'm not sure if curl, wget, and/or
rsync is the right interface though.



Another external helper vote tallied.


I'd like to see something that goes out and says I need the following
information, and then the external interface can then use curl, wget,
rsync, neon, or something custom to do what it wants.  (The default
implemented should be http of course with whatever the standard
protocols decided are..)  but I'd like this modular.



Better start thinking through how external helpers get executed.

73 de Jeff



Re: Network transport for rpm depsolving

2007-07-13 Thread Jay Soffian

On Jul 13, 2007, at 11:43 AM, Jeff Johnson wrote:


On Jul 13, 2007, at 11:24 AM, Mark Hatle wrote:

 My preference is external helpers.  I'm not sure if curl, wget, and/or
 rsync is the right interface though.

 Another external helper vote tallied.


+1.

 I'd like to see something that goes out and says I need the following
 information, and then the external interface can then use curl, wget,
 rsync, neon, or something custom to do what it wants.  (The default
 implemented should be http of course with whatever the standard
 protocols decided are..)  but I'd like this modular.


+1.


Better start thinking through how external helpers get executed.


I used yum (oh the horror) as part of a custom RPM deployment solution
but needed it to talk through a custom proxy. Due to licensing concerns,
I ended up hacking urlgrabber to fork/exec an external process, passing
it a URL as a single argument for each network grab. The external
process returns whatever it gets back from the URL on stdout.

When the fork/exec for each network grab becomes the bottleneck in this
solution, my plan is to just feed the external process URLs via a pipe.


So anyway, something similar using rpm would work out nicely.

j.


Re: Network transport for rpm depsolving

2007-07-13 Thread Jeff Johnson


On Jul 13, 2007, at 1:02 PM, Jay Soffian wrote:


On Jul 13, 2007, at 11:43 AM, Jeff Johnson wrote:


On Jul 13, 2007, at 11:24 AM, Mark Hatle wrote:

 My preference is external helpers.  I'm not sure if curl, wget, and/or
 rsync is the right interface though.

 Another external helper vote tallied.

 +1.



Yet another tallied.

The end-point is perfectly understood:
 No transport in rpm libraries.

Since I need to commit to a design course for an integrated dependency
solver Real Soon Now, I will start replacing the stdio(3) and system
call wrappers in rpmio this weekend with vectors that can be run-time
loaded through a dlopen interface that remains to be written.

Likely the most important change is the following typedef:

#if defined(HAVE_LIBIO_H)
typedef FILE * FD_t;
#else
typedef FILE ** FD_t;
#endif

i.e. a fundamental type will have different levels of indirection
depending on platform.

 I'd like to see something that goes out and says I need the following
 information, and then the external interface can then use curl, wget,
 rsync, neon, or something custom to do what it wants.  (The default
 implemented should be http of course with whatever the standard
 protocols decided are..)  but I'd like this modular.


+1.


Better start thinking through how external helpers get executed.


 I used yum (oh the horror) as part of a custom RPM deployment solution
 but needed it to talk through a custom proxy. Due to licensing
 concerns, I ended up hacking urlgrabber to fork/exec an external
 process, passing it a URL as a single argument for each network grab.
 The external process returns whatever it gets back from the URL on
 stdout.

 When the fork/exec for each network grab becomes the bottleneck in
 this solution, my plan is to just feed the external process URLs via
 a pipe.


So anyway, something similar using rpm would work out nicely.



This is the rpm-devel list, not yum-devel nor apt-devel.

Feel free to propose a solution for how rpm will use external  
transport helpers.


ATM, I plan on adding the external transport helper invocation
underneath this existing routine:

/**
 * Copy data from URL to local file.
 * @param url	url string of source
 * @param dest	file name of destination
 * @return	0 on success, otherwise FTPERR_* code
 */
int urlGetFile(const char * url, /*@null@*/ const char * dest)
	/*@globals h_errno, fileSystem, internalState @*/
	/*@modifies fileSystem, internalState @*/;

in order to start adding explicit calls to urlGetFile() where needed.

73 de Jeff