----- [email protected] wrote:

> > -----Original Message-----
> > From: [email protected]
> > [mailto:[email protected]] On Behalf Of Shawn Walker
> > Sent: 11 March 2011 19:32
> > To: Robert Milkowski
> > Cc: [email protected]
> > Subject: Re: [caiman-discuss] Trully hands-free installations
> > 
> > On 03/11/11 07:59 AM, Robert Milkowski wrote:
> > ...
> > > 3. pkg performance
> > >
> > >      The solaris.zlib downloads at about 100MB/s on a GbE network - good.
> > >      However then pkg starts downloading packages and the network
> > >      utilizations varies between 0.5MB - 30MBs with an average less than
> > >      a couple of MB/s.
> > >      I guess the sporadic 15-30MB/s occurrences are for some large files,
> > >      otherwise the performance is abysmal and it takes far too long to
> > >      just transfer packages.
> > >      Not to mention that entire process is basically serialized and
> > >      doesn't make much use of additional cores on a server. Is there a
> > >      way for pkg to download multiple files at the same time? This could
> > >      probably help a little bit...
> > >      It doesn't have to be able to saturate a GbE link but doing less
> > >      than 5% is far from being impressive.
> > 
> > Actually, pkg(1) makes 20 connections to a package server at a time for
> > content, so it's only "serial" in the sense that one package is retrieved
> > at a time.
> > 
> > However, pkg retrieves individual files for a package, not a giant blob.
> > This does mean that transfer time may be slower than if entire packages
> > were transferred at a time, but it greatly minimises the amount of bytes
> > transferred because of variants, facets, and updates (since only the files
> > that are changed are transferred for updates).
> 
> Why not to fetch multiple packages at the same time as well?

Because we're already opening 20 parallel connections to the server at a time, 
that's not a good idea.  It might help in some situations, but it's unclear how 
much (I wouldn't expect the gain to be significant).

> Then perhaps there should be an image install concept similar to flash
> archives. This could greatly improve performance.

If such functionality is provided, it won't be by the packaging system.

> Or maybe pkg/depo should be able to pre-compute images of defined packages
> so then instead of transferring file-by-file entire image would be
> transferred.

It cannot; the client has to determine what it wants to retrieve since only the 
client has a picture of all of the possible packages involved in the operation. 
Remember that the server is also relatively dumb; it could actually be an 
Apache process rather than pkg.depotd(1M).

> Something like: create a meta package called server-core and then
> 'pkg compute-image server-core' which would create a tar-like archive on
> the server. Then if a client needs a full copy of the core-server
> meta-package it could negotiate with server and transfer pre-computed
> images instead of a file-by-file for each package. For ad-hoc package
> install/upgrade it could transfer it using the current method. When an
> image is created one should be able to specify what should be included in
> it (what architectures/facets, etc.).

This is actually what the package server used to do, but it turned out it was 
the wrong answer for a number of reasons I really don't want to take the time 
to get into.

With that said, long-term we'll be looking into the possibility of using 
pre-generated package archives for installs, but given that you can already get 
very good performance with the right configuration, it hasn't been a priority.  
And there's more performance work still to be done.
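In the meantime, one rough approximation you can do yourself is to pre-stage 
the packages you care about into a local file-based repository and install 
from that.  A minimal sketch, assuming a pkg(5) version that provides 
pkgrepo(1) and pkgrecv -r; the hostname, paths, and the 'server-core' package 
name are placeholders borrowed from your example, not a supported feature:

  # Sketch only -- names and paths below are placeholders.
  pkgrepo create /export/local-repo
  pkgrecv -s http://pkg-server.example.com/ -d /export/local-repo -r server-core
  pkg set-publisher -O file:///export/local-repo example.com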
  
> In enterprise installation it is usually more important to get quickly an
> initial OS install rather than a single package later on. And if

Actually, I'd say the opposite.  It's been my personal finding that most users 
care far more about the time required for updates on already deployed systems 
than initial provisioning.  While I agree initial provisioning should be 
"reasonably fast", updates are actually more important.

... 
> > Another thing to consider is that if you are using pkg.depotd and want
> > better scalability or performance, you could export the repository via
> > an NFS share instead, or place an Apache reverse caching proxy in front
> > of pkg.depotd.
> 
> Why would it make things faster? Shouldn't a single depotd be at least as
> fast as exporting the same repository over nfs or putting apache in front
> of it?

pkg.depotd(1M) is essentially a web service.  Like all web services, it 
requires tuning and consideration for deployment.

Added to that, the HTTP protocol has significant overhead compared to NFS, 
etc., and the performance characteristics are totally different (NFS is 
provided by the OS, while pkg.depotd is an application).

As for Apache being faster, yes, that's expected with almost any web 
application.  Apache was designed and architected for very specific purposes, 
and serving files happens to be one it's very good at.  pkg.depotd(1M), on the 
other hand, was primarily intended as a publication server for packaging 
processes, and as an easy way to share package data when HTTP is the most 
appropriate method.

Keep in mind that pkg.depotd(1M) does provide services, such as remote search 
and a browser user interface for examining package data, that aren't suitable 
for a static web server.

Over time, it's become clear that pkg.depotd(1M) is not the right service to 
share package data if you need the sort of large-scale provisioning typically 
done with AI.  As a result, it's likely that example configuration for using 
Apache to serve package data will be provided (we already have an example in 
the gate for caching and web proxying).
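For reference, a minimal sketch of what such a configuration might look like 
(Apache 2.2 with mod_proxy and mod_disk_cache acting as a caching reverse 
proxy in front of pkg.depotd; the depot host, port, and cache path below are 
placeholders, not the example from the gate):

  # Sketch only -- adjust host, port, and cache location for your site.
  ProxyRequests Off
  ProxyPass        / http://depot-host.example.com:10000/
  ProxyPassReverse / http://depot-host.example.com:10000/
  CacheEnable disk /
  CacheRoot /var/cache/pkg-proxy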

> And to be honest - I don't have to deploy proxy servers or go to other
> means to get decent (*much* better) performance when using Linux's
> kickstart to get an OS installed over a network. It just works. Frankly
> AI+pkg should be able to easily saturate GbE on modern x86 hardware
> out-of-the-box - if they can't then they are broken.

I'm sorry, but I'm going to have to disagree with your comparison, since it 
fails to take into account all of the additional functionality that our 
package system offers and those tools do not.

Realistically, pkg(5) has a very different architecture that requires a 
different approach to deployment.

Yes, the goal certainly is to have reasonable performance with little to no 
configuration, but many deployments are going to require some thought.

I had already offered you one easy option of just exporting your repository via 
NFS and using that.
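As a rough sketch of that option (the server name, share path, and publisher 
name below are placeholders):

  # On the repository server:
  share -F nfs -o ro /export/pkgrepo

  # On each client:
  mount -F nfs repo-server:/export/pkgrepo /mnt/pkgrepo
  pkg set-publisher -O file:///mnt/pkgrepo opensolaris.org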

-Shawn
_______________________________________________
caiman-discuss mailing list
[email protected]
http://mail.opensolaris.org/mailman/listinfo/caiman-discuss
