On Tue, May 04, 2010 at 09:24:07AM +0200, Bertrand Juglas wrote:
> Do you think i can start the process myself ?

Quite possibly.

Let me first list roles in the import process:
* PostgreSQL databases:
** Commit repository (where packages are committed to)
** rMake internal ephemeral repository (where packages are built,
   and from which the built packages are committed to the
   commit repository)
** rMake job database
* Other services:
** Yum repository (preferably available via both NFS and HTTP)
** Conary repository (the web front end that writes to the commit repository)
** Mirrorball front end
** rMake head node
** rMake build nodes

These roles must be implemented on systems with close network
proximity -- that is, they need to be able to access each other
with high bandwidth and low latency.

Since you asked me specifically about expressing this in terms of how
many quad-core Xeon systems with 4 GiB of memory and 1 TiB of RAID 1
SATA storage would be required, I'll give an example of how we could
parcel out those roles.

I'd like to make clear that if I were specifying hardware for the
import process without those restrictions, I'd differentiate the
hardware configurations to better fit the different roles, and could
achieve the same results or better with fewer systems overall.


I want to describe some potential bottlenecks.

First, we have generally deployed the databases on systems with
more than 4 GiB of memory, so I don't have good information for
you on whether you'll see significantly slower performance with
4 GiB of memory for the database systems.

Second, mirrorball can use significant amounts of memory to
represent the historical model; we run it on very large systems --
we're currently running it on a system with 32GiB of memory to avoid
having to worry about the size of the model.  We're often working
on multiple imports at once, so that's not a good direct comparison.
For a single import without years of history to represent, 4GiB
should be enough.

Finally, our rule of thumb is that to maximize rMake build node
performance, you should have 2 GiB of memory per core.  With only
1 GiB of memory per core, you'll probably end up with some idle CPU.
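To make that rule of thumb concrete, here's a small sketch of the
arithmetic (the 2 GiB-per-core figure is just the heuristic above, not
a measured limit, and the helper names are my own):

```python
def recommended_memory_gib(cores, gib_per_core=2):
    """Memory needed to keep all cores busy, per the rule of thumb."""
    return cores * gib_per_core

def fully_fed_cores(memory_gib, gib_per_core=2):
    """How many cores the available memory can realistically feed."""
    return memory_gib // gib_per_core

# The proposed quad-core, 4 GiB build nodes:
print(recommended_memory_gib(4))  # 8 -- ideally 8 GiB per node
print(fully_fed_cores(4))         # 2 -- so expect some idle CPU
```

In other words, the suggested systems have half the memory the rule
of thumb calls for, which is why some idle CPU on the build nodes is
likely.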


So here's a possible configuration based on your suggested hardware:

1 PostgreSQL commit repository (RAID1)
1 PostgreSQL ephemeral repository + rMake job database (RAID1)
1 Conary repository + Yum repository (RAID1)
1 Mirrorball front end (RAID1)
1 rMake head node (RAID1)
4 rMake build nodes (RAID0 if possible for faster I/O)

The number of build nodes will basically control how fast you
can build.  The more build nodes you have, the more you stress the
PostgreSQL repositories.  Given that the database systems have only
4 GiB of memory, I don't think I'd deploy more than 4 build nodes.
That's just a rough guess as to scaling, because we haven't built
with exactly this kind of infrastructure before.  If you have only
2 or 3 build nodes, the import will just go a bit slower, and the
database systems will be more lightly loaded.

Anyway, this comes out to at least 7 of the specified systems, and
as many as 9.
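The count is easy to check: five single-role systems plus two to four
build nodes.  A minimal sketch, using just the role names listed
above:

```python
# Five fixed-role machines from the proposed layout, plus a variable
# number of rMake build nodes (2-4 suggested above).
FIXED_ROLES = [
    "PostgreSQL commit repository",
    "PostgreSQL ephemeral repository + rMake job database",
    "Conary repository + Yum repository",
    "Mirrorball front end",
    "rMake head node",
]

def total_systems(build_nodes):
    return len(FIXED_ROLES) + build_nodes

print(total_systems(2))  # 7 -- the minimum
print(total_systems(4))  # 9 -- the suggested maximum
```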

Again, this configuration describes how to deploy a set of specified
machines, not necessarily an optimal configuration, where some
machines would be configured a bit differently.  This email is not
general advice on how to set up a mirrorball import cluster.

Thanks, and looking forward to the possibility of getting this
underway!
_______________________________________________
Foresight-devel mailing list
Foresight-devel@lists.rpath.org
http://lists.rpath.org/mailman/listinfo/foresight-devel