On Tue, May 04, 2010 at 09:24:07AM +0200, Bertrand Juglas wrote:
> Do you think I can start the process myself?
Quite possibly. Let me first list the roles in the import process:

* PostgreSQL databases:
  ** Commit repository (where packages are committed to)
  ** rMake internal ephemeral repository (where packages are built, and the source of the packages that are committed to the commit repository)
  ** rMake job database
* Other services:
  ** Yum repository (preferably available via both NFS and HTTP)
  ** Conary repository (the web front end that writes to the commit repository)
  ** Mirrorball front end
  ** rMake head node
  ** rMake build nodes

These roles must be implemented on systems in close network proximity -- that is, they need to be able to reach each other with high bandwidth and low latency.

Since you asked me specifically about expressing this in terms of how many quad-core, 4 GiB memory, 1 TiB RAID 1 SATA systems would be required, I'll give an example of how we could parcel out those roles. I'd like to make clear that if I were specifying hardware for the import process without those restrictions, I'd differentiate the hardware configuration to better match the different roles, and could achieve the same results or better with fewer systems overall.

I want to describe some potential bottlenecks.

First, we have generally deployed the databases on systems with more than 4 GiB of memory, so I don't have good information for you on whether you'll see significantly slower performance with 4 GiB of memory on the database systems.

Second, mirrorball can use significant amounts of memory to represent the historical model; we run it on very large systems -- currently one with 32 GiB of memory, to avoid having to worry about the size of the model. We're often working on multiple imports at once, though, so that's not a good direct comparison. For a single import without years of history to represent, 4 GiB should be enough.

Finally, our rule of thumb for maximizing rMake build node performance is that you should have 2 GiB of memory per core.
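As a quick sanity check, here's a small Python sketch applying the sizing rules in this mail to the proposed hardware. The numbers come from this thread; the function and variable names are my own and don't correspond to any rPath tooling.

```python
# Sanity check of the sizing rules discussed in this mail.
# ~2 GiB of memory per rMake build core is the rule of thumb;
# the role layout is 5 fixed-role systems plus 2 to 4 build nodes.

GIB_PER_CORE = 2  # rule of thumb for rMake build nodes

def recommended_build_node_memory(cores, gib_per_core=GIB_PER_CORE):
    """Memory (GiB) suggested for an rMake build node with this many cores."""
    return cores * gib_per_core

# Proposed hardware: quad-core systems with 4 GiB of memory each.
cores, memory_gib = 4, 4
needed = recommended_build_node_memory(cores)
print(f"quad-core build node: {needed} GiB recommended, {memory_gib} GiB available")

# Total system count for the suggested deployment.
fixed_roles = 5  # commit DB, ephemeral DB + job DB, Conary + Yum, mirrorball, rMake head
for build_nodes in (2, 3, 4):
    print(f"{build_nodes} build nodes -> {fixed_roles + build_nodes} systems total")
```

With 1 GiB per core instead of the recommended 2, the arithmetic shows where the idle CPU comes from, and the totals land at 7 to 9 systems depending on the build node count.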
With only 1 GiB of memory per core, you'll probably end up with some idle CPU.

So here's a possible configuration based on your suggested hardware:

* 1 PostgreSQL commit repository (RAID 1)
* 1 PostgreSQL ephemeral repository + rMake job database (RAID 1)
* 1 Conary repository + Yum repository (RAID 1)
* 1 Mirrorball front end (RAID 1)
* 1 rMake head node (RAID 1)
* 4 rMake build nodes (RAID 0 if possible, for faster I/O)

The number of build nodes basically controls how fast you can build. The more build nodes you have, the more you stress the PostgreSQL repositories. Given that the database systems have only 4 GiB of memory, I don't think I'd deploy more than 4 build nodes. That's just a rough guess at the scaling, because we haven't built on exactly this kind of infrastructure before. If you have only 2 or 3 build nodes, the import will just go a bit slower, and the database systems will be more lightly loaded.

Anyway, this comes out to at least 7 of the specified systems, and as many as 9. Again, this configuration describes how to deploy a set of specified machines, not necessarily an optimal configuration, where some machines would be configured a bit differently. This email is not general advice on how to set up a mirrorball import cluster.

Thanks, and looking forward to the possibility of getting this underway!

_______________________________________________
Foresight-devel mailing list
Foresight-devel@lists.rpath.org
http://lists.rpath.org/mailman/listinfo/foresight-devel