James, would you be able to share more info about your setup?

1. What exactly is your application doing that requires so much memory
   and CPU? Is it something like gene splicing? (No, I don't know much
   about it beyond Jurassic Park :D)
2. Do you feel Perl was the best choice for whatever you are doing, and
   if yes, then why? How much of your stuff is using mod_perl,
   considering you mentioned not much is web related?
3. What are the challenges you are currently facing with your
   implementation?

On Wed, Dec 23, 2020 at 6:58 AM James Smith <j...@sanger.ac.uk> wrote:

> Oh but memory is a problem – but not if you have just a small cluster
> of machines!
>
> Our boxes are larger than that – but they all run virtual machines
> {only a small proportion web related} – machines/memory would rapidly
> become a problem in our data centre - we run VMware [995 hosts] and
> OpenStack [10,000s of hosts] + a selection of large memory machines
> {measured in TBs of memory per machine}.
>
> We would be looking at somewhere between 0.5 PB and 1 PB of memory –
> not just the price of buying that amount of memory - for many machines
> we need the fastest memory money can buy for the workload, but we
> would need a lot more CPUs than we currently have as we would need a
> larger number of machines to have 64GB virtual machines {we would get
> 2 VMs per host}. We currently have approx. 1,000-2,000 CPUs running
> our hardware (last time I had a figure) – it would probably need to go
> to approximately 5,000-10,000!
> It is not just the initial outlay but the environmental and financial
> cost of running that number of machines, and finding space to run them
> without putting the cooling costs through the roof!! That is without
> considering what additional constraints on storage having the extra
> machines may have (at the last count a year ago we had over 30 PBytes
> of storage on site – and a large amount of offsite backup).
>
> We would also stretch the amount of power we can get from the national
> grid to power it all - we currently have 3 feeds from different parts
> of the national grid (we are fortunately in a position where this is
> possible) and the dedicated link we would need to add more power would
> be at least 50 miles long!
>
> So - managing cores/memory is vitally important to us – moving to the
> cloud is an option we are looking at – but that is more than 4 times
> the price of our onsite set-up (with substantial discounts from AWS)
> and would require an upgrade of our existing link to the internet –
> which is currently 40 Gbit/s (I think).
>
> Currently we are analysing very large amounts of data directly linked
> to the current major world problem – this is why the UK is currently
> being isolated as we have discovered and can track a new strain, in
> near real time – other countries have no ability to do this – we in a
> day can and do handle, sequence and analyse more samples than the
> whole of France has sequenced since February. We probably don’t have
> more of the new variant strain than other areas of the world – it is
> just that we know we have it because of the amount of sequencing and
> analysis that we in the UK have done.
>
> *From:* Matthias Peng <pengmatth...@gmail.com>
> *Sent:* 23 December 2020 12:02
> *To:* mod_perl list <modperl@perl.apache.org>
> *Subject:* Re: Confused about two development utils [EXT]
>
> Today memory is not a serious problem; each of our servers has 64GB of
> memory.
>
> Forgot to add - so our FCGI servers need a lot (and I mean a lot) more
> memory than the mod_perl servers to serve the same level of content
> (just in case memory blows up with FCGI backends)
>
> -----Original Message-----
> From: James Smith <j...@sanger.ac.uk>
> Sent: 23 December 2020 11:34
> To: André Warnier (tomcat/perl) <a...@ice-sa.com>; modperl@perl.apache.org
> Subject: RE: Confused about two development utils [EXT]
>
> > This costs memory, and all the more since many perl modules are not
> > thread-safe, so if you use them in your code, at this moment the
> > only safe way to do it is to use the Apache httpd prefork model.
> > This means that each Apache httpd child process has its own copy of
> > the perl interpreter, which means that the memory used by this
> > embedded perl interpreter has to be counted n times (as many times
> > as there are Apache httpd child processes running at any one time).
>
> This isn’t quite true - if you load modules before the process forks
> then the child processes can share the same parts of memory {this is
> the case in Linux anyway}. It is useful to be able to "pre-load" core
> functionality which is used across all functions. It also speeds up
> child process generation as the modules are already in memory and
> converted to byte code.
>
> One of the great advantages of mod_perl is Apache2::SizeLimit, which
> can blow away large child processes - and then if needed create new
> ones. This is not the case with some of the FCGI solutions, as the
> individual processes can grow if there is a memory leak or a request
> that retrieves a large amount of content (even if not served), but
> perl can't give the memory back. So FCGI processes only get bigger and
> bigger and eventually blow up memory (or hit swap first).
>
> --
> The Wellcome Sanger Institute is operated by Genome Research Limited,
> a charity registered in England with number 1021457 and a company
> registered in England with number 2742969, whose registered office is
> 215 Euston Road, London, NW1 2BE.
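
For reference, the two mod_perl points quoted above (loading modules before the fork, and letting Apache2::SizeLimit retire oversized children) could look roughly like the startup file below. This is only a minimal sketch, assuming mod_perl 2 with the prefork MPM on Linux. The file path, the preloaded modules and every threshold value are illustrative assumptions, not the configuration described in the thread.

```perl
# startup.pl -- hypothetical path: /etc/httpd/conf/startup.pl
#
# Assumed httpd.conf lines that go with this file:
#   PerlRequire        /etc/httpd/conf/startup.pl
#   PerlCleanupHandler Apache2::SizeLimit
use strict;
use warnings;

# Preload modules in the parent process, before Apache forks its
# children. The compiled code then lives in pages that the children
# share copy-on-write instead of each child compiling its own copy.
# These three core modules are placeholders for whatever your
# application actually uses on every request.
use POSIX        ();
use Data::Dumper ();
use Storable     ();

use Apache2::SizeLimit ();

# All sizes are in KB. The numbers are made-up examples; tune them
# against the real footprint of your own children.
Apache2::SizeLimit->set_max_process_size(150_000);   # total size cap
Apache2::SizeLimit->set_max_unshared_size(120_000);  # unshared memory cap
Apache2::SizeLimit->set_check_interval(5);           # check every 5th request

1;
```

Because the check runs in the cleanup phase, a child that has grown past a limit finishes sending its current response before it exits, and Apache starts a fresh child if one is needed.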