Here, we deploy some clusters on OpenStack, and some traditionally as bare
metal. Our largest cluster is actually a mixture of both, so we can
dynamically expand it from the OpenStack service when needed.
Our eventual aim is to use OpenStack as a common deployment layer, even for
the bare metal cluster nodes, but we’re not quite there yet.
The main motivation for this was to have a common hardware and deployment
platform, with flexibility between VM and batch workloads. We have needed to
change workloads dynamically: in the current COVID-19 crisis, for example, our
human sequencing has largely stopped and we have been predominantly sequencing
COVID-19, using a pipeline imported from the consortium we’re part of. Using
OpenStack we got that new pipeline running in under a week, and later moved it
from the research to the production environment, reallocating the research
resources back to their normal workload.
There certainly are downsides: OpenStack is a considerable layer of
complexity, and we have had occasional issues. Those are usually in the
services for dynamically creating and destroying resources, so they rarely
affect established, long-running VMs such as the batch clusters. We also tend
to use fairly static provider networks to connect the Lustre systems to the
virtual clusters, which removes another layer of OpenStack complexity.
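For illustration, a provider network like that can be defined with the
openstacksdk Python client roughly as below. This is a minimal sketch only;
the cloud name, physnet label, VLAN ID and subnet range are made-up examples,
not our actual configuration:

    import openstack

    # Connect using credentials from clouds.yaml; "mycloud" is a placeholder.
    conn = openstack.connect(cloud="mycloud")

    # Map an existing datacentre VLAN directly into Neutron, so traffic
    # between guests and the Lustre servers bypasses overlay networking.
    net = conn.network.create_network(
        name="lustre-net",
        provider_network_type="vlan",
        provider_physical_network="physnet1",  # assumed physnet label
        provider_segmentation_id=200,          # assumed VLAN ID
    )

    conn.network.create_subnet(
        name="lustre-subnet",
        network_id=net.id,
        ip_version=4,
        cidr="10.20.0.0/16",                   # assumed address range
    )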
Generally speaking it’s working pretty well, and we see uptimes in excess of
99.5%.
Tim
On 1 Jul 2020, at 05:09, John Hearns <hear...@gmail.com> wrote:
Jörg, I would back up what Matt Wallis says. What benefits would OpenStack
bring you?
Do you need to set up a flexible infrastructure where clusters can be created
on demand for specific projects?
Regarding InfiniBand, the key concept is SR-IOV. This article is worth reading:
https://docs.openstack.org/neutron/pike/admin/config-sriov.html
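As a rough sketch of the tenant-side step, assuming the operator-side setup
from that guide (VFs enabled on the NIC, nova PCI whitelist, sriovnicswitch
mechanism driver) is already in place, using the openstacksdk Python client;
the network, image and flavour names here are made up:

    import openstack

    conn = openstack.connect(cloud="mycloud")  # placeholder cloud name

    # "ib-net" is an assumed provider network backed by the IB HCA.
    net = conn.network.find_network("ib-net")

    # vnic_type "direct" asks Neutron for an SR-IOV virtual function
    # instead of a virtio port, giving near-native latency in the VM.
    port = conn.network.create_port(
        network_id=net.id,
        binding_vnic_type="direct",
    )

    server = conn.compute.create_server(
        name="mpi-node-0",
        image_id=conn.compute.find_image("centos-8").id,    # assumed image
        flavor_id=conn.compute.find_flavor("m1.large").id,  # assumed flavour
        networks=[{"port": port.id}],
    )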
I would take a step back and look at your storage technology, and decide which
is the best one to go forward with.
Also look at the proceedings of the last STFC Computing Insight UK, where
Martyn Guest presented a lot of benchmarking results on AMD Rome; see page 103
onwards in this report:
http://purl.org/net/epubs/manifestation/46387165/DL-CONF-2020-001.pdf
On Tue, 30 Jun 2020 at 12:21, Jörg Saßmannshausen
<sassy-w...@sassy.formativ.net> wrote:
Dear all,
we are currently planning a new cluster, and this time around the idea is to
use OpenStack for the HPC part of the cluster as well.
I was wondering if somebody on the list here has some first-hand experience.
One of the things we are currently not so sure about is InfiniBand (or another
low-latency network connection, but not Ethernet): can you run HPC jobs on
OpenStack that require more cores than a single box provides? I am thinking of
programs like CP2K, GROMACS and NWChem (if those sound familiar to you), which
utilise this kind of network very well.
I came across things like Magic Castle from Compute Canada, but as far as I
understand it, they are not using it in production (yet).
Is anybody on here familiar with this?
All the best from London
Jörg
--
The Wellcome Sanger Institute is operated by Genome Research
Limited, a charity registered in England with number 1021457 and a
company registered in England with number 2742969, whose registered
office is 215 Euston Road, London, NW1 2BE.
_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit
https://beowulf.org/cgi-bin/mailman/listinfo/beowulf