Hello all,
At UMBC we were given a grant to run a Ceph cluster as the primary research
storage for our HPC facility. The cluster consists of the following
hardware:
“Mon” nodes x3:
• 2x 25Gb interfaces
• 192GB RAM
• Storage:
-- 0 (none) OSDs/drives
“HDD” nodes x16:
• 2x 25Gb interfaces
• 192GB RAM
• Storage:
-- 12x OSDs w/ 20TB HDDs
-- 2x local Write-Ahead-Logs (journal) w/ 8TB NVMe drives
“NVMe” nodes x3:
• 2x 100Gb interfaces
• 384GB RAM
• Storage:
-- 16x OSDs w/ 8TB NVMe drives
“MDS” nodes x3:
• 2x 100Gb interfaces
• 192GB RAM
• Storage:
-- 8x OSDs w/ 1TB NVMe drives
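
For reference, the raw OSD capacity implied by the list above can be totalled
with a short sketch. It assumes decimal TB, counts only the OSD drives (not
the journal NVMe drives on the HDD nodes), and ignores replication or
erasure-coding overhead, so usable capacity will be lower. The names
NODE_CLASSES and raw_capacity_tb are made up for this illustration.

```python
# Raw OSD capacity per node class, taken from the hardware list above.
# Tuple layout: (node count, OSDs per node, TB per OSD drive).
# Journal/WAL drives on the HDD nodes are deliberately not counted.
NODE_CLASSES = {
    "HDD": (16, 12, 20),
    "NVMe": (3, 16, 8),
    "MDS": (3, 8, 1),
}


def raw_capacity_tb(classes):
    """Return raw TB per node class: nodes * OSDs per node * TB per drive."""
    return {name: nodes * osds * tb for name, (nodes, osds, tb) in classes.items()}


capacity = raw_capacity_tb(NODE_CLASSES)
for name, tb in capacity.items():
    print(f"{name}: {tb} TB raw")
print(f"Total: {sum(capacity.values())} TB raw")
```

This gives 3840 TB raw on the HDD nodes, 384 TB on the NVMe nodes, and 24 TB
on the MDS nodes.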


In the six months since moving from a different storage solution (Isilon),
we've had multiple crashes that have taken the entire system down. They
occur when several thousand jobs run at the same time and overload the
active MDS nodes. Has anyone used Ceph as primary research storage for an
HPC facility where many dozens of different workflows and file types are in
use at once? Or is it better to design a tiered approach in combination with
a second storage solution?

-- 
V/R,
Maxwell Breitmeyer
UMBC HPCF Specialist
Graduate Student
(443) 835-8250
_______________________________________________
ceph-users mailing list -- [email protected]
To unsubscribe send an email to [email protected]
