[oops... Phillip gave me an OK to copy gpc-dev, but then I fat-fingered it...]
This is a good question. We've discussed it with IOWA and UNMC, and I think Russ got another inquiry more recently. (I'm copying Russ, Mani, and Steve, who are also involved in hardware and budget discussions.)

I can speak to:
 1. HERON hardware, where I have hard data, though the requirements go beyond what I expect we'll need for GPC
 2. Plans for GPC, which I'm just starting to get my head around
 3. Recent tinkering with i2b2 on AWS

Regarding HERON, recall the HERON architecture<https://informatics.kumc.edu/work/wiki/HERON#Architecture>:

[HERON arch from Russ's msg of 07/12/2010 09:48:12 AM]<https://informatics.kumc.edu/work/attachment/wiki/HERON/heron-arch.jpg>

We have two hardware servers (id, deid) and a virtual app server (jboss, web UI). The app server is a relatively unremarkable VM: 8GB RAM, 20GB disk. You could perhaps get by with less if you knew more than I do about keeping jboss/the JVM from eating crazy amounts of RAM. We seem to be using ~80% of that disk space, though I'm not sure how. Log files, maybe. (Did I mention we have open positions for a DBA and a systems engineer?)

The HP DL180 was our 1st-generation hardware. We did a sizing review in 2013; I just tweaked the summary spreadsheet to make sense to this audience:

 * HERON Sizing 2013<https://docs.google.com/spreadsheet/ccc?key=0Ak2nuw10QdWQdDM5THo4WDQzTUJpLVJXUmppcUFCNnc&usp=sharing>

As shown there, the Gen2 hardware servers are $55K each. The Gen1 servers were originally more like $20K. That was sort of OK for one user, but it was pretty sluggish if anybody else was also using it. So we added RAM and solid-state storage, which made performance acceptable for our user base.

This is where the requirements for HERON go far beyond what I can see for GPC. For GPC, strictly speaking, I think we need to run 3 queries in the first six months, one for each cohort we're characterizing. Of course, there are countless iterations to get there, so there's a trade-off between development time and hardware cost.
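To make the hardware-cost side of that trade-off concrete, here's a back-of-envelope comparison using the figures above (Gen1 originally ~$20K per server, before the RAM/SSD upgrades; Gen2 $55K per server; two hardware servers either way). Just a sketch of the arithmetic, not a quote:

```python
# Back-of-envelope hardware cost, using figures from the sizing review above.
SERVERS = 2  # id + deid

gen1_per_server = 20_000  # original Gen1 price, before RAM/SSD upgrades
gen2_per_server = 55_000

gen1_total = SERVERS * gen1_per_server
gen2_total = SERVERS * gen2_per_server

print(f"Gen1: ${gen1_total:,}")  # $40,000
print(f"Gen2: ${gen2_total:,}")  # $110,000
```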
But it's not like HERON, where we aim to support hundreds of queries by dozens of researchers every month...

Queries by Month

 Year-Month  Queries  Users
 2014-01         396     30
 2013-12         405     33
 2013-11         621     43
 2013-10        1164     42
 2013-09        1008     35
 2013-08        1157     52
 2013-07         641     36
 2013-06         299     21

... with response times of 5 seconds to 5 minutes:

 Created                Status      Name          User  Groups  Terms  Elapsed
 January 24 10:27:43am  INCOMPLETE  ...                      1     32  0:00:50   ******
 January 23 04:41:10pm  INCOMPLETE  Patient list             1      1  17:47:23  ****************
 January 23 04:41:10pm  INCOMPLETE                           1      1  17:47:23  ****************
 January 24 10:18:09am  COMPLETED                            3      5  0:01:31   *******
 January 24 10:12:59am  COMPLETED                            2      4  0:01:24   *******
 January 24 10:12:03am  COMPLETED                            1      3  0:00:11   ****
 January 24 08:58:39am  COMPLETED   Patient list             2      7  0:00:15   *****
 January 24 08:58:39am  COMPLETED                            2      7  0:00:15   *****
 January 24 04:53:00am  COMPLETED                            1      1  0:00:01   **
 January 23 10:52:51pm  COMPLETED                            1      1  0:00:01   **
 January 23 05:31:30pm  COMPLETED   Patient list             2     21  0:00:12   ****
 January 23 05:31:30pm  COMPLETED                            2     21  0:00:12   ****
 January 23 04:52:41pm  COMPLETED                            1      1  0:00:01   **

A separate app server makes sense, yes... though at UNMC, I believe they're planning to use the same hardware for the deid DB and the app server, perhaps virtualizing the app server.

I heard from Russ that while popmednet is required for exchange between CDRNs, it's not required within CDRNs, and other CDRNs have alternative plans. In other words, I expect we'll need one popmednet node for the GPC CDRN, not one for each GPC site.

As for tinkering on AWS, I don't think my experience so far sheds much light, so I'll leave that to a future discussion.

I'm adding this to the hackathon agenda<http://informatics.gpcnetwork.org/trac/Project/wiki/HackathonOne#Agenda> (but anyone who has input at this point, please share it here and don't wait until then).
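For a rough sense of the HERON load those tables represent, the monthly figures average out as follows (a quick sketch; the numbers are copied straight from the Queries by Month table):

```python
# Monthly HERON query volume, copied from the table above:
# (year-month, queries, users)
months = [
    ("2014-01", 396, 30), ("2013-12", 405, 33), ("2013-11", 621, 43),
    ("2013-10", 1164, 42), ("2013-09", 1008, 35), ("2013-08", 1157, 52),
    ("2013-07", 641, 36), ("2013-06", 299, 21),
]

total_queries = sum(q for _, q, _ in months)
avg_queries = total_queries / len(months)
avg_users = sum(u for _, _, u in months) / len(months)

print(f"avg queries/month: {avg_queries:.1f}")  # 711.4
print(f"avg users/month:   {avg_users:.1f}")    # 36.5
```

So HERON sustains roughly 700 queries from 35-ish users in a typical month, versus the 3 characterization queries (plus iterations) I expect for GPC.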
--
Dan

________________________________
From: Phillip Reeder [[email protected]]
Sent: Friday, January 24, 2014 8:39 AM
To: Dan Connolly
Subject: Hardware Reference for GPC

Do we have a reference as to the hardware/VMs that we expect to need for the GPC? We have our existing DB server, i2b2 app server, and web client, but I'm assuming we will probably need to put this on its own app server and i2b2 web client. Also, will we need a popmednet server for each site? Etc.

I'm getting some questions about budget and justifying the VMs, so I was wondering if this might be something that the GPC level could help with. Given that it's not really well defined at the moment, I don't want to have them re-budget the money to something else and then in 6 months really need one more VM.

Have you thought much about what servers each GPC site will need?

Phillip

________________________________
UT Southwestern Medical Center
The future of medicine, today.
