I have used a few different instance types for Fluo and Accumulo release testing. I don't have any recommendations, but I can share what I have done.

I used to use m1.large instances for Accumulo testing because of the low price and large amount of local instance storage. However, with recent AMIs (like CentOS 7) dropping support for PVM, I have stopped using these. For the most recent Accumulo scale testing I did, I used d2.xlarge instances (which have lots of local storage). These are more expensive, so I used fewer of them. I run the Accumulo continuous ingest and random walk cluster test suites on these EC2 clusters. Both generate lots of random data.

For Fluo testing I have been using m3.xlarge instances recently. These only have 80 GB of local SSD storage. I am interested in running tests with other instance types like i2.xlarge. For Fluo testing I run the stress test, which generates random data, and Webindex, which uses real data from Common Crawl.

Even though the performance per node is not so great, I like using the cheaper m1.large and m3.xlarge instance types because I can get more nodes for less. This is nice for finding issues that only occur at scale.

I have not looked into using m4 nodes because they have no local instance storage. A few years ago I did experiments with Accumulo using EBS vs local instance storage and found a large difference in performance. I have not revisited that. I have also not looked into using S3 for HDFS. If anyone has any info on EBS vs S3 vs local instance storage, let me know. For my purposes, running tests for a few days, I don't care if the data goes away when I terminate the cluster.

We created a project called Zetten[1] to automate setting up Accumulo and Fluo on EC2. I use Zetten for all of my Accumulo and Fluo testing now.

[1]: https://github.com/fluo-io/zetten

Keith

On Tue, May 10, 2016 at 1:24 PM, Adina Crainiceanu <[email protected]> wrote:
> Hi,
>
> Does anyone have any advice for what type of Amazon EC2 instance I could
> use to run Accumulo and Rya? I plan on getting an Amazon EC2 instance so we
> can collaborate easier and do some experiments - the budget limit is about
> $1000 for a year.
>
> I thought that maybe a m3.large or m4.large instance would be good enough -
> not sure which one would be better.
>
> Here is a link to the different instance types and pricing:
> https://aws.amazon.com/ec2/instance-types/
> https://aws.amazon.com/ec2/pricing/
>
> Thank you very much,
> Adina
>
> --
> Dr. Adina Crainiceanu
> Associate Professor, Computer Science Department
> United States Naval Academy
> 410-293-6822
> [email protected]
> http://www.usna.edu/Users/cs/adina/
