Yeah, I think this sounds like a good way to move forward with this
work. The database schema is pretty complex; if you need help on the
database side of things, let me know.
To get started, would it be useful to have a meeting over the phone/
telepresence to design the datastore layout? This gives us an
opportunity to start from a blank slate with regard to the
datastore, so it may be useful to brainstorm a bit beforehand.
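To seed that brainstorm, here is a very rough first cut at a layout using
App Engine's Python datastore API. The entity and property names below are
placeholders I made up for discussion, not a proposal for the final schema:

    from google.appengine.ext import db

    # Hypothetical entities -- names/properties are placeholders only
    class MpiInstall(db.Model):
        mpi_name    = db.StringProperty(required=True)     # e.g. "Open MPI"
        mpi_version = db.StringProperty(required=True)
        compiler    = db.StringProperty()
        platform    = db.StringProperty()                   # e.g. "Linux x86_64"

    class TestRun(db.Model):
        mpi_install   = db.ReferenceProperty(MpiInstall)
        suite_name    = db.StringProperty(required=True)    # e.g. "ibm", "intel"
        test_name     = db.StringProperty(required=True)
        np            = db.IntegerProperty()                # number of processes
        result        = db.StringProperty(choices=('pass', 'fail', 'timeout', 'skipped'))
        duration_secs = db.FloatProperty()
        submitter     = db.StringProperty()                 # site/org, e.g. "Cisco", "IU"
        timestamp     = db.DateTimeProperty(auto_now_add=True)

A query for something like step (6) in Mike's plan below would then be a
one-liner, e.g.:

    db.GqlQuery("SELECT * FROM TestRun WHERE suite_name = :1 AND result = 'fail'", 'ibm')

Again, just a sketch to argue about in the meeting.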
The Google Apps account is under my personal Google account, so I'm
reluctant to use it. I think the reason it took so long for me was
that when I originally signed up, it was in limited beta. I think
the approval time is much shorter now (maybe a day?), and we can make
an openmpi or mtt account that we can use.
With regard to Hadoop, I don't think that IU has a set of machines
that would work, but I can ask around. We could always try Hadoop on
a single machine if people wanted to play around with data querying/
storage.
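For example, Hadoop's streaming interface runs map/reduce jobs with plain
scripts (and works in single-machine standalone mode), so playing around
doesn't require a cluster. A toy sketch that counts results per test,
assuming tab-separated input lines of "test name <TAB> result":

    #!/usr/bin/env python
    # mapper.py: emit "test_name<TAB>result" for each input line
    import sys

    for line in sys.stdin:
        fields = line.rstrip('\n').split('\t')
        if len(fields) >= 2:
            sys.stdout.write('%s\t%s\n' % (fields[0], fields[1]))

    #!/usr/bin/env python
    # reducer.py: input arrives sorted by key; count occurrences of each key
    import sys

    counts = {}
    for line in sys.stdin:
        key = line.rstrip('\n')
        counts[key] = counts.get(key, 0) + 1
    for key in sorted(counts):
        sys.stdout.write('%s\t%d\n' % (key, counts[key]))

You can even skip Hadoop entirely and test the same scripts with
"cat results.txt | ./mapper.py | sort | ./reducer.py"; with Hadoop installed,
something like "hadoop jar $HADOOP_HOME/contrib/streaming/hadoop-*-streaming.jar
-input results -output counts -mapper mapper.py -reducer reducer.py
-file mapper.py -file reducer.py" would run it as a real job.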
I don't have a strong preference either way, but Google Apps may
provide us with a lower-overhead solution in the long run, even
though it costs $$.
Cheers,
Josh
On Mar 19, 2009, at 11:06 AM, Jeff Squyres wrote:
On Mar 19, 2009, at 10:51 AM, Mike Dubman wrote:
I think we can switch to the desired framework (datastore + mapreduce)
gradually in the background.
Here is a short battle plan:
1. Create the datastore (Google's or similar).
2. Design the datastore layout (what to keep, how to keep it, objects &
attributes).
3. Create a cmd line tool to submit results into the datastore (see the
rough sketch after this list).
4. Integrate (3) into mtt.
5. Milestone: we have a tool to submit run results into two DBs
(the current DB & the datastore).
Agreed -- this is very do-able.
6. Create an MPI-aware cmd line tool to query submitted results. The tool
should allow querying and fetching selected results.
7. Milestone: we have a cmd line tool to query performance results.
This tool can be used by the community to play with custom scripts for
fetching results and generating custom reports.
8. Here we can collect 3rd-party/contributed scripts to create
various visual reports based on perf results.
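To make (3) a bit more concrete: the submit tool could be little more than
an HTTP POST to wherever the datastore front end ends up living. A rough
sketch (the URL and field names here are made up, just to show the shape):

    #!/usr/bin/env python
    # mtt-submit.py: post one test result to a (hypothetical) datastore front end
    import sys
    import urllib
    import urllib2

    SUBMIT_URL = 'http://example.appspot.com/submit'   # placeholder URL

    def submit(suite_name, test_name, np, result):
        data = urllib.urlencode({
            'suite_name': suite_name,
            'test_name':  test_name,
            'np':         np,
            'result':     result,
        })
        return urllib2.urlopen(SUBMIT_URL, data).read()

    if __name__ == '__main__':
        # e.g.: ./mtt-submit.py ibm allgather 16 pass
        sys.stdout.write(submit(*sys.argv[1:5]))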
what do you think?
I think we can provide some dark forces here to perform most of
the steps.
Awesome! I can say that if this stuff becomes available, Cisco
will start "double submitting" -- to the currently-official
postgres DB (i.e., same as today) and to the new/experimental
datastore.
Will it be possible to host the datastore on openmpi.org and open
access to it?
I think we have 2 options here:
1. Google's datastore / App Engine. That requires signing up for a
Google Apps account with App Engine access. Josh has one of
these (anyone can get a Google Apps account; as I understand it,
you have to apply for App Engine access, and approval can take a
looooong time -- Josh just got approved after nearly a year). Josh
-- could we use your account, perchance? (I'm not sure if this is
Josh's main/personal Google account, or a generic account he created.)
2. Hadoop. This is the open source project modeled on the map/reduce
papers that Google published. We'd have to host the Hadoop datastore
somewhere (e.g., IU), but it benefits from having multiple machines
to store the data, such as a data farm. I do not believe that IU has
such a resource.
There are definite similarities between the two choices, but I
believe the APIs are different -- so we have to code for one or the
other.
I think I would prefer #1 in order to take care of the hosting
issue. If we get past the proof-of-concept stage, I'm guessing
it'll be pretty easy to get the funding to get a real Google Apps
account (it's $50/user/year -- darn cheap).
--
Jeff Squyres
Cisco Systems