On Mar 19, 2009, at 10:51 AM, Mike Dubman wrote:
I think we can switch to the desired framework (datastore+mapreduce)
gradually in the background:
Here is a short battle plan:
1. create datastore (Google's or similar)
2. design datastore layout (what to keep, how to keep, objects &
attributes)
3. create cmd line tool to submit results into datastore
4. integrate (3) into mtt
5. Milestone: we have a tool to submit run results into two DBs
(current & datastore)
Agreed -- this is very do-able.
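As a rough sketch of steps 3-5 (nothing like this exists in MTT today), a submit tool could hand the same run result to both backends. Everything named here is hypothetical: the `RunResult` fields, the `submit()` interface, and the in-memory stand-ins for the postgres db and the datastore. The real attributes are exactly what step 2 has to decide.

```python
from dataclasses import dataclass, asdict, field
from typing import List, Optional, Protocol


@dataclass
class RunResult:
    # Hypothetical attributes; the real layout is the subject of step 2.
    mpi_name: str
    mpi_version: str
    test_name: str
    np: int
    passed: bool
    latency_usec: Optional[float] = None


class Backend(Protocol):
    # Assumed interface: each backend accepts one record as a plain dict.
    def submit(self, record: dict) -> None: ...


@dataclass
class InMemoryBackend:
    """Stand-in for a real backend (postgres or the new datastore)."""
    name: str
    records: list = field(default_factory=list)

    def submit(self, record: dict) -> None:
        self.records.append(record)


def double_submit(result: RunResult, backends: List[Backend]) -> List[str]:
    """Submit one result to every backend; return names of backends that failed."""
    failed = []
    record = asdict(result)
    for backend in backends:
        try:
            # Each backend gets its own copy so one cannot mutate another's data.
            backend.submit(dict(record))
        except Exception:
            # A failure in the experimental store should not block the official one.
            failed.append(getattr(backend, "name", repr(backend)))
    return failed
```

Catching failures per backend is a deliberate choice in this sketch: a problem in the experimental datastore should not keep results out of the current db.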
6. Create an MPI-aware cmd line tool to query submitted results. The
tool should allow querying and fetching selected results.
7. Milestone: we have a cmd line tool to query performance results.
This tool can be used by the community to play with custom scripts for
fetching results and generating custom reports.
8. here we can collect 3rd party/contributed scripts to create
various visual reports based on perf results.
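To make steps 6-8 a bit more concrete, here is a minimal sketch of what a query-and-report script might look like, assuming results come back as plain dicts. The field names (`test_name`, `mpi_version`, `latency_usec`) and both helper functions are hypothetical, and the map/reduce here runs in-process purely for illustration; it is not the App Engine or Hadoop API.

```python
from collections import defaultdict


def query(records, **criteria):
    """Return the records whose fields equal every given criterion."""
    return [
        r for r in records
        if all(r.get(key) == value for key, value in criteria.items())
    ]


def map_reduce(records, map_fn, reduce_fn):
    """Group the (key, value) pairs emitted by map_fn, then reduce each group."""
    groups = defaultdict(list)
    for record in records:
        for key, value in map_fn(record):
            groups[key].append(value)
    return {key: reduce_fn(values) for key, values in groups.items()}


def best_latency_by_version(records, test_name):
    """Example report: lowest latency seen per MPI version for one test."""
    hits = query(records, test_name=test_name)
    return map_reduce(
        hits,
        lambda r: [(r["mpi_version"], r["latency_usec"])],
        min,
    )
```

A contributed script would then only need to supply its own `map_fn` and `reduce_fn` to produce a different report from the same fetched results.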
what do you think?
I think we can provide some dark forces here to perform most of the
steps.
Awesome! I can say that if this stuff becomes available, Cisco will
start "double submitting" -- to the currently-official postgres db
(i.e., same as today), and to the new/experimental datastore.
Will it be possible to host the datastore on openmpi.org and open
access to it?
I think we have 2 options here:
1. Google's datastore/app engine. That requires signing up for a
Google Apps account with Google App Engine access. Josh has one of
these (anyone can get a Google Apps account; as I understand it, you
have to apply for Google App Engine access and approval can take a
looooong time -- Josh just got approved after nearly a year). Josh --
could we use your account, perchance? (I'm not sure if this is Josh's
main/personal Google account, or a generic account he created)
2. Hadoop. This is the open source project that is modeled on the
papers Google published about map/reduce. We'd have to host the
Hadoop data store somewhere (e.g., IU), but it benefits from having
multiple machines to store data, such as a data farm. I do not
believe that IU has such a resource.
There are definite similarities between the two choices, but I believe
the APIs are different -- so we have to code for one or the other.
I think I would prefer #1 in order to take care of the hosting issue.
If we get past the proof-of-concept stage, I'm guessing it'll be
pretty easy to get the funding to get a real Google Apps account (it's
$50/user/year -- darn cheap).
--
Jeff Squyres
Cisco Systems