Yeah, I think this sounds like a good way to move forward with this work. The database schema is pretty complex, so if you need help on the database side of things, let me know.

To get started, would it be useful to have a meeting over the phone/telepresence to design the datastore layout? This gives us an opportunity to start from a blank slate with regard to the datastore, so it may be useful to brainstorm a bit beforehand.

The Google Apps account is under my personal Google account, so I'm reluctant to use it. I think it took so long for me because it was in limited beta when I originally signed up. I think the approval time is much shorter now (maybe a day?), and we can make an openmpi or mtt account that we can use.

With regard to Hadoop, I don't think that IU has a set of machines that would work, but I can ask around. We could always try Hadoop on a single machine if people wanted to play around with data querying/storage.

I don't have a strong preference either way, but Google Apps may provide us with a lower overhead solution for the long run even though it costs $$.

Cheers,
Josh

On Mar 19, 2009, at 11:06 AM, Jeff Squyres wrote:

On Mar 19, 2009, at 10:51 AM, Mike Dubman wrote:

I think we can switch to the desired framework (datastore+mapreduce) gradually in the background.
Here is a short battle plan:

1. Create datastore (Google's or similar)
2. Design datastore layout (what to keep, how to keep, objects & attributes)
3. Create cmd line tool to submit results into datastore
4. Integrate (3) into mtt
5. Milestone: we have a tool to submit run results into two DBs (current & datastore)
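
For illustration, the double submission in step 5 might look roughly like the sketch below. It is only a sketch under stated assumptions: the function name double_submit, the result fields, and the two in-memory lists standing in for the current DB and the new datastore are all hypothetical, not part of MTT.

```python
from typing import Callable, Dict, List

Result = Dict[str, object]
Backend = Callable[[Result], None]


def double_submit(result: Result, backends: Dict[str, Backend]) -> Dict[str, str]:
    """Send one run result to every backend and report a per-backend status."""
    status: Dict[str, str] = {}
    for name, send in backends.items():
        try:
            send(result)
            status[name] = "ok"
        except Exception as exc:
            # A failure in one backend (e.g. the experimental datastore)
            # should not stop submission to the others.
            status[name] = f"failed: {exc}"
    return status


# Hypothetical in-memory stand-ins for the two DBs.
current_db: List[Result] = []
datastore: List[Result] = []

status = double_submit(
    {"mpi_name": "ompi-trunk", "test_name": "latency", "result": "pass"},
    {"current": current_db.append, "datastore": datastore.append},
)
print(status)
```

In a real tool each backend would wrap the actual submission path; the point is only that both stores receive the same result and that a failure in one is reported rather than fatal.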

Agreed -- this is very do-able.

6. Create mpi-aware cmd line tool to query submitted results. Tool should allow query and fetch selected results.
7. Milestone: we have cmd line tool to query performance results. This tool can be used by community to play with custom scripts for fetching results and generating custom reports.

8. Here we can collect 3rd party/contributed scripts to create various visual reports based on perf results.

What do you think?

I think we can provide some dark forces here to perform most of the steps.

Awesome! I can say that if this stuff becomes available, Cisco will start "double submitting" -- submitting to the currently-official postgres db (i.e., same as today) and to the new/experimental datastore.

Will it be possible to host the datastore on openmpi.org and open access to it?


I think we have 2 options here:

1. Google's datastore/App Engine. That requires signing up for a Google Apps account with App Engine access. Josh has one of these (anyone can get a Google Apps account; as I understand it, you have to apply for App Engine access and approval can take a looooong time -- Josh just got approved after nearly a year). Josh -- could we use your account, perchance? (I'm not sure if this is Josh's main/personal Google account, or a generic account he created)

2. Hadoop. This is the open source project that is modeled on the papers that Google published about map/reduce. We'd have to host the Hadoop data store somewhere (e.g., IU), but it benefits from having multiple machines to store data, such as a data farm. I do not believe that IU has such a resource.
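
For anyone unfamiliar with the map/reduce model that both options build on, here is a minimal single-process sketch in plain Python. It uses neither the Hadoop nor the Google API; the record fields ("test", "usec") and all function names are made up for illustration.

```python
from collections import defaultdict


def map_phase(records, mapper):
    """Apply the mapper to every record, yielding (key, value) pairs."""
    for record in records:
        yield from mapper(record)


def shuffle(pairs):
    """Group all values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups, reducer):
    """Collapse each key's list of values into a single result."""
    return {key: reducer(key, values) for key, values in groups.items()}


# Hypothetical performance results.
records = [
    {"test": "latency", "usec": 2.0},
    {"test": "latency", "usec": 4.0},
    {"test": "bandwidth", "usec": 10.0},
]


def mapper(record):
    # Emit one (test name, measurement) pair per record.
    yield record["test"], record["usec"]


def reducer(key, values):
    # Average the measurements for one test.
    return sum(values) / len(values)


averages = reduce_phase(shuffle(map_phase(records, mapper)), reducer)
print(averages)
```

A real framework distributes the map and reduce phases across machines, which is where a multi-machine data farm helps; the programming model itself is just the three steps above.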

There are definite similarities between the two choices, but I believe the APIs are different -- so we have to code for one or the other.

I think I would prefer #1 in order to take care of the hosting issue. If we get past the proof-of-concept stage, I'm guessing it'll be pretty easy to get the funding to get a real Google Apps account (it's $50/user/year -- darn cheap).

--
Jeff Squyres
Cisco Systems

_______________________________________________
mtt-devel mailing list
mtt-de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/mtt-devel
