I discussed this idea of bringing a large compute resource to the project
yesterday with my team at JICS, and there was general consensus that it can
be committed.
I will request, and hopefully commit, a pretty large set of clustered
CPU/storage resources for the needs of the Drill project. I will be the
PI for the resource, and could give access to whomever we want to
designate from the Drill project side. Just let me know. I should have
the project approved within a few days.

Below the quoted thread I have appended a rough sketch of the kind of
deploy and test automation I have in mind.

Edmon

On Saturday, September 5, 2015, Edmon Begoli <[email protected]> wrote:

> Ted,
>
> It is actually very easy and painless to do what I am proposing. I
> probably made it sound far more bureaucratic/legalistic than it really
> is.
>
> Researchers and projects from across the globe can apply for cycles on
> Beacon or any other HPC platform we run. Beacon is by far the best, and
> we already have a setup to run Spark and Hive on it. (We just published
> a paper about it at XSEDE, on integrating the PBS/TORQUE scheduler with
> Spark to run JVM-bound jobs.)
>
> As for the use of resources: at the end of the year we need to submit
> reports on all the projects that used compute resources, and how. It is
> part of our mission, as one of the XSEDE centers, to help promote the
> advancement of science and technology. Reports from Principal
> Investigators (PIs) show how we did it. In this case, I can be the PI
> and have anyone from the Drill team assigned access.
>
> I don't think there are any IP issues. Open source project, open
> research institution, use of resources for testing and benchmarking. We
> could actually make JICS a benchmarking site for Drill (and even other
> Apache projects).
>
> We'll discuss other details in a hangout. I am also planning to brief
> my team next Wednesday on the plan for the use of resources.
>
> Regards,
> Edmon
>
> On Saturday, September 5, 2015, Ted Dunning <[email protected]> wrote:
>
>> Edmon,
>>
>> This is very interesting. I am sure that public acknowledgements of
>> contributions are easily managed.
>>
>> What might be even more useful for you would be small-scale
>> publications, especially about the problems of shoe-horning real-world
>> data objects into the quasi-relational model of Drill.
>>
>> What would be problematic (and what is probably just a matter of
>> nomenclature) is naming an institution by the Apache-specific term
>> "committer" (you said commitment). Individuals at your institution
>> would absolutely be up for being committers as they demonstrate a
>> track record of contribution.
>>
>> I would expect no need for any paperwork between JICS and Apache,
>> unless you would like to execute a corporate contributor license to
>> ensure that particular individuals are specifically empowered to
>> contribute code. I don't know what the position of JICS is relative to
>> intellectual property, though, so it might be worth checking
>> institutional policy on your side on how individuals can contribute to
>> open source projects. It shouldn't be too hard, since there are quite
>> a number of NSF-funded people who do contribute.
>>
>> On Fri, Sep 4, 2015 at 9:39 PM, Edmon Begoli <[email protected]> wrote:
>>
>>> I can work with my institution and the NSF to commit time on the
>>> Beacon supercomputing cluster to Apache and the Drill project. Maybe
>>> 20 hours a month on 4-5 nodes.
>>>
>>> I have discretionary hours that I can put in, and I can, with our HPC
>>> admins, create deploy scripts on a few clustered machines (these are
>>> all very large boxes with 16 cores, 256 GB of RAM, a 40 Gb InfiniBand
>>> interconnect, and a local 1 TB SSD each). There is also the 10 PB
>>> Medusa filesystem attached, but HDFS over the local drives would
>>> probably be better. They are otherwise just regular machines, and run
>>> regular JVMs on Linux.
>>>
>>> We can also get Rahul access with a secure token to set up and run
>>> stress/performance/integration tests for Drill. I can actually help
>>> there as well. This can be automated to run tests and collect
>>> results.
>>>
>>> I think the only requirement would be that the JICS team be named for
>>> the commitment, because both NSF/XSEDE and UT like to see the
>>> resources being officially used and acknowledged. They are there to
>>> support open and academic research; open source projects fit well.
>>>
>>> If this sounds OK with the project PMC, I can start the process of
>>> allocation, account creation, and setup.
>>>
>>> I would also, as the CDO of JICS, sign whatever standard papers are
>>> needed with the Apache organization.
>>>
>>> With all this being said, please let me know if this is something we
>>> want to pursue.
>>>
>>> Thank you,
>>> Edmon
>>>
>>> On Tuesday, September 1, 2015, Jacques Nadeau <[email protected]> wrote:
>>>
>>>> I spent a bunch of time looking at the Phi coprocessors and forgot
>>>> to get back to the thread. I'd love it if someone spent some time
>>>> looking at leveraging them (since Drill is frequently
>>>> processor-bound). Any takers?
>>>>
>>>> --
>>>> Jacques Nadeau
>>>> CTO and Co-Founder, Dremio
>>>>
>>>> On Mon, Aug 31, 2015 at 10:24 PM, Parth Chandra <[email protected]> wrote:
>>>>
>>>>> Hi Edmon,
>>>>>
>>>>> Sorry no one seems to have got back to you on this. We are in the
>>>>> process of publishing a test suite for regression testing Drill,
>>>>> and the cluster you have (even a few nodes) would be a great
>>>>> resource for folks to run the test suite. Rahul et al. are working
>>>>> on this, and I would suggest watching out for Rahul's posts on the
>>>>> topic.
>>>>>
>>>>> Parth
>>>>>
>>>>> On Tue, Aug 25, 2015 at 9:55 PM, Edmon Begoli <[email protected]> wrote:
>>>>>
>>>>>> Hey folks,
>>>>>>
>>>>>> As we discussed today on the hangout, this is the machine that we
>>>>>> have at JICS/NICS, where I have Drill installed and where I could
>>>>>> set up a test cluster over a few nodes:
>>>>>>
>>>>>> https://www.nics.tennessee.edu/computing-resources/beacon/configuration
>>>>>>
>>>>>> Note that each node has:
>>>>>> - 2 x 8-core Intel® Xeon® E5-2670 processors
>>>>>> - 256 GB of memory
>>>>>> - 4 Intel® Xeon Phi™ 5110P coprocessors with 8 GB of memory each
>>>>>> - 960 GB of SSD storage
>>>>>>
>>>>>> Would someone advise on what would be an interesting test setup?
>>>>>>
>>>>>> Thank you,
>>>>>> Edmon
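P.S. As promised above, here is a minimal sketch of how we might bring
Drill up inside a PBS/TORQUE allocation, in the same spirit as our Spark
integration. Everything in it (the install path, resource limits, the
"zkhost" ZooKeeper address, the assumption that drill-override.conf
already points at a reachable quorum) is an illustrative placeholder,
not Beacon's actual configuration:

#!/usr/bin/env python
# Sketch: generate and submit a PBS/TORQUE job that starts one Drill
# drillbit per allocated node, runs a smoke-test query, and shuts the
# drillbits back down. All paths and limits are placeholders.

import subprocess
import textwrap

DRILL_HOME = "/opt/apache-drill"  # hypothetical install prefix

pbs_script = textwrap.dedent("""\
    #PBS -N drill-test
    #PBS -l nodes=4:ppn=16
    #PBS -l walltime=02:00:00
    #PBS -j oe

    # $PBS_NODEFILE lists the allocated hosts, one line per core slot.
    # drill-override.conf is assumed to point at a reachable ZooKeeper.
    for node in $(sort -u "$PBS_NODEFILE"); do
        ssh "$node" "{drill}/bin/drillbit.sh start" &
    done
    wait
    sleep 30  # give the drillbits time to register

    # Smoke test against the bundled sample data, then tear down.
    echo 'SELECT * FROM cp.`employee.json` LIMIT 5;' | \\
        {drill}/bin/sqlline -u "jdbc:drill:zk=zkhost:2181"
    for node in $(sort -u "$PBS_NODEFILE"); do
        ssh "$node" "{drill}/bin/drillbit.sh stop"
    done
    """).format(drill=DRILL_HOME)

with open("drill-test.pbs", "w") as script:
    script.write(pbs_script)

subprocess.check_call(["qsub", "drill-test.pbs"])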
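And here is the sort of automated test run I mentioned: execute every
.sql file in a directory against the cluster and record wall-clock time
and exit status per query, so results can be collected run over run.
The sqlline path, ZooKeeper address, and queries/ layout are again
assumptions for illustration:

#!/usr/bin/env python
# Sketch: run every .sql file in queries/ against a Drill cluster and
# record the wall-clock time and exit status of each in results.csv.
# SQLLINE and JDBC_URL are hypothetical values.

import csv
import glob
import subprocess
import time

SQLLINE = "/opt/apache-drill/bin/sqlline"
JDBC_URL = "jdbc:drill:zk=zkhost:2181"

with open("results.csv", "w") as out:
    writer = csv.writer(out)
    writer.writerow(["query_file", "seconds", "returncode"])
    for sql_file in sorted(glob.glob("queries/*.sql")):
        start = time.time()
        with open(sql_file) as query:
            # Feed each query file over stdin, which keeps the harness
            # independent of any particular sqlline command-line flags.
            rc = subprocess.call([SQLLINE, "-u", JDBC_URL], stdin=query)
        writer.writerow([sql_file, round(time.time() - start, 2), rc])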
