That is great news! From the Dremio side, I propose working with Steven. Let's start taking advantage of this awesome resource!
--
Jacques Nadeau
CTO and Co-Founder, Dremio

On Wed, Sep 23, 2015 at 5:34 PM, Edmon Begoli <[email protected]> wrote:

> This request has been approved. I will get more details tomorrow.
>
> I could add a few members of the Drill team to the resource, maybe one
> person from MapR and one from Dremio, who can have access and can
> assist in configuring (or instructing the resource sysadmins on) how
> to run the big tests, if desired. They will need to apply for and get
> RSA tokens.
>
> Then we can talk about how to make this resource part of the regular
> testing and benchmarking process.
>
> Thank you,
> Edmon
>
> On Fri, Sep 18, 2015 at 8:00 PM, Edmon Begoli <[email protected]> wrote:
>
> > I requested 5000 hours a year on Beacon for Apache Drill for
> > high-performance benchmarking, testing, and optimization.
> > I will let you know of the resolution pretty soon. I expect these
> > resources to be awarded to the project.
> >
> > On Fri, Sep 18, 2015 at 6:22 PM, Parth Chandra <[email protected]>
> > wrote:
> >
> >> +1 on running the build and tests.
> >> If we need to run some kind of stress tests, we could consider
> >> running TPC-H/TPC-DS at large scale factors.
> >>
> >> On Fri, Sep 18, 2015 at 2:24 PM, Jacques Nadeau <[email protected]>
> >> wrote:
> >>
> >> > Not offhand. It really depends on how the time would work. For
> >> > example, it would be nice if we had an automated, perfectly fresh
> >> > (no .m2/repo) nightly build and full test suite run so people can
> >> > always check the status. Maybe we use this hardware for that?
> >> >
> >> > --
> >> > Jacques Nadeau
> >> > CTO and Co-Founder, Dremio
> >> >
> >> > On Fri, Sep 18, 2015 at 9:48 AM, rahul challapalli <
> >> > [email protected]> wrote:
> >> >
> >> > > Edmon,
> >> > >
> >> > > We do have the tests available now [1].
> >> > >
> >> > > Jacques,
> >> > >
> >> > > You expressed interest in making these tests available on an
> >> > > Amazon cluster so that users need not have the physical hardware
> >> > > required to run these tests. Do you have any specific thoughts
> >> > > on how to leverage the resources that Edmon is willing to
> >> > > contribute (performance testing?)
> >> > >
> >> > > [1] https://github.com/mapr/drill-test-framework
> >> > >
> >> > > - Rahul
> >> > >
> >> > > On Thu, Sep 17, 2015 at 8:49 PM, Edmon Begoli <[email protected]>
> >> > > wrote:
> >> > >
> >> > > > Yesterday I discussed with my team at JICS this idea of
> >> > > > bringing a large compute resource to the project, and there
> >> > > > was a general consensus that it can be committed.
> >> > > >
> >> > > > I will request, and hopefully commit, a pretty large set of
> >> > > > clustered CPU/storage resources for the needs of the Drill
> >> > > > project.
> >> > > >
> >> > > > I will be the PI for the resource, and could give access to
> >> > > > whomever we want to designate from the Drill project side.
> >> > > >
> >> > > > Just let me know. I should have the project approved within a
> >> > > > few days.
> >> > > >
> >> > > > Edmon
> >> > > >
> >> > > > On Saturday, September 5, 2015, Edmon Begoli <[email protected]>
> >> > > > wrote:
> >> > > >
> >> > > > > Ted,
> >> > > > >
> >> > > > > It is actually very easy and painless to do what I am
> >> > > > > proposing. I probably made it sound far more
> >> > > > > bureaucratic/legalistic than it really is.
> >> > > > >
> >> > > > > Researchers and projects from across the globe can apply
> >> > > > > for cycles on Beacon or any other HPC platform we run.
> >> > > > > (Beacon is by far the best, and we already have a setup to
> >> > > > > run Spark and Hive on it.
> >> > > > > (We just published a paper about it at XSEDE, on
> >> > > > > integrating the PBS/TORQUE scheduler with Spark to run
> >> > > > > JVM-bound jobs.))
> >> > > > >
> >> > > > > As for the use of resources, at the end of the year we need
> >> > > > > to submit reports for all the projects that used compute
> >> > > > > resources, and how. It is part of our mission, as one of
> >> > > > > the XSEDE centers, to help promote the advancement of
> >> > > > > science and technology. Reports from Principal
> >> > > > > Investigators (PIs) show how we did it. In this case, I can
> >> > > > > be the PI and have anyone from the Drill team assigned
> >> > > > > access.
> >> > > > >
> >> > > > > I don't think there are any IP issues. Open source project,
> >> > > > > open research institution, use of resources for testing and
> >> > > > > benchmarking. We could actually make JICS a benchmarking
> >> > > > > site for Drill (and even other Apache projects).
> >> > > > >
> >> > > > > We'll discuss other details in a hangout. I am also
> >> > > > > planning to brief my team next Wednesday on the plan for
> >> > > > > the use of resources.
> >> > > > >
> >> > > > > Regards,
> >> > > > > Edmon
> >> > > > >
> >> > > > > On Saturday, September 5, 2015, Ted Dunning
> >> > > > > <[email protected]> wrote:
> >> > > > >
> >> > > > >> Edmon,
> >> > > > >>
> >> > > > >> This is very interesting. I am sure that public
> >> > > > >> acknowledgements of contributions are easily managed.
> >> > > > >>
> >> > > > >> What might be even more useful for you would be
> >> > > > >> small-scale publications, especially about the problems of
> >> > > > >> shoe-horning real-world data objects into the
> >> > > > >> quasi-relational model of Drill.
> >> > > > >>
> >> > > > >> What would be problematic (and what is probably just a
> >> > > > >> matter of nomenclature) is naming an institution with the
> >> > > > >> Apache-specific term "committer" (you said commitment).
> >> > > > >> Individuals at your institution would absolutely be up for
> >> > > > >> being committers as they demonstrate a track record of
> >> > > > >> contribution.
> >> > > > >>
> >> > > > >> I would expect no need for any paperwork between JICS and
> >> > > > >> Apache, unless you would like to execute a corporate
> >> > > > >> contributor license to ensure that particular individuals
> >> > > > >> are specifically empowered to contribute code. I don't
> >> > > > >> know what the position of JICS is relative to intellectual
> >> > > > >> property, though, so it might be worth checking
> >> > > > >> institutional policy on your side on how individuals can
> >> > > > >> contribute to open source projects. It shouldn't be too
> >> > > > >> hard, since there are quite a number of NSF-funded people
> >> > > > >> who do contribute.
> >> > > > >>
> >> > > > >> On Fri, Sep 4, 2015 at 9:39 PM, Edmon Begoli
> >> > > > >> <[email protected]> wrote:
> >> > > > >>
> >> > > > >> > I can work with my institution and the NSF to commit
> >> > > > >> > time on the Beacon supercomputing cluster to Apache and
> >> > > > >> > the Drill project. Maybe 20 hours a month for 4-5 nodes.
> >> > > > >> >
> >> > > > >> > I have discretionary hours that I can put in, and I can,
> >> > > > >> > with our HPC admins, create deploy scripts on a few
> >> > > > >> > clustered machines (these are all very large boxes with
> >> > > > >> > 16 cores, 256 GB of memory, a 40 Gb InfiniBand
> >> > > > >> > interconnect, and a local 1 TB SSD each).
> >> > > > >> > There is also the Medusa 10 PB filesystem attached, but
> >> > > > >> > HDFS over local drives would probably be better. They
> >> > > > >> > are otherwise just regular machines, and run regular
> >> > > > >> > JVMs on Linux.
> >> > > > >> >
> >> > > > >> > We can also give Rahul access with a secure token to set
> >> > > > >> > up and run stress/performance/integration tests for
> >> > > > >> > Drill. I can actually help there as well. This can be
> >> > > > >> > automated to run tests and collect results.
> >> > > > >> >
> >> > > > >> > I think that the only requirement would be that the JICS
> >> > > > >> > team be named for the commitment, because both NSF/XSEDE
> >> > > > >> > and UT like to see the resources being officially used
> >> > > > >> > and acknowledged. They are there to support open and
> >> > > > >> > academic research; open source projects fit well.
> >> > > > >> >
> >> > > > >> > If this sounds OK with the project PMC, I can start the
> >> > > > >> > process of allocation, account creation, and setup.
> >> > > > >> >
> >> > > > >> > I would also, as the CDO of JICS, sign whatever standard
> >> > > > >> > papers with the Apache organization.
> >> > > > >> >
> >> > > > >> > With all this being said, please let me know if this is
> >> > > > >> > something we want to pursue.
> >> > > > >> >
> >> > > > >> > Thank you,
> >> > > > >> > Edmon
> >> > > > >> >
> >> > > > >> > On Tuesday, September 1, 2015, Jacques Nadeau
> >> > > > >> > <[email protected]> wrote:
> >> > > > >> >
> >> > > > >> > > I spent a bunch of time looking at the Phi coprocessors
> >> > > > >> > > and forgot to get back to the thread. I'd love it if
> >> > > > >> > > someone spent some time looking at leveraging them
> >> > > > >> > > (since Drill is frequently processor-bound). Any
> >> > > > >> > > takers?
> >> > > > >> > >
> >> > > > >> > > --
> >> > > > >> > > Jacques Nadeau
> >> > > > >> > > CTO and Co-Founder, Dremio
> >> > > > >> > >
> >> > > > >> > > On Mon, Aug 31, 2015 at 10:24 PM, Parth Chandra
> >> > > > >> > > <[email protected]> wrote:
> >> > > > >> > >
> >> > > > >> > > > Hi Edmon,
> >> > > > >> > > >
> >> > > > >> > > > Sorry no one seems to have gotten back to you on
> >> > > > >> > > > this. We are in the process of publishing a test
> >> > > > >> > > > suite for regression testing Drill, and the cluster
> >> > > > >> > > > you have (even a few nodes) would be a great
> >> > > > >> > > > resource for folks to run the test suite. Rahul et
> >> > > > >> > > > al. are working on this, and I would suggest
> >> > > > >> > > > watching for Rahul's posts on the topic.
> >> > > > >> > > >
> >> > > > >> > > > Parth
> >> > > > >> > > >
> >> > > > >> > > > On Tue, Aug 25, 2015 at 9:55 PM, Edmon Begoli
> >> > > > >> > > > <[email protected]> wrote:
> >> > > > >> > > >
> >> > > > >> > > > > Hey folks,
> >> > > > >> > > > >
> >> > > > >> > > > > As we discussed today on a hangout, this is a
> >> > > > >> > > > > machine that we have at JICS/NICS where I have
> >> > > > >> > > > > Drill installed and where I could set up a test
> >> > > > >> > > > > cluster over a few nodes.
> >> > > > >> > > > >
> >> > > > >> > > > > https://www.nics.tennessee.edu/computing-resources/beacon/configuration
> >> > > > >> > > > >
> >> > > > >> > > > > Note that each node has:
> >> > > > >> > > > > - 2x 8-core Intel® Xeon® E5-2670 processors
> >> > > > >> > > > > - 256 GB of memory
> >> > > > >> > > > > - 4 Intel® Xeon Phi™ 5110P coprocessors with 8 GB
> >> > > > >> > > > >   of memory each
> >> > > > >> > > > > - 960 GB of SSD storage
> >> > > > >> > > > >
> >> > > > >> > > > > Would someone advise on what would be an
> >> > > > >> > > > > interesting test setup?
> >> > > > >> > > > >
> >> > > > >> > > > > Thank you,
> >> > > > >> > > > > Edmon
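[Editor's note] Jacques's suggestion upthread of an automated, perfectly
fresh "(no .m2/repo)" nightly build could be sketched roughly as below.
This is only an illustration: the checkout step and schedule are
assumptions, and the one load-bearing piece is Maven's standard
`-Dmaven.repo.local` property, which redirects the local repository so a
cold-cache build cannot be masked by previously downloaded artifacts.

```shell
#!/usr/bin/env sh
# Sketch of a cold-cache nightly build: give Maven an empty, throwaway
# local repository so nothing cached in ~/.m2 can hide a dependency or
# build problem. Run this from a pristine clone of apache/drill, e.g.
#   git clone https://github.com/apache/drill.git && cd drill
set -eu

FRESH_REPO="$(mktemp -d)"   # brand-new, empty local Maven repository

# The actual build and full test-suite run a nightly job would execute;
# assembled as a string here so the sketch is inspectable without Maven.
BUILD_CMD="mvn -Dmaven.repo.local=$FRESH_REPO clean install"
echo "$BUILD_CMD"
```

A cron entry (or a Jenkins job) pointing at this script would give the
always-checkable nightly status Jacques describes.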
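[Editor's note] Edmon mentions creating deploy scripts with the HPC
admins and the PBS/TORQUE scheduler used on Beacon. A job submission
script for such a cluster might look like the sketch below; the job
name, node count, walltime, allocation id, and install paths are all
placeholders, not real JICS/Beacon settings, and the test-framework
invocation is purely illustrative.

```shell
#!/bin/sh
# Hypothetical TORQUE/PBS job for running Drill tests on an HPC
# allocation. All directives and paths below are placeholders.
#PBS -N drill-test-run
#PBS -l nodes=4:ppn=16
#PBS -l walltime=02:00:00
#PBS -A DRILL-ALLOC            # placeholder allocation/account id

cd "$PBS_O_WORKDIR"

# pbsdsh runs one instance of the command per allocated node, so this
# would start a Drillbit on each of the 4 nodes (assumed install path).
pbsdsh -u /opt/drill/bin/drillbit.sh start

# The test framework from [1] could then be driven from the head node;
# the exact invocation depends on its configuration, so this line is
# only a placeholder step.
/opt/drill-test-framework/run_tests.sh
```

Jobs like this would be submitted with `qsub`, which fits the
"automated to run tests and collect results" workflow described above.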
