Hey Rahul,

My suggestion would be to lower the bar: do the absolute bare minimum to get the tests out there. For example, simply remove proprietary information and then get it on a public github (whether your personal github or a corporate one). From there, people can help by submitting pull requests to improve the infrastructure and harness. Making things easier is something that can be done over time. For example, we've had offers from a couple of different Linux admins to help on something. I'm sure that they could help with a number of the items you've identified. In the meantime, we risk patches being merged that have less than complete testing.
--
Jacques Nadeau
CTO and Co-Founder, Dremio

On Mon, Jul 27, 2015 at 2:16 PM, rahul challapalli <[email protected]> wrote:

> Jacques,
>
> I am breaking down steps 1, 2 & 3 into sub-tasks so we can add/prioritize
> these tasks.
>
> Columns: Item # | Task | Sub-Task | Comments | Priority
>
> 1. *Publish the tests*
>    - Sub-Task: Remove Proprietary Data & Queries
>      Priority: 0
>    - Sub-Task: Redact Proprietary Data/Queries
>    - Sub-Task: Move tests into drill repo
>      Comments: This requires some refactoring to the framework code since
>      the test framework uses a 2-level directory structure
>    - Sub-Task: Organize the tests using a label based approach
>      Comments: This involves code changes and moving a lot of files. When
>      doing a one time push it might be better to do this before publishing
>      the tests?
>    - Sub-Task: Each suite should be independent
>      Comments: Some suites wrongly assume that the data is present. They
>      should be identified and fixed
>    - Sub-Task: Cleanup hardcoded dependencies during data generation
>      Comments: Some data-gen scripts have hard-coded references
>    - Sub-Task: Cleanup downloads
>      Comments: The same dataset is being downloaded multiple times by
>      different suites
>    - Sub-Task: Licenses for downloads
>      Comments: The framework downloads some files automatically. These
>      files are publicly available. However before downloading them users
>      need to agree to certain terms. By using the framework users might be
>      skipping this step. We should look into this
> 2. *Set up a cluster infrastructure to run the pre-commit tests*
> 3. *Local debugging of tests*
>    - Sub-Task: Add an optional maven target for running tests on a local
>      machine
>      Comments: Tests can launch an embedded drillbit or they can connect
>      to a running drillbit through zookeeper
>    - Sub-Task: Running suites which require additional setup (hive, hbase
>      etc) should be made optional
> 4. *Documentation*
>    - Sub-Task: Running Tests (options available and also listing the
>      assumed defaults)
>    - Sub-Task: Explaining how tests are organized
>    - Sub-Task: Process for adding a new suite
>
> On Fri, Jul 24, 2015 at 1:40 PM, Jacques Nadeau <[email protected]> wrote:
>
> > Let's get number one done (tests out there so all community members can
> > run them). Then the whole community can work together to solve the rest.
> >
> > I don't think the base install should include integration test
> > execution. I do think the tests should be in the main repo (as opposed
> > to a secondary).
> >
> > We should strive to ultimately make running these integration tests a
> > requirement for merging. We need to complete all the steps before we
> > can impose that. I should be able to help on the global run component
> > and supporting infrastructure.
> >
> > J
> >
> > --
> > Jacques Nadeau
> > CTO and Co-Founder, Dremio
> >
> > On Fri, Jul 24, 2015 at 1:29 PM, rahul challapalli <[email protected]> wrote:
> >
> > > Ramana,
> > >
> > > You are right. We are trying to address multiple issues here, but not
> > > with a single solution. I am summarizing them
> > >
> > > 1. Tests should be visible to everyone (Implicit goal)
> > > 2. Before applying a patch we should run tests in a clustered
> > >    environment. Parth had a suggestion (#4) in his original email.
> > > 3. Developers should be able to debug the majority of the tests on
> > >    their local environment.
> > >    I made a few suggestions above in this regard.
> > >
> > > - Rahul
> > >
> > > On Fri, Jul 24, 2015 at 10:40 AM, Ramana I N <[email protected]> wrote:
> > >
> > > > One important thing which we need to be clear on here is what are
> > > > we trying to address?
> > > >
> > > > I feel there are two separate issues here and I do not think one
> > > > solution will fit both the issues.
> > > >
> > > > 1. Allowing developers to run tests on their local box so they know
> > > >    the changes they have are not completely wrong.
> > > > 2. Allowing transparency in the integration tests process which is
> > > >    currently a black box.
> > > >
> > > > 1 is needed for developers to make changes and have an idea that
> > > > their changes are not going to fail tests en masse in the
> > > > integration suite. 2 is needed because it's a prerequisite for
> > > > changes to be committed.
> > > >
> > > > Regards
> > > > Ramana
> > > >
> > > > On Fri, Jul 24, 2015 at 10:28 AM, rahul challapalli <[email protected]> wrote:
> > > >
> > > > > Ramana,
> > > > >
> > > > > Let me fill in more details.
> > > > >
> > > > > 1. Before we accept a patch we want to make sure the tests run in
> > > > >    a cluster environment. No exceptions here.
> > > > > 2. We want the contributors to be able to debug the failing tests
> > > > >    on their laptops in as many cases as possible. This requires:
> > > > >    1. Tests should run on top of a local file system. (Tests can
> > > > >       launch an embedded drillbit or they can connect to a running
> > > > >       drillbit through zookeeper)
> > > > >    2. Running suites which require additional setup (hive, hbase
> > > > >       etc) should be made optional and sufficient documentation
> > > > >       should be provided for enabling and disabling these tests.
> > > > > 3. In my opinion making these new tests part of drill would make
> > > > >    it easier for the developers to debug and run tests instead of
> > > > >    having a different repository. But as you said it might bloat
> > > > >    the drill project
> > > > >
> > > > > - Rahul
> > > > >
> > > > > On Fri, Jul 24, 2015 at 9:42 AM, Ted Dunning <[email protected]> wrote:
> > > > >
> > > > > > The Hadoop family of projects has some software that integrates
> > > > > > a continuous integration system so that every time a JIRA is
> > > > > > marked as patch-available, the associated patch attached to the
> > > > > > bug will have integration tests run against it. I believe that
> > > > > > there has been some process to use git hashes instead of
> > > > > > patches. The CI results are put back on the JIRA.
> > > > > >
> > > > > > This is done using a fairly simple set of scripts. Apache Yetus
> > > > > > is just forming as a direct-to-top-level spinoff from Hadoop
> > > > > >
> > > > > > Proposal is here (don't be fooled by the fact that it looks
> > > > > > like an incubation proposal):
> > > > > >
> > > > > > http://wiki.apache.org/incubator/YetusProposal
> > > > > >
> > > > > > Early code can be found here (don't guess that this is very
> > > > > > real yet). More links can be found in the proposal.
> > > > > >
> > > > > > https://github.com/sekikn/pre-yetus/tree/master/precommit/docs
> > > > > >
> > > > > > The project has not yet been formed and there are no mailing
> > > > > > lists or git repo yet.
> > > > > >
> > > > > > On Fri, Jul 24, 2015 at 9:25 AM, Ramana I N <[email protected]> wrote:
> > > > > >
> > > > > > > As someone who worked on this for a while, including it as
> > > > > > > part of drill may bloat drill a bit too much. Also not a big
> > > > > > > fan of running against an embedded drillbit.
> > > > > > > Does not replicate an actual production use case.
> > > > > > >
> > > > > > > Additionally, setting up hive, hbase and other components may
> > > > > > > be painful and unnecessary for most ppl. It would deter people
> > > > > > > from ever contributing to drill. We could spin up in memory
> > > > > > > hive and hbase but that's similar to an embedded drillbit.
> > > > > > > Does not replicate a production scenario.
> > > > > > >
> > > > > > > Would prefer the hive way with a central Jenkins server hosted
> > > > > > > on aws and accessible to everyone. Users should be able to
> > > > > > > submit a git url and that should be able to deploy and fire
> > > > > > > off tests. Should then have a way to easily communicate
> > > > > > > failures to contributors and if success notify the committers
> > > > > > > to commit the change.
> > > > > > >
> > > > > > > Ps: if hive's way is open source maybe we can look into reuse
> > > > > > > rather than doing it from scratch. Esp the Jenkins and
> > > > > > > configuration stuff.
> > > > > > >
> > > > > > > Regards
> > > > > > > Ramana
> > > > > > >
> > > > > > > On Thursday, July 23, 2015, Parth Chandra <[email protected]> wrote:
> > > > > > >
> > > > > > > > Drill devs use a set of tests that are not available as part
> > > > > > > > of the Apache distribution. These tests are a pre-requisite
> > > > > > > > for all commits, but are not available to any contributors
> > > > > > > > outside the current devs.
> > > > > > > >
> > > > > > > > This thread is to discuss various options to make these
> > > > > > > > tests available.
> > > > > > > >
> > > > > > > > Assumptions and requirements -
> > > > > > > > 1) A functional test (as opposed to a unit test) needs to be
> > > > > > > >    closer to the end user environment than a development
> > > > > > > >    environment.
> > > > > > > >    As such, we should be running functional tests in a
> > > > > > > >    cluster environment, connect using zookeeper etc.
> > > > > > > > 2) Functional tests will keep increasing in number, get more
> > > > > > > >    complex and take a longer and longer time to execute as
> > > > > > > >    we go along.
> > > > > > > > 3) Some requirements are:
> > > > > > > >    a) We want to be strict in enforcing the pre-commit
> > > > > > > >       requirements, but not penalize the contributor who has
> > > > > > > >       a minor fix.
> > > > > > > >    b) All parts of the product (especially various
> > > > > > > >       'certified' storage plugins like Hive and Hbase)
> > > > > > > >       should get tested
> > > > > > > >    c) It should be easy to debug issues when a test fails.
> > > > > > > >       Tests should fail deterministically. If a test fails,
> > > > > > > >       it should always fail and always fail in the same way
> > > > > > > >       (easier said than done).
> > > > > > > >
> > > > > > > > Some suggestions -
> > > > > > > > 1) Tests should be a top-level maven module within the drill
> > > > > > > >    project
> > > > > > > >    a) We want the integration tests to run as part of the
> > > > > > > >       drill's maven build process
> > > > > > > >    b) The build step for the integration-tests module would
> > > > > > > >       launch an embedded drillbit and run tests against it
> > > > > > > >    c) The tests will be a separate target so they need not
> > > > > > > >       be run all the time
> > > > > > > > 2) Tests should be divided into multiple suites that are
> > > > > > > >    based on components. For example a test suite for testing
> > > > > > > >    datatypes will contain the tests for various datatypes
> > > > > > > >    including complex types.
> > > > > > > >    A contributor or developer can then run these tests more
> > > > > > > >    frequently as an issue is being addressed and run the
> > > > > > > >    entire suite only once before commit.
> > > > > > > > 3) Provide the tests as a hosted service
> > > > > > > > 4) Set up a bot to fire the test on an AWS cluster and post
> > > > > > > >    the results to the JIRA (Hive does this). Or some variant
> > > > > > > >    of this idea.
> > > > > > > >
> > > > > > > > Some questions -
> > > > > > > > 1) What do some other projects do?
> > > > > > > > 2) Are there any technologies we can leverage that will make
> > > > > > > >    this easier?
> > > > > > > > 3) How do we make it easier to debug failing tests?
> > > > > > > >
> > > > > > > > Please feel free to question the assumptions and
> > > > > > > > requirements. Be creative with your suggestions.
> > > > > > > >
> > > > > > > > Parth
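
Two ideas recur in the thread: organizing tests by label, and making suites that need extra setup (hive, hbase etc) optional. A minimal sketch of how those two could combine is below. It is not the Drill test framework: the `Suite` class, the `select_suites` function, and the suite names, labels and service names are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Suite:
    """A hypothetical test suite with labels and the services it needs."""
    name: str
    labels: frozenset = field(default_factory=frozenset)
    requires: frozenset = field(default_factory=frozenset)


def select_suites(suites, wanted_labels, available_services):
    """Split suites into (runnable, skipped) lists of suite names.

    A suite is considered only if it carries at least one wanted label
    (an empty wanted_labels means "consider every suite"). A considered
    suite is runnable when all services it requires are available;
    otherwise it is reported as skipped rather than failed.
    """
    wanted = frozenset(wanted_labels)
    available = frozenset(available_services)
    runnable, skipped = [], []
    for suite in suites:
        if wanted and not (suite.labels & wanted):
            continue  # filtered out by label, not reported at all
        if suite.requires <= available:
            runnable.append(suite.name)
        else:
            skipped.append(suite.name)
    return runnable, skipped


# Hypothetical suites; names and labels are assumptions for this sketch.
SUITES = [
    Suite("datatypes", frozenset({"functional", "datatypes"})),
    Suite("hive_storage", frozenset({"functional", "storage"}), frozenset({"hive"})),
    Suite("hbase_storage", frozenset({"functional", "storage"}), frozenset({"hbase"})),
]

if __name__ == "__main__":
    # A laptop run with no hive or hbase installed.
    runnable, skipped = select_suites(SUITES, {"functional"}, set())
    print("run:", runnable)
    print("skipped (missing services):", skipped)
```

With this shape, a local run on a laptop would execute only the suites whose requirements are met, while a cluster run could declare every service available and execute everything. Reporting skipped suites separately keeps a missing service from looking like a test failure.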
