+1. Get it out there.
On Tue, Jul 28, 2015 at 10:12 PM, Jacques Nadeau <[email protected]> wrote:
> Hey Rahul,
>
> My suggestion would be to lower the bar: do the absolute bare minimum to get the tests out there. For example, simply remove proprietary information and then get it on a public github (whether your personal github or a corporate one). From there, people can help by submitting pull requests to improve the infrastructure and harness. Making things easier is something that can be done over time. For example, we've had offers from a couple different Linux Admins to help on something. I'm sure that they could help with a number of the items you've identified. In the meantime, we risk patches being merged that have less than complete testing.
>
> --
> Jacques Nadeau
> CTO and Co-Founder, Dremio
>
> On Mon, Jul 27, 2015 at 2:16 PM, rahul challapalli <[email protected]> wrote:
>
> > Jacques,
> >
> > I am breaking down steps 1, 2 & 3 into sub-tasks so we can add/prioritize these tasks
> >
> > Item # / Task / Sub-Task / Comments / Priority
> >
> > 1. *Publish the tests*
> >    - Remove Proprietary Data & Queries (Priority: 0)
> >    - Redact Proprietary Data/Queries
> >    - Move tests into drill repo: This requires some refactoring to the framework code since the test framework uses a 2-level directory structure
> >    - Organize the tests using a label based approach: This involves code changes and moving a lot of files. When doing a one time push it might be better to do this before publishing the tests?
> >    - Each suite should be independent: Some suites wrongly assume that the data is present. They should be identified and fixed
> >    - Cleanup hardcoded dependencies during data generation: Some data-gen scripts have hard-coded references
> >    - Cleanup downloads: The same dataset is being downloaded multiple times by different suites
> >    - Licenses for downloads: The framework downloads some files automatically. These files are publicly available. However before downloading them users need to agree to certain terms. By using the framework users might be skipping this step. We should look into this
> > 2. *Setup a cluster infrastructure to run the pre-commit tests*
> > 3. *Local debugging of tests*
> >    - Add an optional maven target for running tests on a local machine: Tests can launch an embedded drillbit or they can connect to a running drillbit through zookeeper
> >    - Running suites which require additional setup (hive, hbase etc) should be made optional
> > 4. *Documentation*
> >    - Running Tests (options available and also listing the assumed defaults)
> >    - Explaining how tests are organized
> >    - Process for adding a new suite
> >
> > On Fri, Jul 24, 2015 at 1:40 PM, Jacques Nadeau <[email protected]> wrote:
> >
> > > Let's get number one done (tests out there so all community members can run them). Then the whole community can work together to solve the rest.
> > >
> > > I don't think the base install should include integration test execution. I do think the tests should be in the main repo (as opposed to a secondary).
> > >
> > > We should strive to ultimately make running these integration tests a requirement for merging. We need to complete all the steps before we can impose that. I should be able to help on the global run component and supporting infrastructure.
> > >
> > > J
> > >
> > > --
> > > Jacques Nadeau
> > > CTO and Co-Founder, Dremio
> > >
> > > On Fri, Jul 24, 2015 at 1:29 PM, rahul challapalli <[email protected]> wrote:
> > >
> > > > Ramana,
> > > >
> > > > You are right. We are trying to address multiple issues here, but not with a single solution. I am summarizing them
> > > >
> > > > 1. Tests should be visible to everyone (Implicit goal)
> > > > 2. Before applying a patch we should run tests in a clustered environment. Parth had a suggestion (#4) in his original email.
> > > > 3. Developers should be able to debug the majority of the tests on their local environment. I made a few suggestions above in this regard
> > > >
> > > > - Rahul
> > > >
> > > > On Fri, Jul 24, 2015 at 10:40 AM, Ramana I N <[email protected]> wrote:
> > > >
> > > > > One important thing which we need to be clear on here is what are we trying to address?
> > > > >
> > > > > I feel there are two separate issues here and I do not think one solution will fit both the issues.
> > > > >
> > > > > 1. Allowing developers to run tests on their local box so they know the changes they have are not completely wrong.
> > > > > 2. Allowing transparency in the integration tests process which is currently a black box.
> > > > >
> > > > > 1 is needed for developers to make changes and have an idea that their changes are not going to fail tests en masse in the integration suite. 2 is needed because it's a prerequisite for changes to be committed.
> > > > >
> > > > > Regards
> > > > > Ramana
> > > > >
> > > > > On Fri, Jul 24, 2015 at 10:28 AM, rahul challapalli <[email protected]> wrote:
> > > > >
> > > > > > Ramana,
> > > > > >
> > > > > > Let me fill in more details.
> > > > > >
> > > > > > 1. Before we accept a patch we want to make sure the tests run in a cluster environment. No exceptions here.
> > > > > > 2. We want the contributors to be able to debug the failing tests on their laptops in as many cases as possible. This requires:
> > > > > >    1. Tests should run on top of a local file system. (Tests can launch an embedded drillbit or they can connect to a running drillbit through zookeeper)
> > > > > >    2. Running suites which require additional setup (hive, hbase etc) should be made optional and sufficient documentation should be provided for enabling and disabling these tests.
> > > > > > 3. In my opinion making these new tests part of drill would make it easier for the developers to debug and run tests instead of having a different repository. But as you said it might bloat the drill project
> > > > > >
> > > > > > - Rahul
> > > > > >
> > > > > > On Fri, Jul 24, 2015 at 9:42 AM, Ted Dunning <[email protected]> wrote:
> > > > > >
> > > > > > > The Hadoop family of projects has some software that integrates a continuous integration system so that every time a JIRA is marked as patch-available, the associated patch attached to the bug will have integration tests run against it. I believe that there has been some process to use git hashes instead of patches. The CI results are put back on the JIRA.
> > > > > > >
> > > > > > > This is done using a fairly simple set of scripts. Apache Yetus is just forming as a direct-to-top-level spinoff from Hadoop
> > > > > > >
> > > > > > > Proposal is here (don't be fooled by the fact that it looks like an incubation proposal):
> > > > > > >
> > > > > > > http://wiki.apache.org/incubator/YetusProposal
> > > > > > >
> > > > > > > Early code can be found here (don't guess that this is very real yet). More links can be found in the proposal.
> > > > > > >
> > > > > > > https://github.com/sekikn/pre-yetus/tree/master/precommit/docs
> > > > > > >
> > > > > > > The project has not yet been formed and there are no mailing lists or git repo yet.
> > > > > > >
> > > > > > > On Fri, Jul 24, 2015 at 9:25 AM, Ramana I N <[email protected]> wrote:
> > > > > > >
> > > > > > > > As someone who worked on this for a while, including it as part of drill may bloat drill a bit too much. Also not a big fan of running against an embedded drillbit. Does not replicate an actual production use case.
> > > > > > > >
> > > > > > > > Additionally, setting up hive, hbase and other components may be painful and unnecessary for most ppl. It would deter people from ever contributing to drill. We could spin up in memory hive and hbase but that's similar to an embedded drill bit. Does not replicate a production scenario.
> > > > > > > >
> > > > > > > > Would prefer the hive way with a central Jenkins server hosted on aws and accessible to everyone. Users should be able to submit a git url and that should be able to deploy and fire off tests. Should then have a way to easily communicate failures to contributors and if success notify the committers to commit the change.
> > > > > > > >
> > > > > > > > Ps: if hive's way is open source maybe we can look into reuse rather than doing it from scratch. Esp the Jenkins and configuration stuff.
> > > > > > > >
> > > > > > > > Regards
> > > > > > > > Ramana
> > > > > > > >
> > > > > > > > On Thursday, July 23, 2015, Parth Chandra <[email protected]> wrote:
> > > > > > > >
> > > > > > > > > Drill devs use a set of tests that are not available as part of the Apache distribution. These tests are a pre-requisite for all commits, but are not available to any contributors outside the current devs.
> > > > > > > > >
> > > > > > > > > This thread is to discuss various options to make these tests available.
> > > > > > > > >
> > > > > > > > > Assumptions and requirements -
> > > > > > > > > 1) A functional test (as opposed to a unit test) needs to be closer to the end user environment than a development environment. As such, we should be running functional tests in a cluster environment, connect using zookeeper etc.
> > > > > > > > > 2) Functional tests will keep increasing in number, get more complex and take a longer and longer time to execute as we go along.
> > > > > > > > > 3) Some requirements are:
> > > > > > > > >    a) We want to be strict in enforcing the pre-commit requirements, but not penalize the contributor who has a minor fix.
> > > > > > > > >    b) All parts of the product (especially various 'certified' storage plugins like Hive and Hbase) should get tested
> > > > > > > > >    c) It should be easy to debug issues when a test fails. Tests should fail deterministically. If a test fails, it should always fail and always fail in the same way (easier said than done).
> > > > > > > > >
> > > > > > > > > Some suggestions -
> > > > > > > > > 1) Tests should be a top-level maven module within the drill project
> > > > > > > > >    a) We want the integration tests to run as part of drill's maven build process
> > > > > > > > >    b) The build step for the integration-tests module would launch an embedded drillbit and run tests against it
> > > > > > > > >    c) The tests will be a separate target so they need not be run all the time
> > > > > > > > > 2) Tests should be divided into multiple suites that are based on components. For example a test suite for testing datatypes will contain the tests for various datatypes including complex types. A contributor or developer can then run these tests more frequently as an issue is being addressed and run the entire suite only once before commit.
> > > > > > > > > 3) Provide the tests as a hosted service
> > > > > > > > > 4) Setup a bot to fire the test on an AWS cluster and post the results to the JIRA (Hive does this). Or some variant of this idea.
> > > > > > > > >
> > > > > > > > > Some questions -
> > > > > > > > > 1) What do some other projects do?
> > > > > > > > > 2) Are there any technologies we can leverage that will make this easier?
> > > > > > > > > 3) How do we make it easier to debug failing tests?
> > > > > > > > >
> > > > > > > > > Please feel free to question the assumptions and requirements. Be creative with your suggestions.
> > > > > > > > >
> > > > > > > > > Parth
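
Two ideas recur in the thread above: organizing tests with labels or component-based suites, and making suites that need extra setup (hive, hbase etc) optional. A minimal sketch of how those two ideas could combine is given here. It is not taken from the actual Drill test framework; the `Suite` class, the `select_suites` function, and the suite and label names are all hypothetical and exist only for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Suite:
    """Hypothetical suite description (not part of any real Drill framework)."""
    name: str
    labels: frozenset = field(default_factory=frozenset)    # e.g. {"datatypes"}
    requires: frozenset = field(default_factory=frozenset)  # e.g. {"hive"}


def select_suites(suites, wanted_labels, available_services):
    """Split suites matching the labels into runnable and skipped names.

    A suite matches when it shares at least one label with wanted_labels
    (an empty wanted_labels matches every suite). A matching suite is
    runnable only if every service it requires is available; otherwise
    it is reported as skipped instead of failing.
    """
    wanted = set(wanted_labels)
    available = set(available_services)
    selected, skipped = [], []
    for suite in suites:
        if wanted and not (suite.labels & wanted):
            continue  # label filter excludes this suite entirely
        if suite.requires <= available:
            selected.append(suite.name)
        else:
            skipped.append(suite.name)
    return selected, skipped


# Hypothetical suite definitions, for illustration only.
SUITES = [
    Suite("datatypes", frozenset({"functional", "datatypes"})),
    Suite("hive_storage", frozenset({"functional", "storage"}), frozenset({"hive"})),
    Suite("hbase_storage", frozenset({"functional", "storage"}), frozenset({"hbase"})),
]

if __name__ == "__main__":
    # A laptop run with no hive or hbase available.
    run, skip = select_suites(SUITES, {"functional"}, available_services=set())
    print("run:", run)
    print("skipped (missing services):", skip)
```

With this shape, a contributor on a laptop could run only the suites whose requirements are met, while a cluster run would pass the full set of available services and so run everything.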
