So what is the status on this? It would be nice to have this out with 1.2 coming out.

Regards
Ramana

On Wed, Aug 5, 2015 at 11:08 AM, Abhishek Girish <[email protected]> wrote:

> Ramana,
>
> I think the issue with licenses is mostly resolved. It was discussed that
> for TPC-*, since we shall not be redistributing the data-gen software, but
> distributing a randomized variant of the data generated by it, we should be
> okay to include it as part of our framework. For other datasets, we shall
> either provide their copy of license with our framework, or simply provide
> a link for users to download data before they execute.
>
> For now we should focus on having the framework out with minimal cleanup.
> In the near future we can work on setting up infrastructure and enhancing
> the framework itself.
>
> -Abhishek
>
> On Wed, Aug 5, 2015 at 10:46 AM, Ramana I N <[email protected]> wrote:
>
> > @Jacques, Ted
> >
> > > in the mean time, we risk patches being merged that have less than
> > > complete testing.
> >
> > While I agree with the premise of getting the tests out as soon as
> > possible, it does not help us achieve anything except transparency. Your
> > statement that getting the tests out will increase quality is dependent
> > on someone actually being able to run the tests once they have access to
> > them.
> >
> > Maybe we should focus on making a jenkins job to run the tests publicly.
> > With that in place we can exclude the TPC* datasets as well as the yelp
> > data sets from the framework and avoid licensing issues.
> >
> > Regards
> > Ramana
> >
> > On Tue, Aug 4, 2015 at 11:39 AM, Abhishek Girish <[email protected]> wrote:
> >
> > > We not only re-distribute external data-sets as-is, but also include
> > > variants for those (text -> parquet, json, ...).
> > > So the challenge here is not simply disabling automatic downloads via
> > > the framework and pointing users to manually download the files before
> > > running the framework, but also how we will handle tests which require
> > > variants of the data sets. It simply isn't practical for users of the
> > > framework to (1) download data-gen manually, (2) use specific seed /
> > > options before generating data, (3) convert them to parquet, etc.,
> > > (4) move them to specific locations inside their copy of the framework.
> > >
> > > Something we'll need to know is how other projects are handling
> > > bench-mark & other external datasets.
> > >
> > > -Abhishek
> > >
> > > On Tue, Aug 4, 2015 at 11:23 AM, rahul challapalli <[email protected]> wrote:
> > >
> > > > Thanks for your inputs.
> > > >
> > > > One issue with just publishing the tests in their current state is
> > > > that the framework re-distributes tpch, tpcds, yelp data sets without
> > > > requiring the users to accept their relevant licenses. A good number
> > > > of tests use these data sets. Any thoughts on how to handle this?
> > > >
> > > > - Rahul
> > > >
> > > > On Wed, Jul 29, 2015 at 12:07 AM, Ted Dunning <[email protected]> wrote:
> > > >
> > > > > +1. Get it out there.
> > > > >
> > > > > On Tue, Jul 28, 2015 at 10:12 PM, Jacques Nadeau <[email protected]> wrote:
> > > > >
> > > > > > Hey Rahul,
> > > > > >
> > > > > > My suggestion would be to lower the bar--do the absolute bare
> > > > > > minimum to get the tests out there. For example, simply remove
> > > > > > proprietary information and then get it on a public github
> > > > > > (whether your personal github or a corporate one).
> > > > > > From there, people can help by submitting pull requests to
> > > > > > improve the infrastructure and harness. Making things easier is
> > > > > > something that can be done over time. For example, we've had
> > > > > > offers from a couple different Linux Admins to help on something.
> > > > > > I'm sure that they could help with a number of the items you've
> > > > > > identified. In the mean time, we risk patches being merged that
> > > > > > have less than complete testing.
> > > > > >
> > > > > > --
> > > > > > Jacques Nadeau
> > > > > > CTO and Co-Founder, Dremio
> > > > > >
> > > > > > On Mon, Jul 27, 2015 at 2:16 PM, rahul challapalli <[email protected]> wrote:
> > > > > >
> > > > > > > Jacques,
> > > > > > >
> > > > > > > I am breaking down steps 1,2 & 3 into sub-tasks so we can
> > > > > > > add/prioritize these tasks
> > > > > > >
> > > > > > > Item # / Task / Sub-Task / Comments / Priority
> > > > > > >
> > > > > > > 1. *Publish the tests*
> > > > > > >    - Remove Proprietary Data & Queries (priority 0)
> > > > > > >    - Redact Proprietary Data/Queries
> > > > > > >    - Move tests into drill repo:
> > > > > > >      This requires some refactoring to the framework code since
> > > > > > >      the test framework uses a 2-level directory structure
> > > > > > >    - Organize the tests using a label based approach:
> > > > > > >      This involves code changes and moving a lot of files. When
> > > > > > >      doing a one time push it might be better to do this before
> > > > > > >      publishing the tests?
> > > > > > >    - Each suite should be independent:
> > > > > > >      Some suites wrongly assume that the data is present.
> > > > > > >      They should be identified and fixed
> > > > > > >    - Cleanup hardcoded dependencies during data generation:
> > > > > > >      Some data-gen scripts have hard-coded references
> > > > > > >    - Cleanup downloads:
> > > > > > >      The same dataset is being downloaded multiple times by
> > > > > > >      different suites
> > > > > > >    - Licenses for downloads:
> > > > > > >      The framework downloads some files automatically. These
> > > > > > >      files are publicly available. However before downloading
> > > > > > >      them users need to agree to certain terms. By using the
> > > > > > >      framework users might be skipping this step. We should look
> > > > > > >      into this
> > > > > > >
> > > > > > > 2. *Setup a cluster infrastructure to run the pre-commit tests*
> > > > > > >
> > > > > > > 3. *Local debugging of tests*
> > > > > > >    - Add an optional maven target for running tests on a local
> > > > > > >      machine:
> > > > > > >      Tests can launch an embedded drillbit or they can connect
> > > > > > >      to a running drillbit through zookeeper
> > > > > > >    - Running suites which require additional setup (hive, hbase
> > > > > > >      etc) should be made optional
> > > > > > >
> > > > > > > 4. *Documentation*
> > > > > > >    - Running Tests (options available and also listing the
> > > > > > >      assumed defaults)
> > > > > > >    - Explaining how tests are organized
> > > > > > >    - Process for adding a new suite
> > > > > > >
> > > > > > > On Fri, Jul 24, 2015 at 1:40 PM, Jacques Nadeau <[email protected]> wrote:
> > > > > > >
> > > > > > > > Let's get number one done (tests out
> > > > > > > > there so all community members can run them). Then the whole
> > > > > > > > community can work together to solve the rest.
> > > > > > > >
> > > > > > > > I don't think the base install should include integration
> > > > > > > > test execution. I do think the tests should be in the main
> > > > > > > > repo (as opposed to a secondary).
> > > > > > > >
> > > > > > > > We should strive to ultimately make running these integration
> > > > > > > > tests a requirement for merging. We need to complete all the
> > > > > > > > steps before we can impose that. I should be able to help on
> > > > > > > > the global run component and supporting infrastructure.
> > > > > > > >
> > > > > > > > J
> > > > > > > >
> > > > > > > > --
> > > > > > > > Jacques Nadeau
> > > > > > > > CTO and Co-Founder, Dremio
> > > > > > > >
> > > > > > > > On Fri, Jul 24, 2015 at 1:29 PM, rahul challapalli <[email protected]> wrote:
> > > > > > > >
> > > > > > > > > Ramana,
> > > > > > > > >
> > > > > > > > > You are right. We are trying to address multiple issues
> > > > > > > > > here, but not with a single solution. I am summarizing them
> > > > > > > > >
> > > > > > > > > 1. Tests should be visible to everyone (Implicit goal)
> > > > > > > > > 2. Before applying a patch we should run tests in a
> > > > > > > > >    clustered environment. Parth had a suggestion(#4) in his
> > > > > > > > >    original email.
> > > > > > > > > 3. Developers should be able to debug majority of the tests
> > > > > > > > >    on their local environment.
> > > > > > > > >    I made a few suggestions above to this regard
> > > > > > > > >
> > > > > > > > > - Rahul
> > > > > > > > >
> > > > > > > > > On Fri, Jul 24, 2015 at 10:40 AM, Ramana I N <[email protected]> wrote:
> > > > > > > > >
> > > > > > > > > > One important thing which we need to be clear on here is
> > > > > > > > > > what are we trying to address?
> > > > > > > > > >
> > > > > > > > > > I feel there are two separate issues here and I do not
> > > > > > > > > > think one solution will fit both the issues.
> > > > > > > > > >
> > > > > > > > > > 1. Allowing developers to run tests on their local box so
> > > > > > > > > >    they know the changes they have are not completely
> > > > > > > > > >    wrong.
> > > > > > > > > > 2. Allowing transparency in the integration tests process
> > > > > > > > > >    which is currently a black box.
> > > > > > > > > >
> > > > > > > > > > 1 is needed for developers to make changes and have an
> > > > > > > > > > idea that their changes are not going to fail tests en
> > > > > > > > > > masse in the integration suite. 2 is needed because it's
> > > > > > > > > > a prerequisite for changes to be committed.
> > > > > > > > > >
> > > > > > > > > > Regards
> > > > > > > > > > Ramana
> > > > > > > > > >
> > > > > > > > > > On Fri, Jul 24, 2015 at 10:28 AM, rahul challapalli <[email protected]> wrote:
> > > > > > > > > >
> > > > > > > > > > > Ramana,
> > > > > > > > > > >
> > > > > > > > > > > Let me fill in more details.
> > > > > > > > > > >
> > > > > > > > > > > 1.
> > > > > > > > > > >    Before we accept a patch we want to make sure the
> > > > > > > > > > >    tests run in a cluster environment. No exceptions
> > > > > > > > > > >    here.
> > > > > > > > > > > 2. We want the contributors to be able to debug the
> > > > > > > > > > >    failing tests on their laptops in as many cases as
> > > > > > > > > > >    possible. This requires:
> > > > > > > > > > >    1. Tests should run on top of a local file system.
> > > > > > > > > > >       (Tests can launch an embedded drillbit or they
> > > > > > > > > > >       can connect to a running drillbit through
> > > > > > > > > > >       zookeeper)
> > > > > > > > > > >    2. Running suites which require additional setup
> > > > > > > > > > >       (hive, hbase etc) should be made optional and
> > > > > > > > > > >       sufficient documentation should be provided for
> > > > > > > > > > >       enabling and disabling these tests.
> > > > > > > > > > > 3. In my opinion making these new tests part of drill
> > > > > > > > > > >    would make it easier for the developers to debug and
> > > > > > > > > > >    run tests instead of having a different repository.
> > > > > > > > > > >    But as you said it might bloat the drill project
> > > > > > > > > > >
> > > > > > > > > > > - Rahul
> > > > > > > > > > >
> > > > > > > > > > > On Fri, Jul 24, 2015 at 9:42 AM, Ted Dunning <[email protected]> wrote:
> > > > > > > > > > >
> > > > > > > > > > > > The Hadoop family of projects has some software that
> > > > > > > > > > > > integrates a continuous integration system so that
> > > > > > > > > > > > every time a JIRA is marked as patch-available, the
> > > > > > > > > > > > associated patch attached to the bug will have
> > > > > > > > > > > > integration tests run against it. I believe that
> > > > > > > > > > > > there has been some process to use git hashes
> > > > > > > > > > > > instead of patches. The CI results are put back on
> > > > > > > > > > > > the JIRA.
> > > > > > > > > > > >
> > > > > > > > > > > > This is done using a fairly simple set of scripts.
> > > > > > > > > > > > Apache Yetus is just forming as a direct-to-top-level
> > > > > > > > > > > > spinoff from Hadoop
> > > > > > > > > > > >
> > > > > > > > > > > > Proposal is here (don't be fooled by the fact that it
> > > > > > > > > > > > looks like an incubation proposal):
> > > > > > > > > > > >
> > > > > > > > > > > > http://wiki.apache.org/incubator/YetusProposal
> > > > > > > > > > > >
> > > > > > > > > > > > Early code can be found here (don't guess that this
> > > > > > > > > > > > is very real yet). More links can be found in the
> > > > > > > > > > > > proposal.
> > > > > > > > > > > >
> > > > > > > > > > > > https://github.com/sekikn/pre-yetus/tree/master/precommit/docs
> > > > > > > > > > > >
> > > > > > > > > > > > The project has not yet been formed and there are no
> > > > > > > > > > > > mailing lists or git repo yet.
> > > > > > > > > > > >
> > > > > > > > > > > > On Fri, Jul 24, 2015 at 9:25 AM, Ramana I N <[email protected]> wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > As someone who worked on this for a while,
> > > > > > > > > > > > > including it as part of drill may bloat drill a
> > > > > > > > > > > > > bit too much. Also not a big fan of running
> > > > > > > > > > > > > against an embedded drillbit. Does not replicate
> > > > > > > > > > > > > an actual production use case.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Additionally, setting up hive, hbase and other
> > > > > > > > > > > > > components may be painful and unnecessary for most
> > > > > > > > > > > > > ppl. It would deter people from ever contributing
> > > > > > > > > > > > > to drill. We could spin up in memory hive and
> > > > > > > > > > > > > hbase but that's similar to an embedded drill bit.
> > > > > > > > > > > > > Does not replicate a production scenario.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Would prefer the hive way with a central Jenkins
> > > > > > > > > > > > > server hosted on aws and accessible to everyone.
> > > > > > > > > > > > > Users should be able to submit a git url and that
> > > > > > > > > > > > > should be able to deploy and fire off tests.
> > > > > > > > > > > > > Should then have a way to easily communicate
> > > > > > > > > > > > > failures to contributors and if success notify the
> > > > > > > > > > > > > committers to commit the change.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Ps: if hive's way is open source maybe we can look
> > > > > > > > > > > > > into reuse rather than doing it from scratch. Esp
> > > > > > > > > > > > > the Jenkins and configuration stuff.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Regards
> > > > > > > > > > > > > Ramana
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Thursday, July 23, 2015, Parth Chandra <[email protected]> wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > > Drill devs use a set of tests that are not
> > > > > > > > > > > > > > available as part of the Apache distribution.
> > > > > > > > > > > > > > These tests are a pre-requisite for all commits,
> > > > > > > > > > > > > > but are not available to any contributors
> > > > > > > > > > > > > > outside the current devs.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > This thread is to discuss various options to
> > > > > > > > > > > > > > make these tests available.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Assumptions and requirements -
> > > > > > > > > > > > > > 1) A functional test (as opposed to a unit test)
> > > > > > > > > > > > > >    needs to be closer to the end user
> > > > > > > > > > > > > >    environment than a development environment.
> > > > > > > > > > > > > >    As such, we should be running functional
> > > > > > > > > > > > > >    tests in a cluster environment, connect using
> > > > > > > > > > > > > >    zookeeper etc.
> > > > > > > > > > > > > > 2) Functional tests will keep increasing in
> > > > > > > > > > > > > >    number, get more complex and take a longer
> > > > > > > > > > > > > >    and longer time to execute as we go along.
> > > > > > > > > > > > > > 3) Some requirements are:
> > > > > > > > > > > > > >    a) We want to be strict in enforcing the
> > > > > > > > > > > > > >       pre-commit requirements, but not penalize
> > > > > > > > > > > > > >       the contributor who has a minor fix.
> > > > > > > > > > > > > >    b) All parts of the product (especially
> > > > > > > > > > > > > >       various 'certified' storage plugins like
> > > > > > > > > > > > > >       Hive and Hbase) should get tested
> > > > > > > > > > > > > >    c) It should be easy to debug issues when a
> > > > > > > > > > > > > >       test fails. Tests should fail
> > > > > > > > > > > > > >       deterministically. If a test fails, it
> > > > > > > > > > > > > >       should always fail and always fail in the
> > > > > > > > > > > > > >       same way (easier said than done).
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Some suggestions -
> > > > > > > > > > > > > > 1) Tests should be a top-level maven module
> > > > > > > > > > > > > >    within the drill project
> > > > > > > > > > > > > >    a) We want the integration tests to run as
> > > > > > > > > > > > > >       part of drill's maven build process
> > > > > > > > > > > > > >    b) The build step for the integration-tests
> > > > > > > > > > > > > >       module would launch an embedded drillbit
> > > > > > > > > > > > > >       and run tests against it
> > > > > > > > > > > > > >    c) The tests will be a separate target so
> > > > > > > > > > > > > >       they need not be run all the time
> > > > > > > > > > > > > > 2) Tests should be divided into multiple suites
> > > > > > > > > > > > > >    that are based on components. For example a
> > > > > > > > > > > > > >    test suite for testing datatypes will contain
> > > > > > > > > > > > > >    the tests for various datatypes including
> > > > > > > > > > > > > >    complex types. A contributor or developer can
> > > > > > > > > > > > > >    then run these tests more frequently as an
> > > > > > > > > > > > > >    issue is being addressed and run the entire
> > > > > > > > > > > > > >    suite only once before commit.
> > > > > > > > > > > > > > 3) Provide the tests as a hosted service
> > > > > > > > > > > > > > 4) Setup a bot to fire the test on an AWS
> > > > > > > > > > > > > >    cluster and post the results to the JIRA
> > > > > > > > > > > > > >    (Hive does this). Or some variant of this
> > > > > > > > > > > > > >    idea.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Some questions -
> > > > > > > > > > > > > > 1) What do some other projects do?
> > > > > > > > > > > > > > 2) Are there any technologies we can leverage
> > > > > > > > > > > > > >    that will make this easier?
> > > > > > > > > > > > > > 3) How do we make it easier to debug failing
> > > > > > > > > > > > > >    tests?
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Please feel free to question the assumptions and
> > > > > > > > > > > > > > requirements. Be creative with your suggestions.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Parth
