Hey Jim,

This is a great tutorial, thanks for posting it. One thought: it would be great to put this somewhere on the web, either as a blog post or a wiki entry, so that someone googling for it is more likely to find it. (Mailing list archives are sometimes harder to surface in Google results.)

On Wed, Sep 6, 2017 at 10:05 AM, Jim Apple <[email protected]> wrote:
> If you'd like to contribute a patch to Impala, but aren't sure what
> you want to work on, you can look at Impala's newbie issues:
> https://issues.apache.org/jira/issues/?filter=12341668. You can find
> detailed instructions on submitting patches at
> https://cwiki.apache.org/confluence/display/IMPALA/Contributing+to+Impala.
> This is a walkthrough of a ticket a new contributor could take on,
> with hopefully enough detail to get you going but not so much to take
> away the fun.
>
> How can we fix https://issues.apache.org/jira/browse/IMPALA-5754,
> "rand() algorithm is very non-random"? This is a partial walk-through
> of how to get started.
>
> Set up your development environment. Then, look for where we might
> first write a failing test. The test case given in the ticket is
> "select count(distinct(rand(867-5309))), count(*) from alltypes a,
> alltypes b;". Tests that run a full query are considered "end-to-end
> tests".
>
> End-to-end tests are described in two ways: .test files and .py files.
>
> .test files contain queries and their expected results. For example:
>
> ====
> ---- QUERY
> # Regression test for IMPALA-938
> select smallint_col, int_col, (cast("1970-01-01" as timestamp) +
> interval smallint_col days)
> from functional.alltypes where smallint_col = 1 limit 1
> ---- RESULTS
> 1,1,1970-01-02 00:00:00
> ---- TYPES
> smallint, int, timestamp
> ====
>
> That is taken from
> testdata/workloads/functional-query/queries/QueryTest/exprs.test.
> That's a good test file to add a test case to, since it is testing
> "exprs", and the bug is in MathFunctions::Rand, which is defined in
> be/src/exprs.
>
> First, let's run all of the exprs tests to see that they pass. You can
> see them called in tests/query_test/test_exprs.py. The Python scripts
> in tests/ can run these .test files by calling ImpalaTestSuite's
> run_test_case() method with an abbreviated name of the .test file. In
> test_exprs.py, this looks like
>
>     self.run_test_case('QueryTest/exprs', vector)
>
> That call is in the method TestExprs.test_exprs(); you can invoke it with:
>
>     ./bin/impala-py.test tests/query_test/test_exprs.py::TestExprs::test_exprs --sanity
>
> This should take about 40 seconds and should pass, indicated by a
> return value of 0 and a green line printed to the terminal reading:
>
> ...====== 1 passed in 39.85 seconds ======...
>
> Now add a test case, following the example from the ticket and the
> format in exprs.test. Run the test again; it should fail.
>
> Fix the bug and run the test again. Once the test is passing, follow
> the instructions on the wiki to send your patch for code review:
> https://cwiki.apache.org/confluence/display/IMPALA/Contributing+to+Impala

--
Todd Lipcon
Software Engineer, Cloudera
