I have posted a link on the ticket to https://lists.apache.org/thread.html/6fbcfa650cbb920e2b517ae643bcd0859f1ba0368451d2949eda274d@%3Cdev.impala.apache.org%3E. I hope to write some more of these, after which perhaps I should make a space on the wiki to hold them all.
On Wed, Sep 6, 2017 at 10:08 AM, Todd Lipcon <[email protected]> wrote:
> Hey Jim,
>
> This is a great tutorial, thanks for posting it. One thought: would be
> great to put this somewhere on the web -- either as a blog post or wiki
> entry, so if someone googles they are more likely to find it. (sometimes
> mailing list archives are harder to bring up in google results)
>
> On Wed, Sep 6, 2017 at 10:05 AM, Jim Apple <[email protected]> wrote:
>
>> If you'd like to contribute a patch to Impala, but aren't sure what
>> you want to work on, you can look at Impala's newbie issues:
>> https://issues.apache.org/jira/issues/?filter=12341668. You can find
>> detailed instructions on submitting patches at
>> https://cwiki.apache.org/confluence/display/IMPALA/Contributing+to+Impala.
>> This is a walkthrough of a ticket a new contributor could take on,
>> with hopefully enough detail to get you going but not so much to take
>> away the fun.
>>
>> How can we fix https://issues.apache.org/jira/browse/IMPALA-5754,
>> "rand() algorithm is very non-random"? This is a partial walk-through
>> of how to get started.
>>
>> Set up your development environment. Then, look for where we might
>> first write a failing test. The test case given in the ticket is
>> "select count(distinct(rand(867-5309))), count(*) from alltypes a,
>> alltypes b;". Tests that run a full query are considered "end-to-end
>> tests".
>>
>> End-to-end tests are described in two ways: .test files and .py files.
>>
>> .test files contain queries and their expected results. For example:
>>
>> ====
>> ---- QUERY
>> # Regression test for IMPALA-938
>> select smallint_col, int_col, (cast("1970-01-01" as timestamp) +
>> interval smallint_col days)
>> from functional.alltypes where smallint_col = 1 limit 1
>> ---- RESULTS
>> 1,1,1970-01-02 00:00:00
>> ---- TYPES
>> smallint, int, timestamp
>> ====
>>
>> That is taken from
>> testdata/workloads/functional-query/queries/QueryTest/exprs.test.
>> That's a good test file to add a test case to, since it is testing
>> "exprs", and the bug is in MathFunctions::Rand, which is defined in
>> be/src/exprs.
>>
>> First, let's run all of the exprs tests to see that they pass. You can
>> see them called in tests/query_test/test_exprs.py. The Python scripts
>> in tests/ can run these .test files by calling ImpalaTestSuite's
>> run_test_case() method with an abbreviated name of the .test file. In
>> test_exprs.py, this looks like
>>
>> self.run_test_case('QueryTest/exprs', vector)
>>
>> That call is in the method TestExprs.test_exprs(); you can invoke it with:
>>
>> ./bin/impala-py.test tests/query_test/test_exprs.py::TestExprs::test_exprs --sanity
>>
>> This should take about 40 seconds and should pass, indicated by a
>> return value of 0 and a green line printed to the terminal reading:
>>
>> ...====== 1 passed in 39.85 seconds ======...
>>
>> Now add a test case, following the example from the ticket and the
>> format in exprs.test. Run the test again; it should fail.
>>
>> Fix the bug and run the test again. Once the test is passing, follow
>> the instructions on the wiki to send your patch for code review:
>> https://cwiki.apache.org/confluence/display/IMPALA/Contributing+to+Impala
>
> --
> Todd Lipcon
> Software Engineer, Cloudera
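
For anyone unfamiliar with the .test layout quoted above, here is a rough Python sketch of how its sections fit together. This is not Impala's actual parser; parse_test_sections and SAMPLE are hypothetical names made up for illustration, and the sketch assumes only what the quoted example shows: "====" lines separate test cases, and "---- NAME" lines start a named section.

```python
# Hypothetical helper, NOT part of Impala's test framework: it only mirrors
# the layout visible in the quoted example ("====" separators, "---- NAME" headers).
SAMPLE = """====
---- QUERY
# Regression test for IMPALA-938
select smallint_col, int_col, (cast("1970-01-01" as timestamp) +
interval smallint_col days)
from functional.alltypes where smallint_col = 1 limit 1
---- RESULTS
1,1,1970-01-02 00:00:00
---- TYPES
smallint, int, timestamp
====
"""


def parse_test_sections(text):
    """Split .test-style text into a list of {section_name: body} dicts."""
    cases = []
    current = None  # sections of the test case being built
    name = None     # section currently receiving lines
    for line in text.splitlines():
        if line.startswith("===="):
            # A separator closes the case in progress (if it has content)
            # and starts a fresh one.
            if current:
                cases.append(current)
            current = {}
            name = None
        elif line.startswith("---- "):
            # A section header such as "---- QUERY" or "---- RESULTS".
            name = line[5:].strip()
            if current is None:
                current = {}
            current[name] = []
        elif name is not None:
            current[name].append(line)
    if current:
        cases.append(current)
    return [{k: "\n".join(v).strip() for k, v in case.items()} for case in cases]


cases = parse_test_sections(SAMPLE)
for section, body in cases[0].items():
    print(section, "->", body)
```

A new test case for the ticket would presumably be one more block of the same shape, with the ticket's query under QUERY and the expected counts under RESULTS.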
