If you'd like to contribute a patch to Impala but aren't sure what to work on, you can look at Impala's newbie issues: https://issues.apache.org/jira/issues/?filter=12341668. You can find detailed instructions on submitting patches at https://cwiki.apache.org/confluence/display/IMPALA/Contributing+to+Impala. This is a walkthrough of a ticket a new contributor could take on, with (we hope) enough detail to get you going, but not so much as to take away the fun.
How can we fix https://issues.apache.org/jira/browse/IMPALA-5754, "rand() algorithm is very non-random"? This is a partial walkthrough of how to get started.

Set up your development environment. Then look for where we might first write a failing test. The test case given in the ticket is:

    select count(distinct(rand(867-5309))), count(*) from alltypes a, alltypes b;

Tests that run a full query are considered "end-to-end tests". End-to-end tests are written in two kinds of files: .test files and .py files. .test files contain queries and their expected results. For example:

    ====
    ---- QUERY
    # Regression test for IMPALA-938
    select smallint_col, int_col, (cast("1970-01-01" as timestamp) + interval smallint_col days) from functional.alltypes where smallint_col = 1 limit 1
    ---- RESULTS
    1,1,1970-01-02 00:00:00
    ---- TYPES
    smallint, int, timestamp
    ====

That is taken from testdata/workloads/functional-query/queries/QueryTest/exprs.test. It is a good file to add a test case to, since it tests "exprs" and the bug is in MathFunctions::Rand, which is defined in be/src/exprs.

First, let's run all of the exprs tests to see that they pass. You can see them called in tests/query_test/test_exprs.py. The Python scripts in tests/ run these .test files by calling ImpalaTestSuite's run_test_case() method with an abbreviated name of the .test file. In test_exprs.py, this looks like:

    self.run_test_case('QueryTest/exprs', vector)

That call is in the method TestExprs.test_exprs(); you can invoke it with:

    ./bin/impala-py.test tests/query_test/test_exprs.py::TestExprs::test_exprs --sanity

This should take about 40 seconds and should pass, indicated by an exit status of 0 and a green line printed to the terminal reading:

    ...====== 1 passed in 39.85 seconds ======...

Now add a test case, following the example from the ticket and the format in exprs.test. Run the test again; it should fail. Fix the bug and run the test again.
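
To see why the ticket's query makes a useful test, it may help to sketch the idea outside Impala. The Python snippet below is only an illustration under stated assumptions: weak_rand is a hypothetical, deliberately poor generator (it is not Impala's rand() implementation), and Python's random.Random stands in for a reasonable one. It compares the number of distinct values with the number of draws, which is the same comparison that count(distinct ...) versus count(*) makes in the query.

```python
import random


def weak_rand(seed, n):
    """Hypothetical weak generator, for illustration only.

    A linear congruential step with a tiny modulus, so it can never
    produce more than 1024 distinct values. NOT Impala's algorithm.
    """
    state = seed % 1024
    out = []
    for _ in range(n):
        state = (state * 5 + 1) % 1024
        out.append(state / 1024.0)
    return out


def distinct_ratio(values):
    """Fraction of draws that are distinct (1.0 means no repeats)."""
    return len(set(values)) / len(values)


n = 100_000

# Draws from the deliberately weak generator.
weak = weak_rand(seed=42, n=n)

# Draws from Python's standard generator, used here as a baseline.
baseline_gen = random.Random(42)
baseline = [baseline_gen.random() for _ in range(n)]

print("weak:    ", len(set(weak)), "distinct of", n)
print("baseline:", len(set(baseline)), "distinct of", n)
```

A healthy generator should give a distinct count close to the number of draws. A distinct count that is far smaller, as with weak_rand here, suggests the output repeats, which is the symptom the ticket describes.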
Once the test is passing, follow the instructions on the wiki to send your patch for code review: https://cwiki.apache.org/confluence/display/IMPALA/Contributing+to+Impala
