Paul Rogers created DRILL-6034:
----------------------------------

Summary: repeated_contains returns a count, not a boolean, subject to overflow
Key: DRILL-6034
URL: https://issues.apache.org/jira/browse/DRILL-6034
Project: Apache Drill
Issue Type: Bug
Affects Versions: 1.10.0
Reporter: Paul Rogers
Consider the existing Drill unit test {{testJsonReader.testRepeatedContains()}}, which runs the following query:
{code}
select repeated_contains(str_list, 'asdf') from cp.`store/json/json_basic_repeated_varchar.json`
{code}
According to the [documentation|http://drill.apache.org/docs/repeated-contains/]:
bq. REPEATED_CONTAINS returns true if Drill finds a match; otherwise, the function returns false.

Running the above query and printing the results gives:
{noformat}
select repeated_contains(str_list, 'asdf') from cp.`store/json/json_basic_repeated_varchar.json`
#: EXPR$0
0: 5
1: 0
2: 0
3: 0
{noformat}
Note that the first row has a value of 5, which is *not* a Boolean. Drill has no Boolean type and instead uses the traditional integer encoding: {{TRUE}} = 1, {{FALSE}} = 0. Because 5 is not a valid Boolean value, the following query, which compares the result against {{TRUE}}, may fail or return unexpected results:
{code}
SELECT * FROM cp.`store/json/json_basic_repeated_varchar.json` WHERE repeated_contains(str_list, 'asdf') = TRUE
{code}
The schema of the returned count value uses the Drill {{BIT}} type. For various historical reasons, Drill implements {{BIT}} as "UInt1": an unsigned 8-bit integer. Further, since the function appears to return a count, it is subject to overflow if the count is 256, 512, or any other multiple of 256. That is, if a list has 256 occurrences of the pattern, {{repeated_contains}} will return 256 modulo 256 = 0, which is equivalent to SQL {{FALSE}}.

The recommendation is that the function be modified to return either 1 or 0. If there is a reason to have a count, use the existing {{repeated_count}} function.

Note that the "test" never caught this because it simply ran the query but did not verify the results:
{code}
@Test
public void testRepeatedContains() throws Exception {
  test("select repeated_contains(str_list, 'asdf') from cp.`store/json/json_basic_repeated_varchar.json`");
  ...
{code}
The issue was revealed when verification was added.
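To make the overflow arithmetic concrete, here is a minimal Java sketch. It is not Drill's implementation: the class and method names are hypothetical, the 8-bit {{BIT}}/"UInt1" slot is modeled by keeping only the low 8 bits ({{value & 0xFF}}), and exact string equality stands in for whatever matching {{repeated_contains}} actually performs. It contrasts the count-style result described above with the recommended 1-or-0 result.

```java
import java.util.Collections;
import java.util.List;

/**
 * Hypothetical sketch, not Drill code: models a match count stored in an
 * unsigned 8-bit slot and contrasts it with a 1-or-0 "contains" result.
 */
final class RepeatedContainsSketch {

  /** Simulates storing an int in an unsigned 8-bit slot: only the low 8 bits survive. */
  static int toUInt1(int value) {
    return value & 0xFF;
  }

  /** Count-style behavior described in the report: number of matches, truncated to 8 bits. */
  static int containsAsCount(List<String> list, String pattern) {
    int count = 0;
    for (String s : list) {
      if (s.equals(pattern)) { // assumption: exact equality stands in for Drill's matching
        count++;
      }
    }
    return toUInt1(count);
  }

  /** Recommended behavior: 1 if any element matches, otherwise 0. Cannot overflow. */
  static int containsAsBoolean(List<String> list, String pattern) {
    for (String s : list) {
      if (s.equals(pattern)) {
        return 1; // stop at the first match
      }
    }
    return 0;
  }

  public static void main(String[] args) {
    List<String> five = Collections.nCopies(5, "asdf");
    List<String> many = Collections.nCopies(256, "asdf");
    System.out.println(containsAsCount(five, "asdf"));   // prints 5: not a valid Boolean
    System.out.println(containsAsCount(many, "asdf"));   // prints 0: reads as FALSE
    System.out.println(containsAsBoolean(many, "asdf")); // prints 1
  }
}
```

With five matches the count-style version returns 5, which is not a valid Boolean; with 256 matches it wraps to 0 and reads as {{FALSE}}. The 1-or-0 version stops scanning at the first match, so its result stays in range regardless of list length.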
For now, the new test verifies the incorrect results; it should be modified to match the documented results if/when the code is updated.