mcvsubbu opened a new issue #5346:
URL: https://github.com/apache/incubator-pinot/issues/5346
Recently, a bug was introduced in PR #5132 that led to bad results when
executing queries that involved indexed and non-indexed columns in a certain
combination. We had a query in our integration test that tested this
combination, but the integration tests run only a random sample of the queries
in the query file (I think it is 100 queries out of 10k). So, some Travis runs
would fail, but a re-run could well pass. The bug was discovered in LinkedIn's
test environment, where the `mvn test` command is run repeatedly and multiple
failures were noticed. The bug was fixed in PR #5328.
So, we had a test for it, and still could not catch the bug.
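A back-of-the-envelope sketch of why one query in the file was not enough. It assumes (the issue only says "I think") that each test method draws 100 of the 10k queries uniformly at random without replacement, that exactly one query triggers the bug, and that separate test methods draw independently. `DetectionOdds` and its methods are made-up names for illustration. Under these assumptions a single method includes the bad query with probability 100/10,000 = 1%, and m methods catch it with probability 1 - (1 - n/N)^m.

```java
public class DetectionOdds {
  /** Chance that one draw of n out of total queries (without replacement) contains a given query: n / total. */
  static double perRun(int n, int total) {
    return (double) n / total;
  }

  /** Chance that at least one of m independent draws contains that query. */
  static double atLeastOnce(int n, int total, int m) {
    return 1.0 - Math.pow(1.0 - perRun(n, total), m);
  }

  public static void main(String[] args) {
    int n = 100;        // queries sampled per test method (assumed)
    int total = 10_000; // queries in the query file
    // m = number of test methods that independently sample the same file (assumed)
    for (int m : new int[] {1, 4, 8}) {
      System.out.printf("m=%d: %.1f%% chance of running the bad query%n",
          m, 100 * atLeastOnce(n, total, m));
    }
  }
}
```

Even with eight independent draws per build, the bad query runs less than 8% of the time, which fits a Travis failure that goes away on re-run.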
I measured on my desktop how long it would take to run all 10k queries, and it
came to about 8 hours!
The biggest time consumers were:
```
testQueriesFromQueryFile(org.apache.pinot.integration.tests.HybridClusterIntegrationTest)
Time elapsed: 2,470.777 sec
testSqlQueriesFromQueryFile(org.apache.pinot.integration.tests.HybridClusterIntegrationTest)
Time elapsed: 1,942.148 sec
testQueriesFromQueryFile(org.apache.pinot.integration.tests.FlakyConsumerRealtimeClusterIntegrationTest)
Time elapsed: 2,536.809 sec
testSqlQueriesFromQueryFile(org.apache.pinot.integration.tests.FlakyConsumerRealtimeClusterIntegrationTest)
Time elapsed: 1,992.969 sec
testQueriesFromQueryFile(org.apache.pinot.integration.tests.ConvertToRawIndexMinionClusterIntegrationTest)
Time elapsed: 2,509.986 sec
testSqlQueriesFromQueryFile(org.apache.pinot.integration.tests.ConvertToRawIndexMinionClusterIntegrationTest)
Time elapsed: 1,972.634 sec
testQueriesFromQueryFile(org.apache.pinot.integration.tests.MultiNodesOfflineClusterIntegrationTest)
Time elapsed: 2,522.219 sec
testSqlQueriesFromQueryFile(org.apache.pinot.integration.tests.MultiNodesOfflineClusterIntegrationTest)
Time elapsed: 1,957.898 sec
testQueriesFromQueryFile(org.apache.pinot.integration.tests.LLCRealtimeClusterIntegrationTest)
Time elapsed: 2,497.354 sec
testSqlQueriesFromQueryFile(org.apache.pinot.integration.tests.LLCRealtimeClusterIntegrationTest)
Time elapsed: 1,968.099 sec
testQueriesFromQueryFile(org.apache.pinot.integration.tests.RealtimeClusterIntegrationTest)
Time elapsed: 2,521.568 sec
testSqlQueriesFromQueryFile(org.apache.pinot.integration.tests.RealtimeClusterIntegrationTest)
Time elapsed: 2,001.346 sec
```
We do disable some tests in Travis:
```xml
<excludes>
  <!-- Covered by FlakyConsumerRealtimeClusterIntegrationTest -->
  <exclude>**/RealtimeClusterIntegrationTest.java</exclude>
  <!-- Covered by ConvertToRawIndexMinionClusterIntegrationTest -->
  <exclude>**/HybridClusterIntegrationTest.java</exclude>
</excludes>
```
But the remaining tests still add a lot of time.
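As a rough cross-check, this sketch adds up the logged timings with all 10k queries enabled. It hard-codes the values from the log and assumes these twelve methods are the only ones that matter (other tests add some time too); `QueryFileTiming` is an illustrative name. The twelve methods come to about 26,894 s (roughly 7.5 hours), and dropping the two excluded classes still leaves about 17,958 s (roughly 5 hours).

```java
import java.util.Set;

public class QueryFileTiming {
  // Test classes and their {testQueriesFromQueryFile, testSqlQueriesFromQueryFile}
  // timings in seconds, copied from the log (all 10k queries enabled).
  static final String[] CLASSES = {
      "HybridClusterIntegrationTest", "FlakyConsumerRealtimeClusterIntegrationTest",
      "ConvertToRawIndexMinionClusterIntegrationTest", "MultiNodesOfflineClusterIntegrationTest",
      "LLCRealtimeClusterIntegrationTest", "RealtimeClusterIntegrationTest"};
  static final double[][] SECONDS = {
      {2470.777, 1942.148}, {2536.809, 1992.969}, {2509.986, 1972.634},
      {2522.219, 1957.898}, {2497.354, 1968.099}, {2521.568, 2001.346}};

  /** Sums both query-file test methods over every class not in {@code excluded}. */
  static double totalSeconds(Set<String> excluded) {
    double sum = 0;
    for (int i = 0; i < CLASSES.length; i++) {
      if (!excluded.contains(CLASSES[i])) {
        sum += SECONDS[i][0] + SECONDS[i][1];
      }
    }
    return sum;
  }

  public static void main(String[] args) {
    double all = totalSeconds(Set.of());
    // Mirrors the two <exclude> entries used for Travis.
    double travis = totalSeconds(
        Set.of("RealtimeClusterIntegrationTest", "HybridClusterIntegrationTest"));
    System.out.printf("all classes: %.0f s (%.2f h)%n", all, all / 3600);
    System.out.printf("after Travis excludes: %.0f s (%.2f h)%n", travis, travis / 3600);
  }
}
```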
The goal is to be able to discover problems early, preferably before
merging. Some randomness cannot be avoided (e.g. generation of data), so we
will live with that.
One way to achieve this is to comb through the 10k queries and select a few
hundred that together are enough to detect all known issues. Then we add
queries to this selected list as we find more bugs.
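A minimal sketch of the curated-list approach. As an assumption beyond the proposal, it also keeps a small random sample from the rest of the file so that other combinations still get some coverage, and it takes a seed so a failing sample can be replayed. `QuerySelection` and `selectQueries` are hypothetical names, not existing Pinot test utilities, and the queries are passed in as plain strings rather than read from the real query file.

```java
import java.util.*;

public class QuerySelection {
  /**
   * Hypothetical helper: always returns the curated regression queries first, then
   * sampleSize queries drawn uniformly (without replacement) from the remaining ones.
   */
  static List<String> selectQueries(List<String> curated, List<String> allQueries,
                                    int sampleSize, long seed) {
    List<String> selected = new ArrayList<>(curated);
    Set<String> curatedSet = new HashSet<>(curated);
    List<String> remaining = new ArrayList<>();
    for (String q : allQueries) {
      if (!curatedSet.contains(q)) {
        remaining.add(q); // avoid running a curated query twice
      }
    }
    Collections.shuffle(remaining, new Random(seed)); // same seed -> same sample
    selected.addAll(remaining.subList(0, Math.min(sampleSize, remaining.size())));
    return selected;
  }

  public static void main(String[] args) {
    List<String> all = new ArrayList<>();
    for (int i = 0; i < 10_000; i++) {
      all.add("query-" + i); // placeholder for a line of the query file
    }
    // Hypothetical regression list: queries added after a bug slipped through.
    List<String> curated = List.of("query-42", "query-4242");
    long seed = System.nanoTime();
    List<String> selected = selectQueries(curated, all, 100, seed);
    // Printing the seed lets a failing random sample be replayed exactly.
    System.out.println("seed=" + seed + ", running " + selected.size() + " queries");
  }
}
```

The curated list gives a deterministic floor that grows with every bug found, while the seeded sample keeps some breadth without making failures unreproducible.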
Another way is to increase the Travis time limit to however many hours are
needed to run all 10k queries. This is probably not desirable, and will slow us
down.
A third possibility is to set up a regular run of the full suite (say, once a
day) so that we catch the issue the next day. But then the offending commit
will still be in master.
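One way to combine sampled runs on every build with a daily full run, sketched under assumptions: the daily build is a Travis cron job, and the test reads the `TRAVIS_EVENT_TYPE` environment variable, which Travis sets to `cron` for scheduled builds. `QueryBudget` and `numQueriesToRun` are hypothetical; the Pinot tests do not necessarily expose such a switch.

```java
import java.util.Map;

public class QueryBudget {
  static final int TOTAL_QUERIES = 10_000; // full query file
  static final int SAMPLED_QUERIES = 100;  // current per-run sample size

  /**
   * Hypothetical policy: scheduled (cron) builds run the whole query file;
   * every other build (push, pull_request, api) runs only a sample.
   */
  static int numQueriesToRun(Map<String, String> env) {
    boolean scheduled = "cron".equals(env.get("TRAVIS_EVENT_TYPE"));
    return scheduled ? TOTAL_QUERIES : SAMPLED_QUERIES;
  }

  public static void main(String[] args) {
    System.out.println("queries to run: " + numQueriesToRun(System.getenv()));
  }
}
```

Passing the environment in as a map, rather than calling `System.getenv` inside the policy, keeps the rule easy to unit-test.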
Thoughts?
----------------------------------------------------------------
This is an automated message from the Apache Git Service.