[
https://issues.apache.org/jira/browse/CASSANDRA-14054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16281326#comment-16281326
]
Michael Kjellman edited comment on CASSANDRA-14054 at 12/7/17 4:44 AM:
-----------------------------------------------------------------------
[~alourie] hey, so sorry for the delayed reply. I've been up to my eyeballs in
the dtest pytest work along with all the other stuff and totally let this slip.
I don't have a super great answer for you yet because I'm in the process of
getting that story together... but maybe we can make this work :)
If you take a look at my C* fork, there is a CircleCI config:
https://github.com/mkjellman/cassandra/blob/trunk_circle/.circleci/config.yml
Create a free CircleCI account (if you don't have one yet) and register your C*
fork on GitHub with CircleCI. Then grab the above config and put it in a
branch of trunk in your personal fork (you'll need to create a .circleci folder
and put it in there).
Starting at L47 of the config you'll need to switch things to use the free user
config (I'm assuming here that you don't have a paid CircleCI account).
{code}
# Set env_settings, env_vars, and workflows/build_and_run_tests based on environment
env_settings: &env_settings
  # <<: *default_env_settings
  <<: *high_capacity_env_settings
env_vars: &env_vars
  # <<: *default_env_vars
  <<: *high_capacity_env_vars
workflows:
  version: 2
  # build_and_run_tests: *default_jobs
  build_and_run_tests: *with_dtest_jobs
{code}
Comment out the high_capacity_* lines and uncomment the default_* ones. You
might also want to switch the workflow to run only "default_jobs", which for
right now will just build C* and run the unit tests.
This test fails about 50% of the time on CircleCI. Potentially it's exacerbated
by running on Ubuntu? Another thing maybe worth trying is running the test via
ant on Ubuntu. The Docker image I put together for CircleCI is available on
DockerHub as kjellman/cassandra-test:0.1.3 (config checked in to
https://github.com/mkjellman/cassandra-test-docker).
Another thing we do is split the unit tests across the total number of Circle
containers available. Based on historical runs, it tries to distribute the
tests in each container by time, so you don't have a few containers with all
the slow tests dragging the entire thing down. This means we invoke the tests
in each container via "ant testclasslist
-Dtest.classlistfile=/path/to/unit/tests/to/run". Potentially another test
somewhere else doesn't clean up after itself and that causes
testRegularColumnTimestampUpdates to fail? To be clear, the splits across
containers are at the test method level, not the test class level, so various
methods of ViewTest might run in different containers at the same time. Circle
merges the results at the end to give one consolidated report for all the unit
tests. None of the other unit tests on trunk have been flaky or failing when
run via Circle, so I'm not sure I totally believe it's related to the order
it's run in or to another test not cleaning up after itself. Also, a lot of
other asserts pass before the second-to-last assert is hit, which is the one
that always fails, and always with the same value of 1 instead of 2.
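To make the timing-based split described above a bit more concrete, here is a
minimal sketch of the general idea, assuming a greedy slowest-first
assignment. This is not CircleCI's actual algorithm; the class name, method
name, and the map of historical timings are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only: NOT CircleCI's real splitting logic.
final class TimingSplitSketch {

    /**
     * Greedy split: walk the tests slowest-first and hand each one to the
     * container that currently has the smallest total expected runtime.
     */
    static List<List<String>> split(Map<String, Long> historicalMillis, int containers) {
        if (containers < 1) {
            throw new IllegalArgumentException("need at least one container");
        }
        List<List<String>> buckets = new ArrayList<>();
        for (int i = 0; i < containers; i++) {
            buckets.add(new ArrayList<>());
        }
        long[] load = new long[containers];

        List<Map.Entry<String, Long>> tests = new ArrayList<>(historicalMillis.entrySet());
        // Slowest first; ties broken by name so the result is deterministic.
        tests.sort((a, b) -> {
            int byTime = Long.compare(b.getValue(), a.getValue());
            return byTime != 0 ? byTime : a.getKey().compareTo(b.getKey());
        });

        for (Map.Entry<String, Long> test : tests) {
            // Pick the least-loaded container (lowest index wins on ties).
            int target = 0;
            for (int i = 1; i < containers; i++) {
                if (load[i] < load[target]) {
                    target = i;
                }
            }
            buckets.get(target).add(test.getKey());
            load[target] += test.getValue();
        }
        return buckets;
    }
}
```

Each inner list would then be written to the file passed as
-Dtest.classlistfile for that container. The point of the sketch is only that
which tests share a JVM run can change from build to build as the recorded
timings change.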
Hope all this helps get the ball rolling again. Any hunches from just looking
at the code? I don't really know the MV code very well. Any chance there is a
race between when the MV has finished building and is available and when the
assert is hit? Maybe we need some kind of forced blocking flush before we
assert on those conditions? That's how we handle this in a lot of the other
compaction-related tests that check sstables on disk and row count.
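If the failure really is a race of that kind, one generic way to check the
hypothesis is to wait for the expected state before asserting. The sketch
below is not the blocking flush suggested above and uses no Cassandra API; it
swaps in plain polling with a timeout, and the class and method names are
hypothetical.

```java
import java.util.function.BooleanSupplier;

// Hypothetical helper: not part of CQLTester or any Cassandra class.
final class AwaitSketch {

    /**
     * Polls the condition until it returns true or the timeout elapses.
     * Returns true if the condition held in time, false on timeout or
     * if the waiting thread was interrupted.
     */
    static boolean await(BooleanSupplier condition, long timeoutMillis, long pollMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            // Overflow-safe comparison against the monotonic clock.
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                // Preserve the interrupt flag for the caller and give up.
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }
}
```

If wrapping the failing assert's query in something like this made the test
pass reliably, that would point at a timing issue rather than at test ordering
or leftover state from another test.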
> testRegularColumnTimestampUpdates - org.apache.cassandra.cql3.ViewTest is
> flaky: expected <2> but got <1>
> ---------------------------------------------------------------------------------------------------------
>
> Key: CASSANDRA-14054
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14054
> Project: Cassandra
> Issue Type: Bug
> Components: Testing
> Reporter: Michael Kjellman
> Assignee: Alex Lourie
>
> testRegularColumnTimestampUpdates - org.apache.cassandra.cql3.ViewTest is
> flaky: expected <2> but got <1>
> Fails about 25% of the time. It is currently our only flaky unit test on
> trunk so it would be great to get this one fixed up so we can be confident in
> unit test failures going forward.
> junit.framework.AssertionFailedError: Invalid value for row 0 column 0 (c of type int), expected <2> but got <1>
>     at org.apache.cassandra.cql3.CQLTester.assertRows(CQLTester.java:973)
>     at org.apache.cassandra.cql3.ViewTest.testRegularColumnTimestampUpdates(ViewTest.java:380)