Hi, I can reproduce the problem with the following script. After the restart, the SELECT returns rows that should have been truncated. If the truncate is executed only once, the problem doesn't occur.
The multi-node test (replication_factor: 3, killing and restarting the C* processes on all nodes) also reproduces it.

test script:
----
ip=xxx.xxx.xxx.xxx

echo "0. prepare a table"
cqlsh $ip -e "drop keyspace testdb;"
cqlsh $ip -e "CREATE KEYSPACE testdb WITH replication = {'class': 'SimpleStrategy', 'replication_factor': '1'};"
cqlsh $ip -e "CREATE TABLE testdb.testtbl (key int PRIMARY KEY, val int);"

echo "1. insert rows"
for key in $(seq 1 10)
do
    cqlsh $ip -e "insert into testdb.testtbl (key, val) values($key, 1000) IF NOT EXISTS;" >> /dev/null 2>&1
done

echo "2. truncate the table twice"
cqlsh $ip -e "consistency all; truncate table testdb.testtbl"
cqlsh $ip -e "consistency all; truncate table testdb.testtbl"

echo "3. kill C* process"
ps auxww | grep "CassandraDaemon" | awk '{if ($13 ~ /cassand/) print $2}' | xargs sudo kill -9

echo "4. restart C* process"
sudo /etc/init.d/cassandra start
sleep 20

echo "5. check the table"
cqlsh $ip -e "select * from testdb.testtbl;"
----

test result:
----
0. prepare a table
1. insert rows
2. truncate the table twice
Consistency level set to ALL.
Consistency level set to ALL.
3. kill C* process
4. restart C* process
Starting Cassandra: OK
5. check the table

 key | val
-----+------
   5 | 1000
  10 | 1000
   1 | 1000
   8 | 1000
   2 | 1000
   4 | 1000
   7 | 1000
   6 | 1000
   9 | 1000
   3 | 1000

(10 rows)
----

Thanks Christian,

I tried with durable_writes=False. It failed. I guessed this failure was caused by another problem. I use SimpleStrategy. A keyspace using the SimpleStrategy isn't permitted to use durable_writes=False.

Regards,
Yuji

On Thu, Aug 11, 2016 at 12:41 AM, horschi <hors...@gmail.com> wrote:

> Hi Yuji,
>
> ok, perhaps you are seeing a different issue than I do.
>
> Have you tried with durable_writes=False? If the issue is caused by the
> commitlog, then it should work if you disable durable_writes.
>
> Cheers,
> Christian
>
>
>
> On Tue, Aug 9, 2016 at 3:04 PM, Yuji Ito <y...@imagine-orb.com> wrote:
>
>> Thanks Christian
>>
>>> can you reproduce the behaviour with a single node?
>>
>> I tried my test with a single node, but I can't reproduce it.
>>
>>> This behaviour seems to be CQL only, or at least has gotten worse with
>>> CQL. I did not experience this with Thrift.
>>
>> I truncate tables with CQL. I've never tried with Thrift.
>>
>> I think that my problem can happen even when truncating succeeds.
>> That's because I check all records after truncating.
>>
>> I checked the source code.
>> ReplayPosition.segment and position become -1 and 0 (ReplayPosition.NONE)
>> in discardSSTables() when a table with no SSTable is truncated.
>> I guess that ReplayPosition.segment shouldn't be -1 when truncating a table
>> in this case.
>> replayMutation() can request unexpected replay mutations because of this
>> segment's value.
>>
>> Is there anyone familiar with truncate and replay?
>>
>> Regards,
>> Yuji
>>
>>
>> On Mon, Aug 8, 2016 at 6:36 PM, horschi <hors...@gmail.com> wrote:
>>
>>> Hi Yuji,
>>>
>>> can you reproduce the behaviour with a single node?
>>>
>>> The reason I ask is because I probably have the same issue with my
>>> automated tests (which run truncate between every test), which run on my
>>> local laptop.
>>>
>>> Maybe around 5 tests randomly fail out of my 1800. I can see that the
>>> failed tests sometimes show data from other tests, which I think must be
>>> because of a failed truncate. This behaviour seems to be CQL only, or at
>>> least has gotten worse with CQL. I did not experience this with Thrift.
>>>
>>> regards,
>>> Christian
>>>
>>>
>>>
>>> On Mon, Aug 8, 2016 at 7:34 AM, Yuji Ito <y...@imagine-orb.com> wrote:
>>>
>>>> Hi all,
>>>>
>>>> I have a question about clearing table and commit log replay.
>>>> After some tables were truncated consecutively, I got some stale values.
>>>> This problem doesn't occur when I clear keyspaces with DROP (and
>>>> CREATE).
>>>>
>>>> I'm testing the following test with node failure.
>>>> Some stale values appear at checking phase.
>>>>
>>>> Test iteration:
>>>> 1. initialize tables as below
>>>> 2. request a lot of read/write concurrently
>>>> 3. check all records
>>>> 4. repeat from the beginning
>>>>
>>>> I use C* 2.2.6. There are 3 nodes (replication_factor: 3).
>>>> Each node kills cassandra process at random intervals and restarts it
>>>> immediately.
>>>>
>>>> My initialization:
>>>> 1. clear tables with TRUNCATE
>>>> 2. INSERT initial records
>>>> 3. check if all values are correct
>>>>
>>>> If any phase fails (because of node failure), the initialization starts
>>>> all over again.
>>>> So, tables are sometimes truncated consecutively.
>>>> Though the check in the initialization is OK, stale data appears when I
>>>> execute "SELECT * FROM mykeyspace.mytable;" after a lot of requests are
>>>> completed.
>>>>
>>>> The problem is likely to occur when the ReplayPosition's value in
>>>> "truncated_at" is initialized as below after an empty table is truncated.
>>>>
>>>> Column Family ID: truncated_at
>>>> XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX: 0xffffffffffffffff0000000000000156597cd4c7
>>>> (this value was acquired just after phase 1 in my initialization)
>>>>
>>>> I guess some unexpected replays occur.
>>>> Does anyone know the behavior?
>>>>
>>>> Thanks,
>>>> Yuji
>>>>
>>>
>>>
>>
>
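
For anyone who wants to check the truncated_at value quoted in this thread, here is a minimal bash sketch that splits the blob into fields. The layout is an assumption on my part and is not confirmed anywhere in the thread: an 8-byte segment id, a 4-byte position, then an 8-byte truncation timestamp in milliseconds, all big-endian. The function name decode_truncated_at is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: split a truncated_at blob into its assumed fields.
# Assumed layout (not confirmed in this thread):
#   8-byte segment id | 4-byte position | 8-byte timestamp in milliseconds
decode_truncated_at() {
    local blob=${1#0x}    # drop an optional 0x prefix
    if [ ${#blob} -ne 40 ]; then
        echo "expected 40 hex digits, got ${#blob}" >&2
        return 1
    fi
    # bash arithmetic is signed 64-bit, so ffffffffffffffff wraps to -1
    local segment=$(( 16#${blob:0:16} ))
    local position=$(( 16#${blob:16:8} ))
    local truncated_ms=$(( 16#${blob:24:16} ))
    echo "segment=$segment position=$position truncated_at_ms=$truncated_ms"
}

# The value quoted in the thread, with the wrapped hex digits joined
decode_truncated_at 0xffffffffffffffff0000000000000156597cd4c7
```

If the assumed layout is right, a segment of -1 with position 0 matches the ReplayPosition.NONE values described earlier in the thread, and the last field is the time of the truncate.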