(running C* 2.2.7)

On Thu, Aug 25, 2016 at 11:10 AM, horschi <hors...@gmail.com> wrote:
> Hi Yuji,
>
> I tried your script a couple of times. I did not experience any stale
> values. (On my Linux laptop)
>
> regards,
> Ch
>
> On Mon, Aug 15, 2016 at 7:29 AM, Yuji Ito <y...@imagine-orb.com> wrote:
>
>> Hi,
>>
>> I can reproduce the problem with the following script.
>> I got rows which should have been truncated.
>> If truncating is executed only once, the problem doesn't occur.
>>
>> The multi-node test (replication_factor: 3, kill & restart C*
>> processes on all nodes) can also reproduce it.
>>
>> test script:
>> ----
>> ip=xxx.xxx.xxx.xxx
>>
>> echo "0. prepare a table"
>> cqlsh $ip -e "drop keyspace testdb;"
>> cqlsh $ip -e "CREATE KEYSPACE testdb WITH replication = {'class': 'SimpleStrategy', 'replication_factor': '1'};"
>> cqlsh $ip -e "CREATE TABLE testdb.testtbl (key int PRIMARY KEY, val int);"
>>
>> echo "1. insert rows"
>> for key in $(seq 1 10)
>> do
>>     cqlsh $ip -e "insert into testdb.testtbl (key, val) values($key, 1000) IF NOT EXISTS;" > /dev/null 2>&1
>> done
>>
>> echo "2. truncate the table twice"
>> cqlsh $ip -e "consistency all; truncate table testdb.testtbl"
>> cqlsh $ip -e "consistency all; truncate table testdb.testtbl"
>>
>> echo "3. kill C* process"
>> ps auxww | grep "CassandraDaemon" | awk '{if ($13 ~ /cassand/) print $2}' | xargs sudo kill -9
>>
>> echo "4. restart C* process"
>> sudo /etc/init.d/cassandra start
>> sleep 20
>>
>> echo "5. check the table"
>> cqlsh $ip -e "select * from testdb.testtbl;"
>> ----
>>
>> test result:
>> ----
>> 0. prepare a table
>> 1. insert rows
>> 2. truncate the table twice
>> Consistency level set to ALL.
>> Consistency level set to ALL.
>> 3. kill C* process
>> 4. restart C* process
>> Starting Cassandra: OK
>> 5. check the table
>>
>>  key | val
>> -----+------
>>    5 | 1000
>>   10 | 1000
>>    1 | 1000
>>    8 | 1000
>>    2 | 1000
>>    4 | 1000
>>    7 | 1000
>>    6 | 1000
>>    9 | 1000
>>    3 | 1000
>>
>> (10 rows)
>> ----
>>
>> Thanks Christian,
>>
>> I tried with durable_writes=false, but it failed. I guess that failure
>> was caused by another restriction: I use SimpleStrategy, and a keyspace
>> using SimpleStrategy isn't permitted to use durable_writes=false.
>>
>> Regards,
>> Yuji
>>
>> On Thu, Aug 11, 2016 at 12:41 AM, horschi <hors...@gmail.com> wrote:
>>
>>> Hi Yuji,
>>>
>>> ok, perhaps you are seeing a different issue than I do.
>>>
>>> Have you tried with durable_writes=false? If the issue is caused by
>>> the commitlog, then it should work if you disable durable_writes.
>>>
>>> Cheers,
>>> Christian
>>>
>>> On Tue, Aug 9, 2016 at 3:04 PM, Yuji Ito <y...@imagine-orb.com> wrote:
>>>
>>>> Thanks Christian,
>>>>
>>>>> can you reproduce the behaviour with a single node?
>>>>
>>>> I tried my test with a single node, but I couldn't reproduce it there.
>>>>
>>>>> This behaviour seems to be CQL-only, or at least has gotten worse
>>>>> with CQL. I did not experience this with Thrift.
>>>>
>>>> I truncate tables with CQL. I've never tried with Thrift.
>>>>
>>>> I think my problem can happen even when truncating succeeds, because
>>>> I check all records after truncating.
>>>>
>>>> I checked the source code. ReplayPosition.segment and position become
>>>> -1 and 0 (ReplayPosition.NONE) in discardSSTables() when a table with
>>>> no SSTables is truncated. I guess ReplayPosition.segment shouldn't be
>>>> -1 when truncating a table in this case: replayMutation() can replay
>>>> unexpected mutations because of this segment value.
>>>>
>>>> Is there anyone familiar with truncate and replay?
>>>>
>>>> Regards,
>>>> Yuji
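To make the suspected failure mode concrete, here is a minimal Java
sketch of that filtering logic. This is not the actual C* 2.2 source:
only the names ReplayPosition and ReplayPosition.NONE come from the
thread; the class layout, the shouldReplay() helper, and the sample
positions are illustrative assumptions.

----
// Sketch only: shows why a truncated_at of (-1, 0) filters out nothing.
final class ReplayPosition implements Comparable<ReplayPosition> {
    // Sentinel recorded when a truncated table had no SSTables (per the thread).
    static final ReplayPosition NONE = new ReplayPosition(-1, 0);

    final long segment;  // commit log segment id
    final int position;  // byte offset within that segment

    ReplayPosition(long segment, int position) {
        this.segment = segment;
        this.position = position;
    }

    public int compareTo(ReplayPosition o) {
        int c = Long.compare(segment, o.segment);
        return c != 0 ? c : Integer.compare(position, o.position);
    }
}

class ReplayFilterSketch {
    // A commit log mutation is skipped only if it was written at or before
    // the recorded truncation point (hypothetical helper, for illustration).
    static boolean shouldReplay(ReplayPosition mutation, ReplayPosition truncatedAt) {
        return mutation.compareTo(truncatedAt) > 0;
    }

    public static void main(String[] args) {
        // A mutation written before the truncate, in segment 42 at offset 100:
        ReplayPosition preTruncateWrite = new ReplayPosition(42, 100);

        // Normal case: truncate recorded a real position after the write,
        // so the old mutation is filtered out on restart.
        System.out.println(shouldReplay(preTruncateWrite, new ReplayPosition(42, 500)));  // false

        // Empty-table case: truncated_at is NONE (-1, 0). Every real
        // position compares greater, so the pre-truncate write is replayed
        // and the truncated row comes back.
        System.out.println(shouldReplay(preTruncateWrite, ReplayPosition.NONE));  // true
    }
}
----

If truncated_at is NONE, no position in the log can compare less than
or equal to it, so every pre-truncate write survives replay, which is
consistent with the resurrected rows in the test result above.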
>>>>
>>>> On Mon, Aug 8, 2016 at 6:36 PM, horschi <hors...@gmail.com> wrote:
>>>>
>>>>> Hi Yuji,
>>>>>
>>>>> can you reproduce the behaviour with a single node?
>>>>>
>>>>> The reason I ask is that I probably have the same issue with my
>>>>> automated tests (which run truncate between every test) on my local
>>>>> laptop.
>>>>>
>>>>> Maybe around 5 out of my 1800 tests randomly fail. I can see that
>>>>> the failed tests sometimes show data from other tests, which I think
>>>>> must be because of a failed truncate. This behaviour seems to be
>>>>> CQL-only, or at least has gotten worse with CQL. I did not
>>>>> experience this with Thrift.
>>>>>
>>>>> regards,
>>>>> Christian
>>>>>
>>>>> On Mon, Aug 8, 2016 at 7:34 AM, Yuji Ito <y...@imagine-orb.com> wrote:
>>>>>
>>>>>> Hi all,
>>>>>>
>>>>>> I have a question about clearing tables and commit log replay.
>>>>>> After some tables were truncated consecutively, I got some stale
>>>>>> values. This problem doesn't occur when I clear keyspaces with
>>>>>> DROP (and CREATE).
>>>>>>
>>>>>> I'm running the following test with node failures, and some stale
>>>>>> values appear in the checking phase.
>>>>>>
>>>>>> Test iteration:
>>>>>> 1. initialize tables as below
>>>>>> 2. request a lot of reads/writes concurrently
>>>>>> 3. check all records
>>>>>> 4. repeat from the beginning
>>>>>>
>>>>>> I use C* 2.2.6. There are 3 nodes (replication_factor: 3). Each
>>>>>> node kills its Cassandra process at random intervals and restarts
>>>>>> it immediately.
>>>>>>
>>>>>> My initialization:
>>>>>> 1. clear tables with TRUNCATE
>>>>>> 2. INSERT initial records
>>>>>> 3. check that all values are correct
>>>>>>
>>>>>> If any phase fails (because of node failure), the initialization
>>>>>> starts all over again, so tables are sometimes truncated
>>>>>> consecutively. Though the check in the initialization passes,
>>>>>> stale data appears when I execute "SELECT * FROM
>>>>>> mykeyspace.mytable;" after a lot of requests have completed.
>>>>>>
>>>>>> The problem is likely to occur when the ReplayPosition value in
>>>>>> "truncated_at" is initialized as below after an empty table is
>>>>>> truncated:
>>>>>>
>>>>>> Column Family ID: truncated_at
>>>>>> XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX:
>>>>>> 0xffffffffffffffff0000000000000156597cd4c7
>>>>>> (this value was acquired just after phase 1 of my initialization)
>>>>>>
>>>>>> I guess some unexpected replays occur.
>>>>>> Does anyone know about this behavior?
>>>>>>
>>>>>> Thanks,
>>>>>> Yuji
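For reference, the truncated_at value quoted above can be decoded by
hand. The short sketch below assumes the 20-byte value is laid out as an
8-byte commit log segment id, a 4-byte position, and an 8-byte
truncation timestamp in epoch milliseconds; that layout is inferred from
the value itself and the discussion in the thread, not taken from the
Cassandra source.

----
import java.time.Instant;

// Decodes 0xffffffffffffffff0000000000000156597cd4c7 under the assumed
// layout: segment id (8 bytes), position (4 bytes), timestamp (8 bytes).
class TruncatedAtDecode {
    public static void main(String[] args) {
        long segment = Long.parseUnsignedLong("ffffffffffffffff", 16); // -1
        int position = Integer.parseUnsignedInt("00000000", 16);       // 0
        long truncatedAtMs = Long.parseLong("00000156597cd4c7", 16);

        // segment == -1 with position == 0 is exactly the
        // ReplayPosition.NONE sentinel described in the thread; the
        // timestamp decodes to a date in early August 2016.
        System.out.println("segment      = " + segment);
        System.out.println("position     = " + position);
        System.out.println("truncated at = " + Instant.ofEpochMilli(truncatedAtMs));
    }
}
----

The -1 segment means the recorded truncation point sorts before every
possible commit log position, so a subsequent replay treats nothing as
truncated, consistent with the stale rows Yuji reports.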