Nope, I still don't get stale values (I just ran your script 3 times).

On Thu, Aug 25, 2016 at 12:36 PM, Yuji Ito <y...@imagine-orb.com> wrote:
> Thank you for testing, Christian.
>
> What is commitlog_sync set to in your cassandra.yaml?
> I set commitlog_sync to batch with a 2 ms window:
>
> commitlog_sync: batch
> commitlog_sync_batch_window_in_ms: 2
>
> The problem didn't occur with commitlog_sync set to periodic (the default).
>
> regards,
> yuji
>
>
> On Thu, Aug 25, 2016 at 6:11 PM, horschi <hors...@gmail.com> wrote:
>
>> (running C* 2.2.7)
>>
>> On Thu, Aug 25, 2016 at 11:10 AM, horschi <hors...@gmail.com> wrote:
>>
>>> Hi Yuji,
>>>
>>> I tried your script a couple of times and did not see any stale
>>> values (on my Linux laptop).
>>>
>>> regards,
>>> Ch
>>>
>>> On Mon, Aug 15, 2016 at 7:29 AM, Yuji Ito <y...@imagine-orb.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> I can reproduce the problem with the following script.
>>>> I got rows that should have been truncated.
>>>> If the table is truncated only once, the problem doesn't occur.
>>>>
>>>> The multi-node test (replication_factor: 3, killing and restarting the
>>>> C* process on all nodes) also reproduces it.
>>>>
>>>> test script:
>>>> ----
>>>>
>>>> ip=xxx.xxx.xxx.xxx
>>>>
>>>> echo "0. prepare a table"
>>>> cqlsh $ip -e "drop keyspace testdb;"
>>>> cqlsh $ip -e "CREATE KEYSPACE testdb WITH replication = {'class': 'SimpleStrategy', 'replication_factor': '1'};"
>>>> cqlsh $ip -e "CREATE TABLE testdb.testtbl (key int PRIMARY KEY, val int);"
>>>>
>>>> echo "1. insert rows"
>>>> for key in $(seq 1 10)
>>>> do
>>>>   cqlsh $ip -e "insert into testdb.testtbl (key, val) values($key, 1000) IF NOT EXISTS;" >> /dev/null 2>&1
>>>> done
>>>>
>>>> echo "2. truncate the table twice"
>>>> cqlsh $ip -e "consistency all; truncate table testdb.testtbl"
>>>> cqlsh $ip -e "consistency all; truncate table testdb.testtbl"
>>>>
>>>> echo "3. kill C* process"
>>>> ps auxww | grep "CassandraDaemon" | awk '{if ($13 ~ /cassand/) print $2}' | xargs sudo kill -9
>>>>
>>>> echo "4. restart C* process"
>>>> sudo /etc/init.d/cassandra start
>>>> sleep 20
>>>>
>>>> echo "5. check the table"
>>>> cqlsh $ip -e "select * from testdb.testtbl;"
>>>>
>>>> ----
>>>>
>>>> test result:
>>>> ----
>>>>
>>>> 0. prepare a table
>>>> 1. insert rows
>>>> 2. truncate the table twice
>>>> Consistency level set to ALL.
>>>> Consistency level set to ALL.
>>>> 3. kill C* process
>>>> 4. restart C* process
>>>> Starting Cassandra: OK
>>>> 5. check the table
>>>>
>>>>  key | val
>>>> -----+------
>>>>    5 | 1000
>>>>   10 | 1000
>>>>    1 | 1000
>>>>    8 | 1000
>>>>    2 | 1000
>>>>    4 | 1000
>>>>    7 | 1000
>>>>    6 | 1000
>>>>    9 | 1000
>>>>    3 | 1000
>>>>
>>>> (10 rows)
>>>>
>>>> ----
>>>>
>>>>
>>>> Thanks, Christian.
>>>>
>>>> I tried with durable_writes=False.
>>>> It failed, but I think that failure was caused by a different problem:
>>>> I use SimpleStrategy, and a keyspace using SimpleStrategy isn't
>>>> permitted to use durable_writes=False.
>>>>
>>>>
>>>> Regards,
>>>> Yuji
>>>>
>>>> On Thu, Aug 11, 2016 at 12:41 AM, horschi <hors...@gmail.com> wrote:
>>>>
>>>>> Hi Yuji,
>>>>>
>>>>> ok, perhaps you are seeing a different issue than I am.
>>>>>
>>>>> Have you tried with durable_writes=False? If the issue is caused by
>>>>> the commitlog, then it should work if you disable durable_writes.
>>>>>
>>>>> Cheers,
>>>>> Christian
>>>>>
>>>>>
>>>>>
>>>>> On Tue, Aug 9, 2016 at 3:04 PM, Yuji Ito <y...@imagine-orb.com> wrote:
>>>>>
>>>>>> Thanks, Christian.
>>>>>>
>>>>>>> can you reproduce the behaviour with a single node?
>>>>>>
>>>>>> I tried my test with a single node, but I can't reproduce it.
>>>>>>
>>>>>>> This behaviour seems to be CQL only, or at least has gotten worse
>>>>>>> with CQL. I did not experience this with Thrift.
>>>>>>
>>>>>> I truncate tables with CQL. I've never tried with Thrift.
>>>>>>
>>>>>> I think my problem can happen even when truncating succeeds,
>>>>>> because I check all records after truncating.
>>>>>>
>>>>>> I checked the source code.
>>>>>> ReplayPosition.segment and position become -1 and 0
>>>>>> (ReplayPosition.NONE) in discardSSTables() when a table that has no
>>>>>> SSTables is truncated.
>>>>>> I guess ReplayPosition.segment shouldn't be -1 when truncating a
>>>>>> table in this case.
>>>>>> replayMutation() can replay unexpected mutations because of this
>>>>>> segment value.
>>>>>>
>>>>>> Is anyone familiar with truncate and replay?
>>>>>>
>>>>>> Regards,
>>>>>> Yuji
>>>>>>
>>>>>>
>>>>>> On Mon, Aug 8, 2016 at 6:36 PM, horschi <hors...@gmail.com> wrote:
>>>>>>
>>>>>>> Hi Yuji,
>>>>>>>
>>>>>>> can you reproduce the behaviour with a single node?
>>>>>>>
>>>>>>> I ask because I probably have the same issue with my automated
>>>>>>> tests, which run on my local laptop and truncate between every test.
>>>>>>>
>>>>>>> Around 5 of my 1800 tests fail randomly. I can see that the failed
>>>>>>> tests sometimes show data from other tests, which I think must be
>>>>>>> because of a failed truncate. This behaviour seems to be CQL only,
>>>>>>> or at least has gotten worse with CQL. I did not experience this
>>>>>>> with Thrift.
>>>>>>>
>>>>>>> regards,
>>>>>>> Christian
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Mon, Aug 8, 2016 at 7:34 AM, Yuji Ito <y...@imagine-orb.com> wrote:
>>>>>>>
>>>>>>>> Hi all,
>>>>>>>>
>>>>>>>> I have a question about clearing tables and commit log replay.
>>>>>>>> After some tables were truncated consecutively, I got some stale
>>>>>>>> values.
>>>>>>>> This problem doesn't occur when I clear keyspaces with DROP (and
>>>>>>>> CREATE).
>>>>>>>>
>>>>>>>> I'm running the following test with node failures.
>>>>>>>> Some stale values appear in the checking phase.
>>>>>>>>
>>>>>>>> Test iteration:
>>>>>>>> 1. initialize tables as below
>>>>>>>> 2. issue many reads/writes concurrently
>>>>>>>> 3. check all records
>>>>>>>> 4. repeat from the beginning
>>>>>>>>
>>>>>>>> I use C* 2.2.6. There are 3 nodes (replication_factor: 3).
>>>>>>>> On each node, the Cassandra process is killed at random intervals
>>>>>>>> and restarted immediately.
>>>>>>>>
>>>>>>>> My initialization:
>>>>>>>> 1. clear tables with TRUNCATE
>>>>>>>> 2. INSERT initial records
>>>>>>>> 3. check if all values are correct
>>>>>>>>
>>>>>>>> If any phase fails (because of node failure), the initialization
>>>>>>>> starts all over again.
>>>>>>>> So, tables are sometimes truncated consecutively.
>>>>>>>> Although the check in the initialization passes, stale data appears
>>>>>>>> when I execute "SELECT * FROM mykeyspace.mytable;" after many
>>>>>>>> requests have completed.
>>>>>>>>
>>>>>>>> The problem is likely to occur when the ReplayPosition value in
>>>>>>>> "truncated_at" is initialized as below after an empty table is
>>>>>>>> truncated.
>>>>>>>>
>>>>>>>> Column Family ID: truncated_at
>>>>>>>> XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX: 0xffffffffffffffff0000000000000156597cd4c7
>>>>>>>> (this value was acquired just after phase 1 in my initialization)
>>>>>>>>
>>>>>>>> I guess some unexpected replays occur.
>>>>>>>> Does anyone know about this behavior?
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Yuji
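The truncated_at value quoted in the first message can be unpacked with a small bash sketch. It assumes the Cassandra 2.2 layout as read from the 2.2 source, not from any documented format: a serialized ReplayPosition (an 8-byte signed segment id followed by a 4-byte position), then the truncation time as an 8-byte count of milliseconds since the epoch, all big-endian. The function name decode_truncated_at is hypothetical and not part of Cassandra. The segment id is assembled from two 32-bit halves so that a set sign bit is handled explicitly.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Cassandra): decode one truncated_at blob.
# ASSUMED layout (Cassandra 2.2), big-endian, 20 bytes = 40 hex digits:
#   bytes 0-7   ReplayPosition.segment  (signed 64-bit)
#   bytes 8-11  ReplayPosition.position (32-bit, treated as non-negative)
#   bytes 12-19 truncation time in ms since the epoch (64-bit)
decode_truncated_at() {
  local hex="${1#0x}"
  if [ "${#hex}" -ne 40 ]; then
    echo "expected 40 hex digits, got ${#hex}" >&2
    return 1
  fi

  # Build the signed segment id from two 32-bit halves so that a set
  # sign bit is turned into a negative number explicitly.
  local hi=$((16#${hex:0:8}))
  local lo=$((16#${hex:8:8}))
  local segment
  if (( hi >= 16#80000000 )); then
    # Two's complement: value = -(~x + 1), with ~x computed per half.
    segment=$(( -((16#ffffffff - hi) * 16#100000000 + (16#ffffffff - lo) + 1) ))
  else
    segment=$(( hi * 16#100000000 + lo ))
  fi

  local position=$((16#${hex:16:8}))
  local truncated_ms=$((16#${hex:24:16}))
  echo "segment=${segment} position=${position} truncated_at_ms=${truncated_ms}"
}

# The value reported in the thread for an empty, freshly truncated table.
decode_truncated_at 0xffffffffffffffff0000000000000156597cd4c7
```

Applied to that value it prints segment=-1 position=0 truncated_at_ms=1470380168391. A segment of -1 with position 0 is ReplayPosition.NONE, the value described above for discardSSTables() on a table with no SSTables, and the timestamp corresponds to 06:56:08 UTC on 5 August 2016.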