Re: Stale value appears after consecutive TRUNCATE

horschi Thu, 25 Aug 2016 07:42:08 -0700

Nope, I still don't get stale values. (I just ran your script 3 times.)

On Thu, Aug 25, 2016 at 12:36 PM, Yuji Ito <y...@imagine-orb.com> wrote:


> Thank you for testing, Christian
>
> What did you set commitlog_sync to in cassandra.yaml?
> I set commitlog_sync to batch (window 2 ms) as below.
>
> commitlog_sync: batch
> commitlog_sync_batch_window_in_ms: 2
>
> The problem didn't occur with commitlog_sync set to periodic (the default).
>
> regards,
> yuji
>
>
> On Thu, Aug 25, 2016 at 6:11 PM, horschi <hors...@gmail.com> wrote:
>
>> (running C* 2.2.7)
>>
>> On Thu, Aug 25, 2016 at 11:10 AM, horschi <hors...@gmail.com> wrote:
>>
>>> Hi Yuji,
>>>
>>> I tried your script a couple of times. I did not experience any stale
>>> values. (On my Linux laptop)
>>>
>>> regards,
>>> Ch
>>>
>>> On Mon, Aug 15, 2016 at 7:29 AM, Yuji Ito <y...@imagine-orb.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> I can reproduce the problem with the following script.
>>>> I got rows that should have been truncated.
>>>> If the truncate is executed only once, the problem doesn't occur.
>>>>
>>>> The multi-node test (replication_factor: 3, killing and restarting the C*
>>>> processes on all nodes) also reproduces it.
>>>>
>>>> test script:
>>>> ----
>>>>
>>>> ip=xxx.xxx.xxx.xxx
>>>>
>>>> echo "0. prepare a table"
>>>> cqlsh $ip -e "drop keyspace testdb;"
>>>> cqlsh $ip -e "CREATE KEYSPACE testdb WITH replication = {'class': 'SimpleStrategy', 'replication_factor': '1'};"
>>>> cqlsh $ip -e "CREATE TABLE testdb.testtbl (key int PRIMARY KEY, val int);"
>>>>
>>>> echo "1. insert rows"
>>>> for key in $(seq 1 10)
>>>> do
>>>>     cqlsh $ip -e "insert into testdb.testtbl (key, val) values($key, 1000) IF NOT EXISTS;" >> /dev/null 2>&1
>>>> done
>>>>
>>>> echo "2. truncate the table twice"
>>>> cqlsh $ip -e "consistency all; truncate table testdb.testtbl"
>>>> cqlsh $ip -e "consistency all; truncate table testdb.testtbl"
>>>>
>>>> echo "3. kill C* process"
>>>> ps auxww | grep "CassandraDaemon" | awk '{if ($13 ~ /cassand/) print $2}' | xargs sudo kill -9
>>>>
>>>> echo "4. restart C* process"
>>>> sudo /etc/init.d/cassandra start
>>>> sleep 20
>>>>
>>>> echo "5. check the table"
>>>> cqlsh $ip -e "select * from testdb.testtbl;"
>>>>
>>>> ----
>>>>
>>>> test result:
>>>> ----
>>>>
>>>> 0. prepare a table
>>>> 1. insert rows
>>>> 2. truncate the table twice
>>>> Consistency level set to ALL.
>>>> Consistency level set to ALL.
>>>> 3. kill C* process
>>>> 4. restart C* process
>>>> Starting Cassandra: OK
>>>> 5. check the table
>>>>
>>>>  key | val
>>>> -----+------
>>>>    5 | 1000
>>>>   10 | 1000
>>>>    1 | 1000
>>>>    8 | 1000
>>>>    2 | 1000
>>>>    4 | 1000
>>>>    7 | 1000
>>>>    6 | 1000
>>>>    9 | 1000
>>>>    3 | 1000
>>>>
>>>> (10 rows)
>>>>
>>>> ----
>>>>
>>>>
>>>> Thanks Christian,
>>>>
>>>> I tried with durable_writes=False.
>>>> It failed. I guessed this failure was caused by another problem.
>>>> I use SimpleStrategy.
>>>> A keyspace using the SimpleStrategy isn't permitted to use
>>>> durable_writes=False.
>>>>
>>>>
>>>> Regards,
>>>> Yuji
>>>>
>>>> On Thu, Aug 11, 2016 at 12:41 AM, horschi <hors...@gmail.com> wrote:
>>>>
>>>>> Hi Yuji,
>>>>>
>>>>> ok, perhaps you are seeing a different issue than I do.
>>>>>
>>>>> Have you tried with durable_writes=False? If the issue is caused by
>>>>> the commitlog, then it should work if you disable durable_writes.
>>>>>
>>>>> Cheers,
>>>>> Christian
>>>>>
>>>>>
>>>>>
>>>>> On Tue, Aug 9, 2016 at 3:04 PM, Yuji Ito <y...@imagine-orb.com> wrote:
>>>>>
>>>>>> Thanks Christian
>>>>>>
>>>>>>> can you reproduce the behaviour with a single node?
>>>>>>
>>>>>> I tried my test with a single node, but I couldn't reproduce it.
>>>>>>
>>>>>>> This behaviour seems to be CQL only, or at least has gotten worse
>>>>>>> with CQL. I did not experience this with Thrift.
>>>>>>
>>>>>> I truncate tables with CQL. I've never tried with Thrift.
>>>>>>
>>>>>> I think that my problem can happen even when truncating succeeds.
>>>>>> That's because I check all records after truncating.
>>>>>>
>>>>>> I checked the source code.
>>>>>> ReplayPosition.segment and position become -1 and 0
>>>>>> (ReplayPosition.NONE) in discardSSTables() when a table is truncated
>>>>>> while it has no SSTable.
>>>>>> I guess that ReplayPosition.segment shouldn't be -1 when truncating a
>>>>>> table in this case.
>>>>>> replayMutation() can replay mutations unexpectedly because of
>>>>>> this segment value.
>>>>>>
>>>>>> Is there anyone familiar with truncate and replay?
>>>>>>
>>>>>> Regards,
>>>>>> Yuji
>>>>>>
>>>>>>
>>>>>> On Mon, Aug 8, 2016 at 6:36 PM, horschi <hors...@gmail.com> wrote:
>>>>>>
>>>>>>> Hi Yuji,
>>>>>>>
>>>>>>> can you reproduce the behaviour with a single node?
>>>>>>>
>>>>>>> The reason I ask is because I probably have the same issue with my
>>>>>>> automated tests (which run truncate between every test), which run on my
>>>>>>> local laptop.
>>>>>>>
>>>>>>> Maybe around 5 tests randomly fail out of my 1800. I can see that
>>>>>>> the failed tests sometimes show data from other tests, which I think
>>>>>>> must be because of a failed truncate. This behaviour seems to be CQL
>>>>>>> only, or at least has gotten worse with CQL. I did not experience this
>>>>>>> with Thrift.
>>>>>>>
>>>>>>> regards,
>>>>>>> Christian
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Mon, Aug 8, 2016 at 7:34 AM, Yuji Ito <y...@imagine-orb.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hi all,
>>>>>>>>
>>>>>>>> I have a question about clearing tables and commit log replay.
>>>>>>>> After some tables were truncated consecutively, I got some stale
>>>>>>>> values.
>>>>>>>> This problem doesn't occur when I clear keyspaces with DROP (and
>>>>>>>> CREATE).
>>>>>>>>
>>>>>>>> I'm running the following test with node failures.
>>>>>>>> Some stale values appear in the checking phase.
>>>>>>>>
>>>>>>>> Test iteration:
>>>>>>>> 1. initialize tables as below
>>>>>>>> 2. request a lot of read/write concurrently
>>>>>>>> 3. check all records
>>>>>>>> 4. repeat from the beginning
>>>>>>>>
>>>>>>>> I use C* 2.2.6. There are 3 nodes (replication_factor: 3).
>>>>>>>> On each node, the cassandra process is killed at random intervals and
>>>>>>>> restarted immediately.
>>>>>>>>
>>>>>>>> My initialization:
>>>>>>>> 1. clear tables with TRUNCATE
>>>>>>>> 2. INSERT initial records
>>>>>>>> 3. check if all values are correct
>>>>>>>>
>>>>>>>> If any phase fails (because of node failure), the initialization
>>>>>>>> starts all over again.
>>>>>>>> So, tables are sometimes truncated consecutively.
>>>>>>>> Though the check in the initialization is OK, stale data appears
>>>>>>>> when I execute "SELECT * FROM mykeyspace.mytable;" after a lot of
>>>>>>>> requests are completed.
>>>>>>>>
>>>>>>>> The problem is likely to occur when the ReplayPosition's value in
>>>>>>>> "truncated_at" is initialized as below after an empty table is
>>>>>>>> truncated.
>>>>>>>>
>>>>>>>> Column Family ID: truncated_at
>>>>>>>> XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX:
>>>>>>>> 0xffffffffffffffff0000000000000156597cd4c7
>>>>>>>> (this value was acquired just after phase 1 in my initialization)
>>>>>>>>
>>>>>>>> I guess some unexpected replays occur.
>>>>>>>> Does anyone know the behavior?
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Yuji
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
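
For anyone comparing "truncated_at" values while trying to reproduce this:
the value quoted in the original report is 20 bytes long, which would fit a
ReplayPosition (8-byte segment, 4-byte position) followed by an 8-byte
millisecond timestamp. That layout is an assumption inferred from the length
and from the -1 / 0 values discussed above; it is not confirmed anywhere in
this thread. Under that assumption, a rough bash sketch for splitting the
value into its fields (decode_truncated_at is a made-up helper name):

```shell
#!/usr/bin/env bash
# Sketch only: split a "truncated_at" blob into its fields.
# ASSUMED layout (inferred from the 20-byte length, not confirmed in the thread):
#   bytes 0-7    ReplayPosition.segment  (signed 64-bit, big-endian)
#   bytes 8-11   ReplayPosition.position (32-bit, big-endian, read as unsigned)
#   bytes 12-19  truncation time in milliseconds since the epoch
decode_truncated_at() {
    local hex=${1#0x}    # drop a leading "0x" if present
    if [ "${#hex}" -ne 40 ]; then
        echo "expected 40 hex digits, got ${#hex}" >&2
        return 1
    fi
    # bash arithmetic is 64-bit signed, so 16 hex f's wrap around to -1
    segment=$(( 16#${hex:0:16} ))
    position=$(( 16#${hex:16:8} ))
    truncated_ms=$(( 16#${hex:24:16} ))
}

# Value taken from the original report
decode_truncated_at 0xffffffffffffffff0000000000000156597cd4c7
echo "segment=$segment position=$position truncated_at_ms=$truncated_ms"
```

With the value from the report this gives segment -1 and position 0, i.e. the
ReplayPosition.NONE case described above, and a timestamp of 1470380168391 ms
(early August 2016), which fits the dates in this thread.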
