(running C* 2.2.7)

On Thu, Aug 25, 2016 at 11:10 AM, horschi <hors...@gmail.com> wrote:
> Hi Yuji,
>
> I tried your script a couple of times. I did not experience any stale
> values. (On my Linux laptop)
>
> regards,
> Ch
>
> On Mon, Aug 15, 2016 at 7:29 AM, Yuji Ito <y...@imagine-orb.com> wrote:
>
>> Hi,
>>
>> I can reproduce the problem with the following script.
>> I got rows which should have been truncated.
>> If truncating is executed only once, the problem doesn't occur.
>>
>> The multi-node test (replication_factor: 3, kill & restart C*
>> processes on all nodes) can also reproduce it.
>>
>> test script:
>> ----
>> ip=xxx.xxx.xxx.xxx
>>
>> echo "0. prepare a table"
>> cqlsh $ip -e "drop keyspace testdb;"
>> cqlsh $ip -e "CREATE KEYSPACE testdb WITH replication = {'class': 'SimpleStrategy', 'replication_factor': '1'};"
>> cqlsh $ip -e "CREATE TABLE testdb.testtbl (key int PRIMARY KEY, val int);"
>>
>> echo "1. insert rows"
>> for key in $(seq 1 10)
>> do
>>     cqlsh $ip -e "insert into testdb.testtbl (key, val) values($key, 1000) IF NOT EXISTS;" > /dev/null 2>&1
>> done
>>
>> echo "2. truncate the table twice"
>> cqlsh $ip -e "consistency all; truncate table testdb.testtbl"
>> cqlsh $ip -e "consistency all; truncate table testdb.testtbl"
>>
>> echo "3. kill C* process"
>> ps auxww | grep "CassandraDaemon" | awk '{if ($13 ~ /cassand/) print $2}' | xargs sudo kill -9
>>
>> echo "4. restart C* process"
>> sudo /etc/init.d/cassandra start
>> sleep 20
>>
>> echo "5. check the table"
>> cqlsh $ip -e "select * from testdb.testtbl;"
>> ----
>>
>> test result:
>> ----
>> 0. prepare a table
>> 1. insert rows
>> 2. truncate the table twice
>> Consistency level set to ALL.
>> Consistency level set to ALL.
>> 3. kill C* process
>> 4. restart C* process
>> Starting Cassandra: OK
>> 5. check the table
>>
>>  key | val
>> -----+------
>>    5 | 1000
>>   10 | 1000
>>    1 | 1000
>>    8 | 1000
>>    2 | 1000
>>    4 | 1000
>>    7 | 1000
>>    6 | 1000
>>    9 | 1000
>>    3 | 1000
>>
>> (10 rows)
>> ----
>>
>> Thanks Christian,
>>
>> I tried with durable_writes=false, but it failed. I guess that failure
>> was caused by another restriction: I use SimpleStrategy, and a keyspace
>> using SimpleStrategy isn't permitted to use durable_writes=false.
>>
>> Regards,
>> Yuji
>>
>> On Thu, Aug 11, 2016 at 12:41 AM, horschi <hors...@gmail.com> wrote:
>>
>>> Hi Yuji,
>>>
>>> ok, perhaps you are seeing a different issue than I do.
>>>
>>> Have you tried with durable_writes=false? If the issue is caused by
>>> the commitlog, then it should work if you disable durable_writes.
>>>
>>> Cheers,
>>> Christian
>>>
>>> On Tue, Aug 9, 2016 at 3:04 PM, Yuji Ito <y...@imagine-orb.com> wrote:
>>>
>>>> Thanks Christian,
>>>>
>>>>> can you reproduce the behaviour with a single node?
>>>>
>>>> I tried my test with a single node, but I couldn't reproduce it there.
>>>>
>>>>> This behaviour seems to be CQL-only, or at least has gotten worse
>>>>> with CQL. I did not experience this with Thrift.
>>>>
>>>> I truncate tables with CQL. I've never tried with Thrift.
>>>>
>>>> I think my problem can happen even when truncating succeeds, because
>>>> I check all records after truncating.
>>>>
>>>> I checked the source code. ReplayPosition.segment and position become
>>>> -1 and 0 (ReplayPosition.NONE) in discardSSTables() when a table with
>>>> no SSTables is truncated. I guess ReplayPosition.segment shouldn't be
>>>> -1 when truncating a table in this case: replayMutation() can replay
>>>> unexpected mutations because of this segment value.
>>>>
>>>> Is there anyone familiar with truncate and replay?
>>>>
>>>> Regards,
>>>> Yuji
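To make the suspected failure mode concrete, here is a minimal Java
sketch of that filtering logic. This is not the actual C* 2.2 source:
only the names ReplayPosition and ReplayPosition.NONE come from the
thread; the class layout, the shouldReplay() helper, and the sample
positions are illustrative assumptions.

----
// Sketch only: shows why a truncated_at of (-1, 0) filters out nothing.
final class ReplayPosition implements Comparable<ReplayPosition> {
    // Sentinel recorded when a truncated table had no SSTables (per the thread).
    static final ReplayPosition NONE = new ReplayPosition(-1, 0);

    final long segment;  // commit log segment id
    final int position;  // byte offset within that segment

    ReplayPosition(long segment, int position) {
        this.segment = segment;
        this.position = position;
    }

    public int compareTo(ReplayPosition o) {
        int c = Long.compare(segment, o.segment);
        return c != 0 ? c : Integer.compare(position, o.position);
    }
}

class ReplayFilterSketch {
    // A commit log mutation is skipped only if it was written at or before
    // the recorded truncation point (hypothetical helper, for illustration).
    static boolean shouldReplay(ReplayPosition mutation, ReplayPosition truncatedAt) {
        return mutation.compareTo(truncatedAt) > 0;
    }

    public static void main(String[] args) {
        // A mutation written before the truncate, in segment 42 at offset 100:
        ReplayPosition preTruncateWrite = new ReplayPosition(42, 100);

        // Normal case: truncate recorded a real position after the write,
        // so the old mutation is filtered out on restart.
        System.out.println(shouldReplay(preTruncateWrite, new ReplayPosition(42, 500)));  // false

        // Empty-table case: truncated_at is NONE (-1, 0). Every real
        // position compares greater, so the pre-truncate write is replayed
        // and the truncated row comes back.
        System.out.println(shouldReplay(preTruncateWrite, ReplayPosition.NONE));  // true
    }
}
----

If truncated_at is NONE, no position in the log can compare less than
or equal to it, so every pre-truncate write survives replay, which is
consistent with the resurrected rows in the test result above.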
>>>>
>>>> On Mon, Aug 8, 2016 at 6:36 PM, horschi <hors...@gmail.com> wrote:
>>>>
>>>>> Hi Yuji,
>>>>>
>>>>> can you reproduce the behaviour with a single node?
>>>>>
>>>>> The reason I ask is that I probably have the same issue with my
>>>>> automated tests (which run truncate between every test) on my local
>>>>> laptop.
>>>>>
>>>>> Maybe around 5 out of my 1800 tests randomly fail. I can see that
>>>>> the failed tests sometimes show data from other tests, which I think
>>>>> must be because of a failed truncate. This behaviour seems to be
>>>>> CQL-only, or at least has gotten worse with CQL. I did not
>>>>> experience this with Thrift.
>>>>>
>>>>> regards,
>>>>> Christian
>>>>>
>>>>> On Mon, Aug 8, 2016 at 7:34 AM, Yuji Ito <y...@imagine-orb.com> wrote:
>>>>>
>>>>>> Hi all,
>>>>>>
>>>>>> I have a question about clearing tables and commit log replay.
>>>>>> After some tables were truncated consecutively, I got some stale
>>>>>> values. This problem doesn't occur when I clear keyspaces with
>>>>>> DROP (and CREATE).
>>>>>>
>>>>>> I'm running the following test with node failures, and some stale
>>>>>> values appear in the checking phase.
>>>>>>
>>>>>> Test iteration:
>>>>>> 1. initialize tables as below
>>>>>> 2. request a lot of reads/writes concurrently
>>>>>> 3. check all records
>>>>>> 4. repeat from the beginning
>>>>>>
>>>>>> I use C* 2.2.6. There are 3 nodes (replication_factor: 3). Each
>>>>>> node kills its Cassandra process at random intervals and restarts
>>>>>> it immediately.
>>>>>>
>>>>>> My initialization:
>>>>>> 1. clear tables with TRUNCATE
>>>>>> 2. INSERT initial records
>>>>>> 3. check that all values are correct
>>>>>>
>>>>>> If any phase fails (because of node failure), the initialization
>>>>>> starts all over again, so tables are sometimes truncated
>>>>>> consecutively. Though the check in the initialization passes,
>>>>>> stale data appears when I execute "SELECT * FROM
>>>>>> mykeyspace.mytable;" after a lot of requests have completed.
>>>>>>
>>>>>> The problem is likely to occur when the ReplayPosition value in
>>>>>> "truncated_at" is initialized as below after an empty table is
>>>>>> truncated:
>>>>>>
>>>>>> Column Family ID: truncated_at
>>>>>> XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX:
>>>>>> 0xffffffffffffffff0000000000000156597cd4c7
>>>>>> (this value was acquired just after phase 1 of my initialization)
>>>>>>
>>>>>> I guess some unexpected replays occur.
>>>>>> Does anyone know about this behavior?
>>>>>>
>>>>>> Thanks,
>>>>>> Yuji
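For reference, the truncated_at value quoted above can be decoded by
hand. The short sketch below assumes the 20-byte value is laid out as an
8-byte commit log segment id, a 4-byte position, and an 8-byte
truncation timestamp in epoch milliseconds; that layout is inferred from
the value itself and the discussion in the thread, not taken from the
Cassandra source.

----
import java.time.Instant;

// Decodes 0xffffffffffffffff0000000000000156597cd4c7 under the assumed
// layout: segment id (8 bytes), position (4 bytes), timestamp (8 bytes).
class TruncatedAtDecode {
    public static void main(String[] args) {
        long segment = Long.parseUnsignedLong("ffffffffffffffff", 16); // -1
        int position = Integer.parseUnsignedInt("00000000", 16);       // 0
        long truncatedAtMs = Long.parseLong("00000156597cd4c7", 16);

        // segment == -1 with position == 0 is exactly the
        // ReplayPosition.NONE sentinel described in the thread; the
        // timestamp decodes to a date in early August 2016.
        System.out.println("segment      = " + segment);
        System.out.println("position     = " + position);
        System.out.println("truncated at = " + Instant.ofEpochMilli(truncatedAtMs));
    }
}
----

The -1 segment means the recorded truncation point sorts before every
possible commit log position, so a subsequent replay treats nothing as
truncated, consistent with the stale rows Yuji reports.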