> Brutally. kill -9.
That's fine. I was thinking about reboot -f -n.

> We are wondering if the fsync of the commit log was working.
I would say yes, if only because there are no other reported problems.
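For reference, the periodic sync behaviour discussed here is controlled by the commit log settings in cassandra.yaml. A minimal sketch, assuming the stock 1.1 defaults; worth verifying against the actual file on the DC1 nodes:

    # Periodic mode: writes are acknowledged before the fsync happens;
    # the commit log is fsynced every commitlog_sync_period_in_ms.
    commitlog_sync: periodic
    commitlog_sync_period_in_ms: 10000

With a plain kill -9 the already-written commit log data should still be in the OS page cache, so in theory even the last unsynced 10s should survive a process kill, as opposed to a hard reboot or power loss.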
In that case I would not expect to see data loss.

If you are still in a test scenario, can you try to reproduce the problem? If possible, can you reproduce it with a single node?

Cheers

-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 25/08/2012, at 11:00 AM, rubbish me <rubbish...@googlemail.com> wrote:

> Thanks, Aaron, for your reply - please see inline.
>
> On 24 Aug 2012, at 11:04, aaron morton wrote:
>
>>> - we are running on production linux VMs (not ideal but this is out of our hands)
>> Is the VM doing anything wacky with the IO?
>
> Could be. But I thought we would ask here first. This is a bit difficult to prove because we don't have control over these VMs.
>
>>> As part of a DR exercise, we killed all 6 nodes in DC1,
>> Nice disaster. Out of interest, what was the shutdown process?
>
> Brutally. kill -9.
>
>>> We noticed that data that was written an hour before the exercise, around the time the last memtables were flushed, was not found in DC1.
>> To confirm, data was written to DC1 at CL LOCAL_QUORUM before the DR exercise.
>>
>> Was the missing data written before or after the memtable flush? I'm trying to understand if the data should have been in the commit log or the memtables.
>
> The missing data was written after the last flush. This data was retrievable before the DR exercise.
>
>> Can you provide some more info on how you are detecting that it is not found in DC1?
>
> We tried Hector with consistencylevel=LOCAL_QUORUM. We had missing columns or whole rows.
>
> We tried cassandra-cli on the DC1 nodes; same result.
>
> However, once we ran the same query on DC2, C* must have then done a read-repair. That particular piece of data would then appear in DC1 again.
>
>>> If we understand correctly, commit logs are being written first and then to disk every 10s.
>> Writes are put into a bounded queue and processed as fast as the IO can keep up. Every 10s a sync message is added to the queue. Note that the commit log segment may rotate at any time, which requires a sync.
>>
>> A loss of data across all nodes in a DC seems odd. If you can provide some more information we may be able to help.
>
> We are wondering if the fsync of the commit log was working. But we saw no errors / warnings in the logs. Wondering if there is a way to verify...
>
>> Cheers
>>
>> -----------------
>> Aaron Morton
>> Freelance Developer
>> @aaronmorton
>> http://www.thelastpickle.com
>>
>> On 24/08/2012, at 6:01 AM, rubbish me <rubbish...@googlemail.com> wrote:
>>
>>> Hi all
>>>
>>> First off, let's introduce the setup.
>>>
>>> - 6 x C* 1.1.2 in the active DC (DC1), another 6 in another DC (DC2)
>>> - keyspace RF=3 in each DC
>>> - Hector as client
>>> - client talks only to DC1 unless DC1 can't serve the request, in which case it talks only to DC2
>>> - commit log synced periodically with the default setting of 10s
>>> - consistency policy = LOCAL_QUORUM for both read and write
>>> - we are running on production linux VMs (not ideal but this is out of our hands)
>>> -----
>>> As part of a DR exercise, we killed all 6 nodes in DC1, Hector started talking to DC2, all the data was still there, and everything continued to work perfectly.
>>>
>>> Then we brought all the nodes in DC1 up, one by one. We saw a message saying all the commit logs were replayed. No errors were reported. We didn't run repair at this time.
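As an aside, a minimal sketch of the Hector setup described earlier in the thread, reading back one of the apparently missing columns at LOCAL_QUORUM; the cluster, host, keyspace, column family, and key names below are placeholders, not the actual ones used:

    import me.prettyprint.cassandra.model.ConfigurableConsistencyLevel;
    import me.prettyprint.cassandra.serializers.StringSerializer;
    import me.prettyprint.hector.api.Cluster;
    import me.prettyprint.hector.api.HConsistencyLevel;
    import me.prettyprint.hector.api.Keyspace;
    import me.prettyprint.hector.api.beans.HColumn;
    import me.prettyprint.hector.api.factory.HFactory;
    import me.prettyprint.hector.api.query.QueryResult;

    public class LocalQuorumReadCheck {
        public static void main(String[] args) {
            // Point Hector at a DC1 node only (placeholder host).
            Cluster cluster = HFactory.getOrCreateCluster("dc1-cluster", "dc1-node1:9160");

            // LOCAL_QUORUM for both reads and writes, as in the original setup.
            ConfigurableConsistencyLevel cl = new ConfigurableConsistencyLevel();
            cl.setDefaultReadConsistencyLevel(HConsistencyLevel.LOCAL_QUORUM);
            cl.setDefaultWriteConsistencyLevel(HConsistencyLevel.LOCAL_QUORUM);

            Keyspace ks = HFactory.createKeyspace("MyKeyspace", cluster, cl);

            // Read back one column that appeared to be missing (placeholder names).
            StringSerializer s = StringSerializer.get();
            QueryResult<HColumn<String, String>> result = HFactory
                    .createColumnQuery(ks, s, s, s)
                    .setColumnFamily("MyColumnFamily")
                    .setKey("some-row-key")
                    .setName("some-column")
                    .execute();

            System.out.println(result.get() == null ? "not found" : result.get().getValue());
        }
    }

Pinning the ConfigurableConsistencyLevel on the Keyspace means every read and write issued through it defaults to LOCAL_QUORUM unless overridden per query.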
>>>
>>> We noticed that data that was written an hour before the exercise, around the time the last memtables were flushed, was not found in DC1.
>>>
>>> If we understand correctly, commit logs are written first and then synced to disk every 10s. At worst we should have lost only the last 10s of data. What could be the cause of this behaviour?
>>>
>>> With the blessing of C* we could recover all this data from DC2. But we would like to understand why.
>>>
>>> Many thanks in advance.
>>>
>>> Amy
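For completeness, a rough cassandra-cli session matching the verification described in the thread; the keyspace, column family, and row key are placeholders, and the keyspace definition is only shown to illustrate the RF=3-per-DC layout (the DC names must match what the snitch reports):

    create keyspace MyKeyspace
      with placement_strategy = 'NetworkTopologyStrategy'
      and strategy_options = {DC1:3, DC2:3};

    use MyKeyspace;
    consistencylevel as LOCAL_QUORUM;
    get MyColumnFamily['some-row-key'];

A row that is missing here but shows up on DC1 only after the same query has been run against DC2 is consistent with the read-repair behaviour described above.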