RE: Cassandra in a container - what to do (sequence of events) to snapshot the storage volume?

2020-11-12 Thread Manu Chadha
Hi Jeff

By “dropping” the periodic time, do you mean setting it to 0, commenting out
the commit log settings, or switching the commit log to batch mode? Going by
the comments in cassandra.yaml, if I set commitlog_sync to batch with a 2 ms
window, does that mean each write is flushed to disk immediately, so that a
snapshot taken at any moment would contain all the data except any new writes
arriving within that 2 ms window? I suppose the DB would be much slower than
with the 10 s periodic window.

# commitlog_sync may be either "periodic" or "batch."
#
# When in batch mode, Cassandra won't ack writes until the commit log
# has been fsynced to disk.  It will wait
# commitlog_sync_batch_window_in_ms milliseconds between fsyncs.
# This window should be kept short because the writer threads will
# be unable to do extra work while waiting.  (You may need to increase
# concurrent_writes for the same reason.)
#
commitlog_sync: batch
commitlog_sync_batch_window_in_ms: 2
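To make the batch-mode trade-off in the question concrete, here is a toy Python model (not Cassandra's implementation) of the behaviour described in the yaml comments above. It assumes fsyncs run on a fixed grid every `window_ms` and that a write is acknowledged only at the first fsync after it arrives. The names `Write`, `batch_ack_time` and `outcome_at_crash` are hypothetical, chosen for illustration.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Write:
    key: str
    arrived_ms: float


def batch_ack_time(arrived_ms: float, window_ms: float) -> float:
    """Toy model: fsyncs run at multiples of window_ms; a write is acked at
    the first fsync strictly after it arrives."""
    return (math.floor(arrived_ms / window_ms) + 1) * window_ms


def outcome_at_crash(
    writes: Sequence[Write], crash_ms: float, window_ms: float
) -> Tuple[List[str], List[str]]:
    """Split writes into (acked and durable, never acked and lost) for a
    crash at crash_ms. In batch mode an ack implies a completed fsync."""
    durable: List[str] = []
    lost_unacked: List[str] = []
    for w in writes:
        if batch_ack_time(w.arrived_ms, window_ms) <= crash_ms:
            durable.append(w.key)
        else:
            lost_unacked.append(w.key)
    return durable, lost_unacked


writes = [Write("a", 0.5), Write("b", 3.1), Write("c", 5.9)]
print(outcome_at_crash(writes, crash_ms=5.0, window_ms=2.0))  # (['a', 'b'], ['c'])
```

In this model every write a client saw acknowledged survives a crash; only in-flight writes that were never acked can disappear. Each write waits up to one window for its ack, which is where the slowdown comes from.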

thanks

From: Jeff Jirsa
Sent: 11 November 2020 00:24
To: cassandra
Subject: Re: Cassandra in a container - what to do (sequence of events) to snapshot the storage volume?

The commitlog defaults to periodic mode, which writes a sync marker to the file 
and fsyncs the data to disk every 10s by default.

`nodetool flush` will force a sync marker / fsync.

Data written since the last fsync will not be replayed on startup and will be 
lost.

If you drop the periodic time, the number of writes you lose on restart 
decreases.

Alternatively, you can switch to group/batch commitlog, and it goes to zero, 
but you'll fsync far more frequently.



On Tue, Nov 10, 2020 at 4:19 PM Florin Andrei <flo...@andrei.myip.org> wrote:
That sounds great! Now here's my question:

I do "nodetool flush", then snapshot the storage. Meanwhile, the DB is
under heavy read/write traffic, with lots of writes per second. What's
the worst that could happen, lose a few writes?


On 2020-11-10 15:59, Jeff Jirsa wrote:
> If you want all of the instances to be consistent with each other,
> this is much harder, but if you only want a container that can stop
> and resume, you don't have to do anything more than flush + snapshot
> the storage. The data files on Cassandra should ALWAYS be in a state
> where the database will restart, because they have to be to tolerate
> a power outage.
>
> On Tue, Nov 10, 2020 at 3:39 PM Florin Andrei <flo...@andrei.myip.org> wrote:
>
>> Running Apache Cassandra 3 in Docker. I need to snapshot the storage
>> volumes. Obviously, I want to be able to re-launch Cassandra from the
>> snapshots later on. So the snapshots need to be in a consistent state.
>>
>> With most DBs, the sequence of events is this:
>>
>> - flush the DB to disk
>> - "freeze" the DB
>> - snapshot the storage
>> - "unfreeze" the DB
>>
>> What does that sequence translate to, in Cassandra parlance?
>>
>> What is the sequence of events that needs to happen when I bring the DB
>> up from an old snapshot? Will there be a restore procedure, or can I
>> just start it as usual?
>>
>> --
>> Florin Andrei
>> https://florin.myip.org/
>>
>>
>> -
>> To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
>> For additional commands, e-mail: user-h...@cassandra.apache.org

--
Florin Andrei
https://florin.myip.org/




Re: Cassandra in a container - what to do (sequence of events) to snapshot the storage volume?

2020-11-10 Thread Jeff Jirsa
The commitlog defaults to periodic mode, which writes a sync marker to the
file and fsyncs the data to disk every 10s by default.

`nodetool flush` will force a sync marker / fsync.

Data written since the last fsync will not be replayed on startup and will
be lost.

If you drop the periodic time, the number of writes you lose on restart
decreases.

Alternatively, you can switch to group/batch commitlog, and it goes to
zero, but you'll fsync far more frequently.
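A toy Python model of the periodic behaviour described above, assuming fsyncs happen exactly every `period_ms` (the cassandra.yaml setting is `commitlog_sync_period_in_ms`, 10000 by default) and that a crash discards everything written after the last completed fsync. Periodic mode acknowledges writes before they are synced, so the writes counted here had already been acked to clients. The function names are hypothetical.

```python
from typing import List, Sequence


def last_sync_before(crash_ms: float, period_ms: float) -> float:
    """Time of the last periodic fsync at or before crash_ms (syncs at 0, P, 2P, ...)."""
    return (crash_ms // period_ms) * period_ms


def lost_on_crash(
    write_times_ms: Sequence[float], crash_ms: float, period_ms: float = 10_000
) -> List[float]:
    """Writes that arrived after the last fsync are not in the synced commit
    log, so in this model they are not replayed on restart."""
    cutoff = last_sync_before(crash_ms, period_ms)
    return [t for t in write_times_ms if cutoff < t <= crash_ms]


def worst_case_lost(writes_per_sec: float, period_ms: float = 10_000) -> float:
    """Upper bound on acked-but-lost writes: one full period of traffic."""
    return writes_per_sec * period_ms / 1000


times = [1_000, 12_000, 19_500, 21_000]
print(lost_on_crash(times, crash_ms=25_000))                   # [21000]
print(lost_on_crash(times, crash_ms=25_000, period_ms=1_000))  # []
print(worst_case_lost(5_000))                                  # 50000.0
```

Lowering the period shrinks the window of acked-but-unsynced writes (here from one write to none), at the cost of more frequent fsyncs. The worst case is roughly one full period of write traffic.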



Re: Cassandra in a container - what to do (sequence of events) to snapshot the storage volume?

2020-11-10 Thread Erick Ramirez
>
> I do "nodetool flush", then snapshot the storage. Meanwhile, the DB is
> under heavy read/write traffic, with lots of writes per second. What's
> the worst that could happen, lose a few writes?
>

Nope, you won't lose anything. Snapshots in C* are the equivalent of a cold
backup in relational DBs, with the key difference that it all takes place
while the nodes and cluster remain online and operational. Cheers!
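"Snapshots in C*" here presumably means Cassandra's own per-node snapshots (`nodetool snapshot`) rather than a storage-volume snapshot. A minimal Python sketch of triggering one from the Docker host, assuming a container named `cassandra` and a keyspace `my_ks` (both hypothetical). `nodetool snapshot -t <tag>` is the standard form; with no keyspace argument it snapshots all keyspaces, and the result is a set of hard links under each table's `snapshots/<tag>` directory.

```python
import subprocess
from typing import List, Optional, Sequence


def snapshot_command(
    tag: str, keyspaces: Sequence[str] = (), container: Optional[str] = None
) -> List[str]:
    """Build `nodetool snapshot -t <tag> [keyspace ...]`, optionally wrapped
    in `docker exec <container>` so it runs inside the Cassandra container."""
    cmd = ["nodetool", "snapshot", "-t", tag, *keyspaces]
    if container is not None:
        cmd = ["docker", "exec", container, *cmd]
    return cmd


def take_snapshot(
    tag: str, keyspaces: Sequence[str] = (), container: Optional[str] = None
) -> None:
    """Run the snapshot; the node stays online and keeps serving traffic."""
    subprocess.run(snapshot_command(tag, keyspaces, container), check=True)


print(snapshot_command("nightly", ["my_ks"], container="cassandra"))
# ['docker', 'exec', 'cassandra', 'nodetool', 'snapshot', '-t', 'nightly', 'my_ks']
```

Building the argument list separately from running it keeps the command easy to inspect or log before anything touches a live node.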


Re: Cassandra in a container - what to do (sequence of events) to snapshot the storage volume?

2020-11-10 Thread Florin Andrei

That sounds great! Now here's my question:

I do "nodetool flush", then snapshot the storage. Meanwhile, the DB is 
under heavy read/write traffic, with lots of writes per second. What's 
the worst that could happen, lose a few writes?



Re: Cassandra in a container - what to do (sequence of events) to snapshot the storage volume?

2020-11-10 Thread Jeff Jirsa
If you want all of the instances to be consistent with each other, this is
much harder, but if you only want a container that can stop and resume, you
don't have to do anything more than flush + snapshot the storage. The data
files on Cassandra should ALWAYS be in a state where the database will
restart, because they have to be to tolerate a power outage.
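A minimal Python sketch of the "flush + snapshot the storage" sequence for a single container. `nodetool flush` is the command discussed in this thread, run here through `docker exec`; the storage-snapshot command is a hypothetical placeholder (`snapshot_cmd`) because it depends entirely on the volume backend (LVM, EBS, ZFS, ...). The runner is injected so the ordering can be checked without a live node.

```python
import subprocess
from typing import Callable, List, Sequence

Runner = Callable[[List[str]], None]


def run_checked(cmd: List[str]) -> None:
    """Default runner: execute the command, raising CalledProcessError on failure."""
    subprocess.run(cmd, check=True)


def flush_then_snapshot(
    container: str, snapshot_cmd: Sequence[str], run: Runner = run_checked
) -> List[List[str]]:
    """Flush memtables to SSTables inside the container, then snapshot the
    storage volume while the node keeps serving traffic (no freeze step)."""
    steps = [
        ["docker", "exec", container, "nodetool", "flush"],
        list(snapshot_cmd),  # hypothetical, backend-specific snapshot command
    ]
    for step in steps:
        # With run_checked, a failed flush raises before the snapshot runs.
        run(step)
    return steps
```

For example, `flush_then_snapshot("cassandra", ["take-volume-snapshot", "cassandra-data"])`, where the container name and snapshot tool are both hypothetical. The generic "freeze"/"unfreeze" steps are deliberately absent, on the reasoning above that the on-disk files are always in a restartable state.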



Cassandra in a container - what to do (sequence of events) to snapshot the storage volume?

2020-11-10 Thread Florin Andrei
Running Apache Cassandra 3 in Docker. I need to snapshot the storage 
volumes. Obviously, I want to be able to re-launch Cassandra from the 
snapshots later on. So the snapshots need to be in a consistent state.


With most DBs, the sequence of events is this:

- flush the DB to disk
- "freeze" the DB
- snapshot the storage
- "unfreeze" the DB

What does that sequence translate to, in Cassandra parlance?

What is the sequence of events that needs to happen when I bring the DB 
up from an old snapshot? Will there be a restore procedure, or can I 
just start it as usual?


--
Florin Andrei
https://florin.myip.org/
