I'm working on a project that uses Amazon AWS-provided VPS instances,
and the other guy on the project is telling me that "snapshotting hourly
may degrade performance", and I'm trying to determine whether that's
actually true. My gut feeling is that it sounds kind of bogus.

From the information I've been able to find about how Amazon's stuff works
(either in terms of how it's _implemented_ [for which I'm finding basically
no insight] or how it's _characterized_ [in the engineering sense, not the
literary sense]...), it really sounds a _lot_ like Amazon is just using
LVM snapshots, e.g. from <https://aws.amazon.com/ebs/faqs/>:

        "snapshots can be done in real time while the volume is attached
         and in use. However, snapshots only capture data that has been
         written to your Amazon EBS volume, which might exclude any data
         that has been locally cached by your application or OS."

        "By design, an EBS Snapshot of an entire 16 TB volume should take
         no longer than the time it takes to snapshot an entire 1 TB
         volume. However, the actual time taken to create a snapshot
         depends on several factors including the amount of data that
         has changed since the last snapshot of the EBS volume."

... though I'm not entirely sure how to interpret that last bit about
"time taken to create a snapshot depends on... the amount of data that
has changed since the last snapshot"; the _first half of that statement_
reads as "creating a snapshot is constant time", which basically screams
to me "copy-on-write just like LVM, and they're probably implemented
in terms of LVM".
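
To make that copy-on-write reading concrete, here is a toy model in Python.
It is only a sketch of the classic LVM-style scheme as I understand it, and
nothing in it is known to reflect how EBS is actually implemented. The class
`CowVolume` and its methods are made-up names for illustration. The point it
shows: creating a snapshot copies nothing, and the cost appears later, on the
first write to each block after the snapshot.

```python
class CowVolume:
    """Toy block device with copy-on-write snapshots (illustrative only)."""

    def __init__(self, num_blocks):
        self.blocks = [b"\x00"] * num_blocks
        self.snapshots = []     # one table per snapshot: block index -> old data
        self.extra_copies = 0   # number of copy-on-write block copies performed

    def snapshot(self):
        # Constant time regardless of volume size: no data is copied,
        # only an empty table of preserved blocks is created.
        self.snapshots.append({})
        return len(self.snapshots) - 1

    def write(self, index, data):
        # The first write to a block after a snapshot must preserve the
        # old contents, so that write costs an extra copy per snapshot.
        for table in self.snapshots:
            if index not in table:
                table[index] = self.blocks[index]
                self.extra_copies += 1
        self.blocks[index] = data

    def read_snapshot(self, snap_id, index):
        # Unchanged blocks are read straight from the live volume.
        table = self.snapshots[snap_id]
        return table.get(index, self.blocks[index])


vol = CowVolume(1000)
vol.write(0, b"a")       # no snapshot exists yet: no extra copy
snap = vol.snapshot()    # copies nothing
vol.write(0, b"b")       # first write after the snapshot: one extra copy
vol.write(0, b"c")       # block already preserved: no extra copy
print(vol.extra_copies, vol.read_snapshot(snap, 0), vol.blocks[0])
```

In this model the overhead scales with the number of distinct blocks
rewritten while a snapshot exists, not with the size of the volume, which
would be consistent with both halves of the FAQ statement.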

Any insight here as to whether my gut is correct on this, or whether I'm
actually likely to notice an impact from hourly snapshots of, say,
a 200-GB volume? How about a 1-TB volume?

The only thing I'm seeing from Amazon that seems to _vaguely_ support
(maybe) the notion that `snapshotting too often' would be something to
worry about is this bit from elsewhere in that same FAQ page (under the
heading of "performance" and a subheading of "performance consistency of
my HDD-backed volumes", whereas the others were under the heading of
"snapshots"):

        Another factor is taking a snapshot which will decrease expected
        write performance down to the baseline rate, until the snapshot
        completes.

... and, taken in the context of the previously-cited notes about
snapshots being `not based on volume-size but maybe influenced by
changed-since-last-snapshot set size' (and in the context of the
explanations they give for HDD-backed vs. SSD-backed storage),
I'm basically reading that as:

        `if you're using HDD-backed storage then it's because you care
         about *throughput* more than *response time* and are likely to
         be monitoring throughput, and if you're monitoring throughput
         you may notice a *momentary dip in throughput* as the *HDDs*
         need to seek around to find the volume boundaries and set up
         the COW records.'

Even if you don't have any insight into what's actually happening under
the covers at Amazon, does my reading of all of this sound right to you?

And, perhaps more interestingly, are these same caveats from Amazon
generally applicable to LVM?

-- 
Connect with me on GNU social network: <https://status.hackerposse.com/rozzin>
Not on the network? Ask me for an invitation to the nhcrossing.com social hub