I would say it's unlikely to be LVM, because LVM is content-ignorant: it snapshots the entire volume, which is inefficient, and when you're Amazon, you care a LOT about being efficient. Instead, I imagine they're using some content-aware CoW solution such as ZFS. But whatever the mechanism, I agree with your opinion: I doubt that their solution -- almost certainly CoW of some sort -- has more than a slight impact on performance.
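
To make the CoW point concrete, here is a toy sketch in Python. It is an assumption-laden illustration, not how Amazon, LVM, or ZFS actually implement snapshots; `CowVolume` and its methods are made-up names. It only shows why creating a snapshot can be constant-time regardless of volume size, and why the cost shows up later, on the first overwrite of each block.

```python
class CowVolume:
    """Toy block device with copy-on-write snapshots (illustrative only)."""

    def __init__(self, num_blocks):
        self.blocks = [b""] * num_blocks  # live data, one entry per block
        self.snapshots = []               # one {block_index: old_data} table per snapshot
        self.copies = 0                   # how many CoW copies have been made

    def snapshot(self):
        # Creating a snapshot only allocates an empty table.
        # No block is copied, so this does not depend on volume size.
        self.snapshots.append({})
        return len(self.snapshots) - 1

    def write(self, index, data):
        # The first overwrite of a block after a snapshot saves the old
        # contents into that snapshot's table; later overwrites do not.
        for table in self.snapshots:
            if index not in table:
                table[index] = self.blocks[index]
                self.copies += 1
        self.blocks[index] = data

    def read_snapshot(self, snap_id, index):
        # Unchanged blocks are read straight from the live volume.
        return self.snapshots[snap_id].get(index, self.blocks[index])


vol = CowVolume(num_blocks=1_000_000)
vol.write(7, b"old")
snap = vol.snapshot()      # constant-time: nothing is copied yet
vol.write(7, b"new")       # first overwrite: one block is copied
vol.write(7, b"newer")     # second overwrite: no further copy
print(vol.read_snapshot(snap, 7), vol.blocks[7], vol.copies)
```

In this model the extra work per snapshot is proportional to the number of distinct blocks overwritten afterwards, not to the size of the volume.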
$.02, YMMV and other assorted disclaimers,
-Ken

On 2017-09-28 13:16, Joshua Judson Rosen wrote:
> I'm working on a project that uses Amazon AWS-provided VPS instances,
> and the other guy on the project is telling me that "snapshotting
> hourly may degrade performance",
> and I'm trying to determine whether that's actually true. My gut feeling
> is that it sounds kind of bogus.
>
> From the information I've been able to find about how Amazon's stuff
> works (either in terms of how it's _implemented_ [for which I'm finding
> basically no insight] or how it's _characterized_
> [in the engineering sense, not the literary sense]...), it really
> sounds a _lot_ like Amazon is just using LVM snapshots, e.g. from
> <https://aws.amazon.com/ebs/faqs/>:
>
> "snapshots can be done in real time while the volume is attached and
> in use.
> However, snapshots only capture data that has been written to your
> Amazon EBS volume,
> which might exclude any data that has been locally cached by your
> application or OS."
>
> "By design, an EBS Snapshot of an entire 16 TB volume should take no
> longer than the time
> it takes to snapshot an entire 1 TB volume. However, the actual time
> taken to create
> a snapshot depends on several factors including the amount of data
> that has changed
> since the last snapshot of the EBS volume."
>
> ... though I'm not entirely sure how to interpret that last bit about
> "time taken to create a snapshot
> depends on... the amount of data that has changed since the last
> snapshot";
> the _first half of that statement_ reads as "creating a snapshot is
> constant time",
> which basically screams to me "copy-on-write just like LVM, and
> they're probably implemented
> in terms of LVM".
>
> Any insight here as to whether my gut is correct on this, or whether
> I'm actually likely
> to notice an impact from hourly snapshots of, say, a 200-GB volume?
> How about a 1-TB volume?
>
> The only thing I'm seeing from Amazon that seems to _vaguely_ support
> (maybe) the notion
> that `snapshotting too often' would be something to worry about is
> this bit from elsewhere
> in that same FAQ page (under the heading of "performance", whereas the
> others were
> under the heading of "snapshots" and a subheading of "performance
> consistency of my HDD-backed volumes"):
>
> Another factor is taking a snapshot which will decrease expected
> write performance
> down to the baseline rate, until the snapshot completes.
>
> ... and, taken in the context of the previously-cited notes about
> snapshots being
> `not based on volume-size but maybe influenced by
> changed-since-last-snapshot set size'
> (and in the context of the explanations they give for HDD-backed vs.
> SSD-backed storage),
> I'm basically reading that as:
>
> `if you're using HDD-backed storage then it's because you care about
> *throughput*
> more than *response time* and are likely to be monitoring throughput,
> and if you're monitoring throughput you may notice a *momentary dip
> in throughput*
> as the *HDDs* need to seek around to find the volume boundaries and
> set up the COW records.'
>
> Even if you don't have any insight into what's actually happening
> under the covers at Amazon,
> does my reading of all of this sound right to you?
>
> And, perhaps more interestingly, are these same caveats from Amazon
> generally applicable to LVM?

_______________________________________________
gnhlug-discuss mailing list
gnhlug-discuss@mail.gnhlug.org
http://mail.gnhlug.org/mailman/listinfo/gnhlug-discuss/