Re: [Gluster-devel] GlusterFS Snapshot internals

Rajesh Joseph Sun, 06 Apr 2014 23:11:22 -0700

Hi Krish,

The lists of fops in the snapshot design page and the barrier feature page 
look the same to me.
The snapshot design page only provides an overview of barrier; the details are 
already provided in your feature page.
The snapshot design page had a reference to the barrier feature page, but 
somehow it is not showing up in the reference section. Let me fix that.


Let's have a discussion to analyze the gaps.

Best Regards,
Rajesh

----- Original Message -----
From: "Krishnan Parthasarathi" <kpart...@redhat.com>
To: "Paul Cuzner" <pcuz...@redhat.com>
Cc: "Rajesh Joseph" <rjos...@redhat.com>, "gluster-devel" 
<gluster-devel@nongnu.org>
Sent: Monday, April 7, 2014 8:14:16 AM
Subject: Re: [Gluster-devel] GlusterFS Snapshot internals

The section on barrier is not current. The list of FOPs to be quiesced is also 
incorrect. I will work with Rajesh to bring it up to date. 

To understand how barrier ties with volume snapshots, see: 
http://www.gluster.org/community/documentation/index.php/Features/Server-side_Barrier_feature#Abstract

Thanks,
Krish

----- Paul Cuzner <pcuz...@redhat.com> wrote:
> Hi Rajesh, 
> 
> Thanks for updating the design doc. It reads well. 
> 
> I have a number of questions that would help my understanding; 
> 
> Logging : The doc doesn't mention how the snapshot process is logged - 
> - will snapshot use an existing log or a new log? 
> - Will the log be specific to a volume, or will all snapshot activity be 
> logged in a single file? 
> - will the log be visible on all nodes, or just the originating node? 
> - will the high-level snapshot action be visible when looking from the other 
> nodes either in the logs or at the cli? 
> 
> Restore : You mention that after a restore operation, the snapshot will be 
> automatically deleted. 
> - I don't believe this is a prudent thing to do. Here's an example I've seen 
> a lot. An application has a programmatic error, leading to data 'corruption' - 
> devs work on the program, storage guys roll the volume back. So far so 
> good...devs provide the updated program, and away you go...BUT the issue is 
> not resolved, so you need to roll back again to the same point in time. If 
> you delete the snap automatically, you lose the restore point. Yes, the admin 
> could take another snap after the restore - but why add more work into a 
> recovery process where people are already stressed out :) I'd recommend 
> leaving the snapshot if possible, and letting it age out naturally. 
> 
> Auto-delete : Is this a post phase of the snapshot create, so the 
> successful creation of a new snapshot will trigger the pruning of old 
> versions? 
> 
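One reading of the auto-delete question above, sketched under the assumption that pruning runs only after a create succeeds. The names `create_with_prune`, `max_snaps` and `create_fn`, and the model of snapshots as a list of names, are hypothetical and are not taken from the design doc or the GlusterFS CLI.

```python
def create_with_prune(snaps, new_name, max_snaps, create_fn):
    """Hypothetical flow: create first, prune the oldest only if the create succeeded.

    snaps is a list of snapshot names ordered oldest to newest (modified in place).
    create_fn(name) returns True on success. Returns the list of pruned names.
    """
    if not create_fn(new_name):
        return []                      # failed create: existing snapshots stay untouched
    snaps.append(new_name)
    pruned = []
    while len(snaps) > max_snaps:
        pruned.append(snaps.pop(0))    # drop the oldest snapshot first
    return pruned


snaps = ["snap1", "snap2", "snap3"]
print(create_with_prune(snaps, "snap4", 3, lambda name: True))  # prints ['snap1']
```

With this ordering a failed snapshot create never costs the admin an existing restore point.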
> Snapshot Naming : The doc states the name is mandatory. 
> - why not offer a default - volume_name_timestamp - instead of making the 
> caller decide on a name. Having this as a default will also make the list 
> under .snap more usable by default. 
> - providing a sensible default will make self-service restore easier for end 
> users. More sensible defaults = more happy admins :) 
> 
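A minimal sketch of the default name suggested above, assuming a `<volume>_<UTC timestamp>` layout. The helper `default_snap_name` and the exact timestamp format are illustrative assumptions, not part of the GlusterFS CLI or the design doc.

```python
from datetime import datetime, timezone


def default_snap_name(volume, now=None):
    """Hypothetical helper: build '<volume>_<timestamp>' as a default snapshot name."""
    now = now or datetime.now(timezone.utc)
    # A fixed-width, sortable timestamp keeps a directory listing in chronological order.
    return f"{volume}_{now.strftime('%Y%m%d_%H%M%S')}"


when = datetime(2014, 4, 7, 8, 14, 16, tzinfo=timezone.utc)
print(default_snap_name("vol0", when))  # prints vol0_20140407_081416
```

A caller-supplied name could still override the default; the point is only that the name need not be mandatory.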
> Quorum and snaprestore : the doc mentions that when a returning brick comes 
> back, it will be snap'd before pending changes are applied. If I understand 
> the use of quorum correctly, can you comment on the following scenario; 
> - With a brick offline, we'll be tracking changes. Say after 1hr a snap is 
> invoked because quorum is met 
> - changes continue on the volume for another 15 minutes beyond the snap, when 
> the offline brick comes back online. 
> - at this point there are two points in time to bring the brick back to - the 
> brick needs the changes up to the point of the snap, then a snap of the brick 
> followed by the 'replay' of the additional changes to get back to the same 
> point in time as the other replicas in the replica set. 
> - of course, the brick could be offline for 24 or 48 hours due to a hardware 
> fault - during which time multiple snapshots could have been made 
> - it wasn't clear to me from the doc how this scenario is dealt with. 
> 
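The catch-up sequence described in the scenario above can be sketched as an ordered plan. This models the question as asked, not how GlusterFS actually handles a returning brick. The function `catch_up_plan`, the tuple layout, and the rule that a change stamped at exactly the snapshot time belongs to that snapshot are all assumptions made for illustration.

```python
def catch_up_plan(changes, snap_times):
    """Hypothetical: interleave missed changes and missed snapshots in time order.

    changes: list of (time, change_id) missed by the offline brick.
    snap_times: list of (time, snap_name) taken while the brick was offline.
    """
    events = [(t, 1, ("snapshot", name)) for t, name in snap_times]
    events += [(t, 0, ("apply", cid)) for t, cid in changes]
    # Sort by time; at equal times the change is applied before the snapshot is taken.
    events.sort(key=lambda e: (e[0], e[1]))
    return [step for _, _, step in events]


# Brick offline from minute 0, snap taken at minute 60, brick returns at minute 75.
plan = catch_up_plan([(10, "c1"), (50, "c2"), (70, "c3")], [(60, "snap1")])
print(plan)
# prints [('apply', 'c1'), ('apply', 'c2'), ('snapshot', 'snap1'), ('apply', 'c3')]
```

The same ordering extends to a brick that was offline for long enough to miss several snapshots: each missed snapshot becomes one more entry in `snap_times`.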
> barrier : two things are mentioned here - a buffer size and a timeout value. 
> - from an admin's perspective, being able to specify the timeout (secs) is 
> likely to be more workable - and will allow them to align this setting with 
> any potential timeout setting within the application running against the 
> gluster volume. I don't think most admins will know or want to know how to 
> size the buffer properly. 
> 
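To illustrate why a timeout in seconds is the easier knob to reason about, here is a toy barrier that holds callers until it is released or the timeout expires. It is not the GlusterFS barrier implementation; the class `FopBarrier` and its methods are hypothetical, and only `threading.Event` from the Python standard library is real.

```python
import threading


class FopBarrier:
    """Hypothetical barrier: callers block until release() or until timeout_secs elapse."""

    def __init__(self, timeout_secs):
        self.timeout_secs = timeout_secs
        self._released = threading.Event()

    def release(self):
        # Called once the snapshot is done; lets every waiting caller proceed.
        self._released.set()

    def wait(self):
        # Returns True if released explicitly, False if the timeout expired first.
        return self._released.wait(self.timeout_secs)


barrier = FopBarrier(timeout_secs=0.1)
print(barrier.wait())   # prints False: nobody released it, so the timeout ended the wait
barrier.release()
print(barrier.wait())   # prints True: already released, returns immediately
```

An admin can pick `timeout_secs` to sit below the application's own I/O timeout without knowing anything about buffer sizing.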
> Hopefully the above makes sense. 
> 
> Cheers, 
> 
> Paul C 
> 
> ----- Original Message -----
> 
> > From: "Rajesh Joseph" <rjos...@redhat.com>
> > To: "gluster-devel" <gluster-devel@nongnu.org>
> > Sent: Wednesday, 2 April, 2014 3:55:28 AM
> > Subject: [Gluster-devel] GlusterFS Snapshot internals
> 
> > Hi all,
> 
> > I have updated the GlusterFS snapshot forge wiki.
> 
> > https://forge.gluster.org/snapshot/pages/Home
> 
> > Please go through it and let me know if you have any questions or queries.
> 
> > Best Regards,
> > Rajesh
> 
> > [PS]: Please ignore previous mail. Accidentally hit send before completing 
> > :)
> 
> > _______________________________________________
> > Gluster-devel mailing list
> > Gluster-devel@nongnu.org
> > https://lists.nongnu.org/mailman/listinfo/gluster-devel


