Hi Xavi,

On Tue, Jun 18, 2019 at 12:28 PM Xavi Hernandez <[email protected]> wrote:
> Hi Kotresh,
>
> On Tue, Jun 18, 2019 at 8:33 AM Kotresh Hiremath Ravishankar <
> [email protected]> wrote:
>
>> Hi Xavi,
>>
>> Reply inline.
>>
>> On Mon, Jun 17, 2019 at 5:38 PM Xavi Hernandez <[email protected]>
>> wrote:
>>
>>> Hi Kotresh,
>>>
>>> On Mon, Jun 17, 2019 at 1:50 PM Kotresh Hiremath Ravishankar <
>>> [email protected]> wrote:
>>>
>>>> Hi All,
>>>>
>>>> The ctime feature is enabled by default from release gluster-6. But as
>>>> explained in bug [1], there is a known issue with legacy files, i.e.
>>>> files created before the ctime feature was enabled. These files do not
>>>> have the "trusted.glusterfs.mdata" xattr, which maintains the time
>>>> attributes. So, on accessing those files, the xattr gets created with
>>>> the latest time attributes. This is not correct, because all the time
>>>> attributes (atime, mtime, ctime) get updated instead of only the
>>>> required ones.
>>>>
>>>> There are a couple of approaches to solve this.
>>>>
>>>> 1. On accessing the files, let posix update the time attributes from
>>>> the backend file on the respective replicas. This obviously results in
>>>> inconsistent "trusted.glusterfs.mdata" xattr values within the replica
>>>> set. AFR/EC should heal this xattr as part of metadata heal upon
>>>> accessing the file. It can choose to replicate from any subvolume.
>>>> Ideally we should consider the highest time from the replicas and
>>>> treat it as the source, but I think any subvolume should be fine, as
>>>> replica time attributes are mostly in sync, with a maximum difference
>>>> in the order of a few seconds if I am not wrong.
>>>>
>>>> But client-side self-heal is disabled by default for performance
>>>> reasons [2]. If we choose to go with this approach, we need to
>>>> consider enabling at least client-side metadata self-heal by default.
>>>> Please share your thoughts on enabling the same by default.
>>>>
>>>> 2. Don't let posix update the legacy files from the backend.
On lookup
>>>> cbk, let the utime xlator update the time attributes from the statbuf
>>>> received, synchronously.
>>>>
>>>> Both approaches are similar, as both result in updating the xattr
>>>> during lookup. Please share your inputs on which approach is better.
>>>
>>> I prefer the second approach. The first approach is not feasible for
>>> EC volumes because self-heal requires that k bricks (in a k+r
>>> configuration) agree on the value of this xattr; otherwise it
>>> considers the metadata damaged and manual intervention is needed to
>>> fix it. During upgrade, the first r bricks will be upgraded without
>>> problems, but trusted.glusterfs.mdata won't be healed because r < k.
>>> In fact this xattr will be removed from the new bricks, because the
>>> majority of bricks agree on the xattr not being present. Once the
>>> (r+1)-th brick is upgraded, it's possible that posix sets different
>>> values for trusted.glusterfs.mdata, which will cause self-heal to
>>> fail.
>>>
>>> The second approach seems better to me, if guarded by a new option
>>> that enables this behavior. The utime xlator should only update the
>>> mdata xattr if that option is set, and that option should only be
>>> settable once all nodes have been upgraded (controlled by op-version).
>>> In this situation, the first lookup on a file where utime detects that
>>> mdata is not set will require a synchronous update. I think this is
>>> good enough because it will only happen once per file. We'll need to
>>> consider cases where different clients do lookups at the same time,
>>> but I think this can be easily solved by ignoring the request if mdata
>>> is already present.
>>
>> Initially there were two issues.
>>
>> 1. The upgrade issue with EC volumes, as described by you.
>> This is solved with patch [1]. There was a bug in ctime posix where it
>> was creating the xattr even when ctime was not set by the client
>> (during the utimes system call).
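Xavi's "ignore the request if mdata is already present" idea maps naturally onto an atomic create-only set: whichever client's lookup lands first stamps the file, and every later attempt is silently dropped. A minimal sketch of that first-lookup-wins semantics, using an in-memory stand-in for one file's on-disk xattrs (in the real brick this would be a setxattr call with the XATTR_CREATE flag, so the kernel rejects the second writer atomically); the names here are illustrative, not actual Gluster code:

```python
import threading

class FakeInode:
    """In-memory stand-in for one file's xattr area on a brick."""
    def __init__(self):
        self._xattrs = {}
        self._lock = threading.Lock()  # models the atomicity of XATTR_CREATE

    def set_mdata_if_absent(self, value):
        """Create "trusted.glusterfs.mdata" only if it is missing.

        Returns True if this caller created it, False if another lookup
        already did (in which case the request is simply ignored).
        """
        with self._lock:
            if "trusted.glusterfs.mdata" in self._xattrs:
                return False
            self._xattrs["trusted.glusterfs.mdata"] = value
            return True

inode = FakeInode()
results = []
# Two clients race to stamp the same legacy file during lookup.
t1 = threading.Thread(target=lambda: results.append(inode.set_mdata_if_absent(b"client-1")))
t2 = threading.Thread(target=lambda: results.append(inode.set_mdata_if_absent(b"client-2")))
t1.start(); t2.start(); t1.join(); t2.join()

# Exactly one writer wins; the other request is a harmless no-op.
assert results.count(True) == 1 and results.count(False) == 1
```

The key property is that the losing client needs no error handling beyond "do nothing", which is why concurrent lookups don't need any cross-client coordination.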
With patch [1], the
>> behavior is that the utimes system call will only update the
>> "trusted.glusterfs.mdata" xattr if it is already present; otherwise it
>> won't create it. New xattr creation should only happen during entry
>> operations (i.e. create, mknod and others). So there won't be any
>> problems with upgrade. I think we don't need a new option dependent on
>> op-version, if I am not wrong.
>
> If I'm not missing something, we cannot allow creation of the mdata
> xattr even for create/mknod/setattr fops. Doing so could cause the same
> problem if some of the bricks are not upgraded and do not support mdata
> yet (or they have ctime disabled by default).

Yes, that's right: even create/mknod and other fops won't create the xattr
if the client doesn't set ctime (which holds good for older clients). I
have commented on this in the patch [1]. All other fops where the xattr
gets created already have the check that if ctime is not set, the xattr is
not created. It was missed only in the utimes system call, and hence
caused the upgrade issue.

>> 2. After upgrade, how do we update the "trusted.glusterfs.mdata" xattr?
>> This mail thread was for this. Here, which approach is better? I
>> understand that from the EC point of view the second approach is the
>> best one. The question I had was: can't EC treat
>> 'trusted.glusterfs.mdata' as a special xattr and add logic to heal it
>> from one subvolume (i.e. remove the requirement of having consistent
>> data on k subvolumes in a k+r configuration)?
>
> Yes, we can do that. But this would require a newer client with support
> for this new xattr, which won't be possible during an upgrade, where
> bricks are upgraded before the clients. So, even if we add this
> intelligence to the client, the upgrade process is still broken. The
> only consideration here is whether we can rely on the self-heal daemon
> being on the server side (and thus upgraded at the same time as the
> server) to ensure that files can really be healed even if other
> bricks/shd daemons are not yet updated.
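The rule Kotresh describes for patch [1] boils down to two distinct operations: entry fops may create the xattr (but only when the client actually sent ctime), while utimes may only replace an existing value. A rough sketch of that rule, with a dict standing in for the backend xattrs (on a real filesystem these two cases would correspond to setxattr with XATTR_CREATE and XATTR_REPLACE respectively); function names are illustrative only, not the real posix-xlator entry points:

```python
MDATA = "trusted.glusterfs.mdata"

def on_create(xattrs, mdata, client_sent_ctime):
    """Entry fop (create/mknod/...): may create mdata, but only when
    the client actually set ctime (old clients do not)."""
    if client_sent_ctime:
        xattrs[MDATA] = mdata

def on_utimes(xattrs, mdata):
    """utimes: update mdata only if it already exists; never create it,
    so legacy files aren't stamped by a half-upgraded cluster."""
    if MDATA in xattrs:
        xattrs[MDATA] = mdata

legacy = {}                      # file created before ctime was enabled
on_utimes(legacy, b"t1")         # must NOT create the xattr
assert MDATA not in legacy

fresh = {}
on_create(fresh, b"t0", client_sent_ctime=True)
on_utimes(fresh, b"t1")          # an existing value is updated normally
assert fresh[MDATA] == b"t1"

old_client = {}
on_create(old_client, b"t0", client_sent_ctime=False)  # old client
assert MDATA not in old_client   # no xattr appears
```

The bug fixed by patch [1] was precisely that the utimes path was missing the "only if already present" guard that the other fops had.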
Not sure if
> it could work, but anyway I don't like it very much.
>
>> The second approach is independent of AFR and EC. So if we choose
>> this, do we need a new option to guard it? If the upgrade procedure is
>> to upgrade the servers first and then the clients, we don't need to
>> guard it, I think?
>
> I think you are right for regular clients. Is there any server-side
> daemon that acts as a client and could use the utime xlator? If not, I
> think we don't need an additional option here.

No, no other server-side daemon has the utime xlator loaded.

[1] https://review.gluster.org/#/c/glusterfs/+/22858/

>>> Xavi
>>>
>>>> [1] https://bugzilla.redhat.com/show_bug.cgi?id=1593542
>>>> [2] https://github.com/gluster/glusterfs/issues/473
>>>>
>>>> --
>>>> Thanks and Regards,
>>>> Kotresh H R

--
Thanks and Regards,
Kotresh H R
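Going back to the EC constraint Xavi raised earlier in the thread: the k-of-(k+r) agreement rule can be pictured as a simple quorum vote over the per-brick xattr values, where "xattr absent" counts as an answer too. A toy model (not the actual EC heal code) showing why a half-upgraded 4+2 volume removes the new xattr rather than healing it:

```python
from collections import Counter

def ec_mdata_quorum(per_brick_mdata, k):
    """EC metadata heal needs k of k+r bricks to agree on a value.
    Missing xattrs (None) count as a value: mid-upgrade, the old
    bricks' "absent" answer can reach quorum and the new bricks'
    mdata is removed. Returns (winning_value, True), or (None, False)
    if no value reaches k votes (metadata treated as damaged)."""
    value, votes = Counter(per_brick_mdata).most_common(1)[0]
    if votes >= k:
        return value, True
    return None, False

# 4+2 volume, only the 2 redundancy bricks upgraded so far:
bricks = [None, None, None, None, b"mdata-v1", b"mdata-v1"]
value, ok = ec_mdata_quorum(bricks, k=4)
assert ok and value is None  # "absent" wins: the new xattr gets removed

# Once a 5th brick is upgraded and they disagree, nothing has quorum:
bricks = [None, b"mdata-v2", b"mdata-v2", b"mdata-v2", b"mdata-v1", b"mdata-v1"]
value, ok = ec_mdata_quorum(bricks, k=4)
assert not ok                # metadata damaged: manual intervention
```

This is exactly why the thread converges on the second approach, which never asks EC to heal a value the bricks were not all told about.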
_______________________________________________
Community Meeting Calendar:

APAC Schedule -
Every 2nd and 4th Tuesday at 11:30 AM IST
Bridge: https://bluejeans.com/836554017

NA/EMEA Schedule -
Every 1st and 3rd Tuesday at 01:00 PM EDT
Bridge: https://bluejeans.com/486278655

Gluster-devel mailing list
[email protected]
https://lists.gluster.org/mailman/listinfo/gluster-devel
