Hi Raj,

Thanks for the explanation. We will have to rethink our upgrade process.

Thanks again.
Raj

On Thu, Feb 21, 2019, 10:23 PM Raj <rajgau...@gmail.com> wrote:

> Hello Raj,
> It is best and safest to unmount from all the clients and then do the
> upgrade. Your FS is getting more OSTs and the configuration of the
> existing ones is changing, so your clients need to pick up the new
> layout by remounting.
> You also mentioned client eviction: during eviction the client has to
> drop its dirty pages, and all the open file descriptors in the FS will
> be gone.
>
> On Thu, Feb 21, 2019 at 12:25 PM Raj Ayyampalayam <ans...@gmail.com>
> wrote:
>
>> What can I expect to happen to the jobs that are suspended during the
>> file system restart?
>> Will the processes holding an open file handle die when I unsuspend
>> them after the filesystem restart?
>>
>> Thanks!
>> -Raj
>>
>> On Thu, Feb 21, 2019 at 12:52 PM Colin Faber <cfa...@gmail.com> wrote:
>>
>>> Ah yes,
>>>
>>> If you're adding to an existing OSS, then you will need to
>>> reconfigure the file system, which requires a writeconf event.
>>>
>>> On Thu, Feb 21, 2019 at 10:00 AM Raj Ayyampalayam <ans...@gmail.com>
>>> wrote:
>>>
>>>> The new OSTs will be added to the existing file system (the OSS
>>>> nodes are already part of the filesystem). I will have to
>>>> re-configure the current HA resource configuration to tell it about
>>>> the 4 new OSTs.
>>>> Our exascaler's HA monitors the individual OSTs, and I need to
>>>> re-configure the HA on the existing filesystem.
>>>>
>>>> Our vendor support has confirmed that we would have to restart the
>>>> filesystem if we want to regenerate the HA configs to include the
>>>> new OSTs.
>>>>
>>>> Thanks,
>>>> -Raj
>>>>
>>>> On Thu, Feb 21, 2019 at 11:23 AM Colin Faber <cfa...@gmail.com>
>>>> wrote:
>>>>
>>>>> It seems to me that steps may still be missing?
>>>>>
>>>>> You're going to rack/stack and provision the OSS nodes with new
>>>>> OSTs.
>>>>>
>>>>> Then you're going to introduce failover options somewhere? New
>>>>> OSTs? Existing system? Etc.?
>>>>>
>>>>> If you're introducing failover with the new OSTs and leaving the
>>>>> existing system in place, you should be able to accomplish this
>>>>> without bringing the system offline.
>>>>>
>>>>> If you're going to be introducing failover to your existing system,
>>>>> then you will need to reconfigure the file system to accommodate
>>>>> the new failover settings (failover nodes, etc.).
>>>>>
>>>>> -cf
>>>>>
>>>>> On Thu, Feb 21, 2019 at 9:13 AM Raj Ayyampalayam <ans...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Our upgrade strategy is as follows:
>>>>>>
>>>>>> 1) Load all disks into the storage array.
>>>>>> 2) Create RAID pools and virtual disks.
>>>>>> 3) Create lustre file system using mkfs.lustre command. (I still
>>>>>>    have to figure out all the parameters used on the existing
>>>>>>    OSTs).
>>>>>> 4) Create mount points on all OSSs.
>>>>>> 5) Mount the lustre OSTs.
>>>>>> 6) Maybe rebalance the filesystem.
>>>>>>
>>>>>> My understanding is that the above can be done without bringing
>>>>>> the filesystem down. I want to create the HA configuration
>>>>>> (corosync and pacemaker) for the new OSTs. This step requires the
>>>>>> filesystem to be down.
>>>>>> I want to know what would happen to the suspended processes across
>>>>>> the cluster when I bring the filesystem down to re-generate the HA
>>>>>> configs.
>>>>>>
>>>>>> Thanks,
>>>>>> -Raj
>>>>>>
>>>>>> On Thu, Feb 21, 2019 at 12:59 AM Colin Faber <cfa...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Can you provide more details on your upgrade strategy? In some
>>>>>>> cases expanding your storage shouldn't impact client / job
>>>>>>> activity at all.
>>>>>>>
>>>>>>> On Wed, Feb 20, 2019, 11:09 AM Raj Ayyampalayam
>>>>>>> <ans...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Hello,
>>>>>>>>
>>>>>>>> We are planning on expanding our storage by adding more OSTs to
>>>>>>>> our lustre file system. It looks like it would be easier to
>>>>>>>> expand if we bring the filesystem down and perform the necessary
>>>>>>>> operations. We are planning to suspend all the jobs running on
>>>>>>>> the cluster. We originally planned to add new OSTs to the live
>>>>>>>> filesystem.
>>>>>>>>
>>>>>>>> We are trying to determine the potential impact to the suspended
>>>>>>>> jobs if we bring down the filesystem for the upgrade.
>>>>>>>> One of the questions we have is what would happen to the
>>>>>>>> suspended processes that hold an open file handle in the lustre
>>>>>>>> file system when the filesystem is brought down for the upgrade?
>>>>>>>> Will they recover from the client eviction?
>>>>>>>>
>>>>>>>> We do have vendor support and have engaged them. I wanted to ask
>>>>>>>> the community and get some feedback.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> -Raj
>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> lustre-discuss mailing list
>>>>>>>> lustre-discuss@lists.lustre.org
>>>>>>>> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
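
For anyone following the thread, steps 3 to 5 of the upgrade plan quoted above (format the new OSTs with mkfs.lustre, create mount points, mount them) could be sketched as a dry run like the one below. This is only an illustration under assumptions, not the procedure anyone in the thread actually used: the filesystem name, MGS NID, device paths, starting OST index, and mount-point layout are all hypothetical placeholders, and the real mkfs.lustre parameters should be copied from the existing OSTs. The script only prints the commands; it does not execute them.

```python
import shlex


def ost_commands(fsname, mgs_nid, first_index, devices, mount_root="/lustre"):
    """Build (but do not run) the commands to format and mount new OSTs.

    All arguments are placeholders chosen for illustration; take the real
    values from the existing OSTs on your system.
    """
    commands = []
    for offset, device in enumerate(devices):
        index = first_index + offset
        # Hypothetical mount-point naming; any per-OST directory would do.
        mount_point = f"{mount_root}/{fsname}-ost{index:04x}"
        # Step 3: format the device as an OST of the existing filesystem.
        commands.append(shlex.join([
            "mkfs.lustre", "--ost",
            f"--fsname={fsname}",
            f"--index={index}",
            f"--mgsnode={mgs_nid}",
            device,
        ]))
        # Step 4: create the mount point on the OSS.
        commands.append(shlex.join(["mkdir", "-p", mount_point]))
        # Step 5: mounting the OST registers it with the MGS.
        commands.append(shlex.join(["mount", "-t", "lustre", device, mount_point]))
    return commands


if __name__ == "__main__":
    # Example values only: two new OSTs starting at index 8.
    for line in ost_commands("testfs", "10.0.0.1@tcp", 8,
                             ["/dev/mapper/ost8", "/dev/mapper/ost9"]):
        print(line)
```

Printing the commands first makes it easy to review them against the parameters of the existing OSTs before running anything on an OSS. Failover settings are deliberately left out here, since per the thread those depend on the HA configuration and vendor guidance.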