Hi Yuri,

Thank you for your excellent reply.

The immediate issue ahead of us is getting the 4K LUNs formatted as NSDs into 
the filesystem, and since we can do that, life is good for a while. The next 
hurdle for me won't come for another 3 years, when I'll likely need to bring 
metadata NSDs that have 4K sectors into existing filesystems. I wonder if in 
that time frame a migration tool could be developed to convert existing 
filesystems to 4K metadata?

I do recognize what you're saying, though, about the return on investment of 
developing a tool to migrate the on-disk format vs. migrating the entire 
filesystem. What were you thinking as far as ways to strengthen the migration 
story? In an ideal world I'd like to be able to pull the trigger on a migration 
and have it cause no visible impact to end users other than an understandable 
performance hit.

If a mechanism existed to move data to a new filesystem (a hypothetical 
mmreplacefs comes to mind) in-place, that would be quite wonderful. At a high 
level, perhaps one would create a new filesystem that wouldn't be directly 
mounted but would be an "internal mount". An admin would then issue an 
mmreplacefs $OLD_FS $NEW_FS command. All activity would be quiesced on the old 
filesystem and in the background an AFM-like process would be initiated. Files 
not on the new fs would be migrated in the background. Once the migration 
process is complete the old fs would perhaps be renamed and the new fs would 
effectively take its place. This raises more questions in my mind than it 
answers, such as how quotas would be handled during the migration, how filesets 
would get migrated/created, and how ILM/DMAPI would be affected during the 
migration.
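
To make the "files not on the new fs would be migrated in the background" step concrete, here is a toy sketch. Nothing in it is GPFS functionality: mmreplacefs does not exist, plain directories stand in for the two file systems, and the function name sweep_missing is made up for illustration.

```shell
#!/bin/sh
# Toy model of the background sweep: copy every regular file that exists
# under OLD but not yet under NEW. Directories stand in for file systems;
# this is not a GPFS command and ignores quotas, filesets and ILM/DMAPI.
sweep_missing() {
    old=$1 new=$2
    # List files relative to OLD; names containing newlines are not handled.
    ( cd "$old" && find . -type f ) | while IFS= read -r rel; do
        if [ ! -e "$new/$rel" ]; then
            mkdir -p "$new/$(dirname "$rel")"
            cp -p "$old/$rel" "$new/$rel"
            echo "migrated $rel"
        fi
    done
}
```

Running sweep_missing repeatedly only touches what is still missing, which is the property a background process would need in order to be restartable.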

I'll give it some more thought, but understanding a little more about what you 
were thinking would help me craft the RFE.

-Aaron



________________________________
From: gpfsug-discuss-boun...@spectrumscale.org 
[gpfsug-discuss-boun...@spectrumscale.org] on behalf of Yuri L Volobuev 
[volob...@us.ibm.com]
Sent: Wednesday, October 12, 2016 1:44 AM
To: gpfsug main discussion list
Subject: Re: [gpfsug-discuss] 4K sector NSD support (was: Hardware refresh)


Yes, it is possible to add a 4KN dataOnly NSD to a non-4K-aligned file system, 
as you figured out. This is something we didn't plan on doing originally, but 
then had to implement based on the feedback from the field. There's clearly a 
need for this. However, this statement is exactly it -- dataOnly NSDs. The only 
way to put metadata on a 4KN disk is to use a 4K-aligned file system. There are 
several kinds of metadata present in a non-4K-aligned file system that generate 
non-4K IOs (512 byte inodes being the biggest problem), and there's no way to 
work around this short of using the new format, and there's no way to perform a 
conversion to the new format in-place.
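
A small arithmetic sketch of why 512-byte inodes are a problem on a 4KN disk. It assumes, purely for illustration, that inodes are packed back to back in the inode file; the function name is_aligned and the inode index are made up.

```shell
#!/bin/sh
# Report whether an I/O (byte offset plus length) is aligned to a sector size.
# The numbers below are illustrative and not taken from GPFS internals.
is_aligned() {
    offset=$1 length=$2 sector=$3
    if [ $((offset % sector)) -eq 0 ] && [ $((length % sector)) -eq 0 ]; then
        echo aligned
    else
        echo unaligned
    fi
}

# A 512-byte inode at index 3 of a packed inode file starts at byte 1536.
is_aligned $((3 * 512)) 512 512      # prints "aligned": fine on 512-byte sectors
is_aligned $((3 * 512)) 512 4096     # prints "unaligned": not a whole 4K sector
is_aligned $((3 * 4096)) 4096 4096   # prints "aligned": a 4K inode fits naturally
```

The middle case is the one a 4KN device cannot service directly, since both the offset and the length fall short of a whole 4K sector.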

You're welcome to submit an RFE, of course, but I'd recommend being pragmatic 
about the chances of such an RFE being implemented. As you can imagine, the 
main reason why an all-encompassing file system conversion tool doesn't exist 
is not GPFS developers having no idea that such a tool is wanted. There are 
several considerations that conspire to make this an unlikely candidate to ever 
be implemented:
1) The task is hard and has no finish line. In most GPFS releases, something 
changes, necessitating an added piece of work for the hypothetical conversion 
tool, and the matrix of from-to format version combinations gets to be big 
quite quickly.
2) A file system conversion is something that is needed very infrequently, but 
when this code does run, it absolutely has to run and run perfectly, else the 
result would be a half-converted file system, i.e. a royal mess. This is a 
tester's nightmare.
3) The failure scenarios are all unpalatable. What should the conversion tool 
do if it runs out of space replacing smaller metadata structures with bigger 
ones? Undoing a partially finished conversion is even harder than doing it in 
the first place.
4) Doing an on-disk conversion on-line is simply very difficult. Consider the 
task of converting an inode file to use a different inode size. The file can be 
huge (billions of records), and it would take a fair chunk of time to rewrite 
it, but the file is changing while it's being converted (can't simply lock the 
whole thing down for so long), simultaneously on multiple nodes. Orchestrating 
the processing of updates in the presence of two inode files, with proper 
atomicity guarantees (to guard against a node failure) is a task of 
considerable complexity.
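
For a feel of the scale behind points 3 and 4, a back-of-the-envelope sketch. It assumes one billion inodes going from 512 bytes to 4K each, packed back to back; the real on-disk layout has more to it, so treat the numbers as rough orders of magnitude only.

```shell
#!/bin/sh
# Size in bytes of an inode file, assuming inodes are simply packed back to
# back (an illustrative simplification, not the actual GPFS layout).
inode_file_bytes() {
    count=$1 inode_size=$2
    echo $((count * inode_size))
}

inodes=1000000000                        # "billions of records" scale
old=$(inode_file_bytes "$inodes" 512)
new=$(inode_file_bytes "$inodes" 4096)
echo "old inode file: $old bytes"
echo "new inode file: $new bytes"
echo "extra space:    $((new - old)) bytes"
```

Under these assumptions the inode file grows eightfold, from about 0.5 TB to about 4.1 TB, and all of that extra metadata space has to be available before the conversion can finish.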

None of this means the task is impossible, of course. It is, however, a very 
big chunk of very complex work, all towards a tool that on an average cluster 
may run somewhere between zero and one times, not something that benefits 
day-to-day operations. Where the complexity of the task allows for a reasonably 
affordable implementation, e.g. conversion from an old-style EA file to the 
FASTEA format, a conversion tool has been implemented (mmmigratefs). However, 
doing this for every single changed aspect of the file system format is simply 
too expensive to justify, given other tasks in front of us.

On the other hand, a well-implemented migration mechanism solves the file 
system reformatting scenario (which covers all aspects of file system format 
changes) as well as a number of other scenarios. This is a cleaner, more 
general solution. Migration doesn't have to mean an outage. A simple 
rsync-based migration requires downtime for a cutover, while an AFM-based 
migration doesn't necessarily require one. I'm not saying that GPFS has a 
particularly strong migration story at the moment, but this is a much more 
productive direction for applying resources than a mythical all-encompassing 
conversion tool.
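
A minimal sketch of the simple rsync-based variant, along the lines of the two-pass copy and symlink cutover Kevin describes further down the thread. The function names are made up, the rsync options are ordinary rsync flags rather than anything GPFS-specific, and the cp -a fallback exists only so the sketch runs where rsync is not installed.

```shell
#!/bin/sh
# Sketch of a two-pass copy followed by a symlink cutover for one directory.
# Function names are hypothetical; this is not a tested migration procedure.
copy_tree() {
    src=$1 dst=$2
    mkdir -p "$dst"
    if command -v rsync >/dev/null 2>&1; then
        rsync -aH --delete "$src"/ "$dst"/
    else
        cp -a "$src"/. "$dst"/       # fallback so the sketch still runs
    fi
}

cutover() {
    old=$1 new=$2
    copy_tree "$old" "$new"          # first pass, users may still be active
    # ... the quiet window agreed with the users would start here ...
    copy_tree "$old" "$new"          # second pass picks up late changes
    rm -rf "$old"
    ln -s "$new" "$old"              # old path now resolves to the new copy
}
```

With the paths from Kevin's example this would be called as `cutover /gpfs0/home/joeuser /gpfs2/home/joeuser`. The downtime mentioned above is the window between the second pass and the symlink.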

yuri


From: Aaron Knister <aaron.s.knis...@nasa.gov>
To: <gpfsug-discuss@spectrumscale.org>,
Date: 10/11/2016 05:59 PM
Subject: Re: [gpfsug-discuss] 4K sector NSD support (was: Hardware refresh)
Sent by: gpfsug-discuss-boun...@spectrumscale.org

________________________________



Yuri,

(Sorry for being somewhat spammy) I now understand the limitation after
some more testing (I'm a hands-on learner, can you tell?). Given the
right code/cluster/fs version levels I can add 4K dataOnly NSDv2 NSDs to
a filesystem created with NSDv1 NSDs. What I seemingly can't do is add
any metadataOnly or dataAndMetadata 4K LUNs to an fs that is not 4K
aligned, which I assume would be any fs originally created with NSDv1
LUNs. It seems possible to move all data away from NSDv1 LUNs in a
filesystem behind the scenes using GPFS migration tools, and move the
data to NSDv2 LUNs. In this case I believe what's missing is a tool to
convert just the metadata structures to be 4K aligned, since the data
would already be on 4K-based NSDv2 LUNs. Is that the case? I'm trying to
figure out what exactly I'm asking for in an RFE.
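
For checking up front whether a given fs is 4K aligned, here is a small sketch that pulls the value column out of the mmlsfs output quoted below. The helper name is4k_value is made up, and the parsing assumes the three-column layout shown in that output.

```shell
#!/bin/sh
# Read `mmlsfs <fs> --is4KAligned` output on stdin and print the value column.
# Assumes the flag / value / description layout seen in the quoted output.
is4k_value() {
    awk '$1 == "--is4KAligned" { print $2 }'
}

# On a cluster node the real command would be piped in, for example:
#   /usr/lpp/mmfs/bin/mmlsfs tnb4k --is4KAligned | is4k_value
```

A result of "No" would mean only dataOnly 4K NSDs can be added to that file system, per Yuri's explanation.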

-Aaron

On 10/11/16 7:57 PM, Aaron Knister wrote:
> I think I was a little quick to the trigger. I re-read your last mail
> after doing some testing and understand it differently. I was wrong
> about my interpretation-- you can add 4K NSDv2 formatted NSDs to a
> filesystem previously created with NSDv1 NSDs assuming, as you say, the
> minReleaseLevel and filesystem version are high enough. That negates
> about half of my last e-mail. The fs still doesn't show as 4K aligned:
>
> loressd01:~ # /usr/lpp/mmfs/bin/mmlsfs tnb4k --is4KAligned
> flag                value                    description
> ------------------- ------------------------ -----------------------------------
>  --is4KAligned      No                       is4KAligned?
>
> but *shrug* most of the I/O to these disks should be 1MB anyway. If
> somebody is pounding the FS with smaller than 4K I/O they're gonna get a
> talkin' to.
>
> -Aaron
>
> On 10/11/16 6:41 PM, Aaron Knister wrote:
>> Thanks Yuri.
>>
>> I'm asking for my own purposes but I think it's still relevant here:
>> we're still at GPFS 3.5 and will be adding dataOnly NSDs with 4K sectors
>> in the near future. We're planning to update to 4.1 before we format
>> these NSDs, though. If I understand you correctly we can't bring these
>> 4K NSDv2 NSDs into a filesystem with 512b-based NSDv1 NSDs? That's a
>> pretty big deal :(
>>
>> Reformatting every few years with tens of petabytes of data is not
>> realistic for us (it would take years to move the data around). It also
>> goes against my personal preachings about GPFS's storage virtualization
>> capabilities: the ability to perform upgrades/make underlying storage
>> infrastructure changes with behind-the-scenes data migration,
>> eliminating much of the manual hassle of storage administrators doing
>> rsync dances. I guess it's RFE time? It also seems as though AFM could
>> help with automating the migration, although many of our filesystems do
>> not have filesets on them so we would have to re-think how we lay out
>> our filesystems.
>>
>> This is also curious to me with IBM pitching GPFS as a filesystem for
>> cloud services (the cloud *never* goes down, right?). Granted I believe
>> this pitch started after the NSDv2 format was defined, but if somebody
>> is building a large cloud with GPFS as the underlying filesystem for an
>> object or an image store one might think the idea of having to re-format
>> the filesystem to gain access to critical new features is inconsistent
>> with this pitch. It would be hugely impactful. Just my $.02.
>>
>> As you can tell, I'm frustrated there's no online conversion tool :) Not
>> that there couldn't be... you all are brilliant developers.
>>
>> -Aaron
>>
>> On 10/11/16 1:22 PM, Yuri L Volobuev wrote:
>>> This depends on the committed cluster version level (minReleaseLevel)
>>> and file system format. Since NSDv2 is an on-disk format change, older
>>> code wouldn't be able to understand what it is, and thus if there's a
>>> possibility of a downlevel node looking at the NSD, the NSDv1 format is
>>> going to be used. The code does NSDv1<->NSDv2 conversions under the
>>> covers as needed when adding an empty NSD to a file system.
>>>
>>> I'd strongly recommend getting a fresh start by formatting a new file
>>> system. Many things have changed over the course of the last few years.
>>> In particular, having a 4K-aligned file system can be a pretty big deal,
>>> depending on what hardware one is going to deploy in the future, and
>>> this is something that can't be bolted onto an existing file system.
>>> Having 4K inodes is very handy for many reasons. New directory format
>>> and NSD format changes are attractive, too. And disks generally tend to
>>> get larger with time, and at some point you may want to add a disk to an
>>> existing storage pool that's larger than the existing allocation map
>>> format allows. Obviously, it's more hassle to migrate data to a new file
>>> system, as opposed to extending an existing one. In a perfect world,
>>> GPFS would offer a conversion tool that seamlessly and robustly converts
>>> old file systems, making them as good as new, but in the real world such
>>> a tool doesn't exist. Getting a clean slate by formatting a new file
>>> system every few years is a good long-term investment of time, although
>>> it comes front-loaded with extra work.
>>>
>>> yuri
>>>
>>>
>>> From: Aaron Knister <aaron.s.knis...@nasa.gov>
>>> To: <gpfsug-discuss@spectrumscale.org>,
>>> Date: 10/10/2016 04:45 PM
>>> Subject: Re: [gpfsug-discuss] Hardware refresh
>>> Sent by: gpfsug-discuss-boun...@spectrumscale.org
>>>
>>> ------------------------------------------------------------------------
>>>
>>>
>>>
>>> Can one format NSDv2 NSDs and put them in a filesystem with NSDv1 NSDs?
>>>
>>> -Aaron
>>>
>>> On 10/10/16 7:40 PM, Luis Bolinches wrote:
>>>> Hi
>>>>
>>>> Creating a new FS sounds like the best way to go, NSDv2 being a very good
>>>> reason to do so.
>>>>
>>>> AFM is quite good for migrations; the latest versions also allow using the
>>>> NSD protocol for mounts. Olaf did a great job explaining this scenario in
>>>> chapter 6 of the Redbook:
>>>>
>>>> http://www.redbooks.ibm.com/abstracts/sg248254.html?Open
>>>>
>>>> --
>>>> Cheers
>>>>
>>>> On 10 Oct 2016, at 23.05, Buterbaugh, Kevin L
>>>> <kevin.buterba...@vanderbilt.edu
>>>> <mailto:kevin.buterba...@vanderbilt.edu>> wrote:
>>>>
>>>>> Hi Mark,
>>>>>
>>>>> The last time we did something like this was 2010 (we’re doing rolling
>>>>> refreshes now), so there are probably lots of better ways to do this
>>>>> than what we did, but we:
>>>>>
>>>>> 1) set up the new hardware
>>>>> 2) created new filesystems (so that we could make adjustments we
>>>>> wanted to make that can only be made at FS creation time)
>>>>> 3) used rsync to make a 1st pass copy of everything
>>>>> 4) coordinated a time with users / groups to do a 2nd rsync when they
>>>>> weren’t active
>>>>> 5) used symbolic links during the transition (i.e. rm -rvf
>>>>> /gpfs0/home/joeuser; ln -s /gpfs2/home/joeuser /gpfs0/home/joeuser)
>>>>> 6) once everybody was migrated, updated the symlinks (i.e. /home
>>>>> became a symlink to /gpfs2/home)
>>>>>
>>>>> HTHAL…
>>>>>
>>>>> Kevin
>>>>>
>>>>>> On Oct 10, 2016, at 2:56 PM, mark.b...@siriuscom.com
>>>>>> <mailto:mark.b...@siriuscom.com> wrote:
>>>>>>
>>>>>> Have a very old cluster built on IBM X3650’s and DS3500.  Need to
>>>>>> refresh hardware.  Any lessons learned in this process?  Is it
>>>>>> easiest to just build new cluster and then use AFM?  Add to existing
>>>>>> cluster then decommission nodes?  What is the recommended process for
>>>>>> this?
>>>>>>
>>>>>>
>>>>>> Mark
>>>>>>
>>>>>> Sirius Computer Solutions <http://www.siriuscom.com/>
>>>>>> _______________________________________________
>>>>>> gpfsug-discuss mailing list
>>>>>> gpfsug-discuss at spectrumscale.org <http://spectrumscale.org/>
>>>>>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss
>>>>>
>>>>> —
>>>>> Kevin Buterbaugh - Senior System Administrator
>>>>> Vanderbilt University - Advanced Computing Center for Research and
>>>>> Education
>>>>> kevin.buterba...@vanderbilt.edu
>>>>> <mailto:kevin.buterba...@vanderbilt.edu> - (615)875-9633
>>>>>
>>>>>
>>>>>
>>>>
>>>> Ellei edellä ole toisin mainittu: / Unless stated otherwise above:
>>>> Oy IBM Finland Ab
>>>> PL 265, 00101 Helsinki, Finland
>>>> Business ID, Y-tunnus: 0195876-3
>>>> Registered in Finland
>>>>
>>>>
>>>>
