Hi there.

In the spirit of a conversation a friend showed me a couple of weeks ago between 
Radhika Parameswaran and Luke Raimbach, we’re doing (or at least attempting) 
something similar to Luke’s setup with regard to cache chaining.

We’ve got a large research storage platform in Brisbane, Queensland, Australia 
and we’re trying to leverage a few different modes of operation.

Currently:

Cache A (IW) connects to what would be a Home (B), which is in turn effectively 
an NFS mount of (C), a DMF-based NFS export. To a point, this works. It more or 
less lets us use “home” as the ultimate sink, and data migration in and out of 
DMF seems to work nicely: GPFS pulls things from (B) that don’t currently 
appear in (A), either due to policy or because a high-water mark (HWM) was hit 
(thus emptying the cache). We’ve tested it as far out as the data ONLY being 
offline on tape media inside (C), and it still works, with the data cleanly 
coming back to (A) within a very reasonable time-frame.


* We hit “problem 1”, which is in and around NFSv4 ACLs that aren’t surfacing 
or mapping correctly (as we’d expect). I guess this might be the caveat of 
trying to back-end the cache to a home that sits inside DMF (over an NFS 
export) for surfacing the data to clients.
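For the record, a rough (and hypothetical, untested) sketch of how the current A --> B leg gets plumbed with AFM; the filesystem, fileset, host and path names below are all made up for illustration, and your mileage will vary:

```shell
# On the home cluster (B): enable AFM extended attributes on the path that
# caches will target. (B itself sees C as a plain NFS mount of the DMF
# export underneath this path.)
mmafmconfig enable /gpfs/homefs/homeB

# On the cache cluster (A): create an independent-writer (IW) fileset that
# targets home (B) over NFS, then link it into the namespace.
mmcrfileset fsA cacheA --inode-space new \
    -p afmMode=iw,afmTarget=nfs://homehost/gpfs/homefs/homeB
mmlinkfileset fsA cacheA -J /gpfs/fsA/cacheA

# Eviction at a high-water mark can then empty the cache; subsequent reads
# fault the data back in from B (and, via DMF recall at C, from tape if
# need be).
```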

Where we’d like to head:

We haven’t seen it yet, but as Luke and Radhika were discussing last month, we 
really liked the idea of an IW cache (A, where instruments dump huge data) 
which then, via AFM, ends up at (B) (technically also a “home”, but IW), which 
is in turn a function of (C), which might itself be another cache sitting next 
to an HPC platform for fast parallel reads and writes.

We like the idea of chained caches because it gives us extreme flexibility in 
the premise of our “Data anywhere” fabric. We appreciate that this has some 
challenges: we know that with multiple IW scenarios the last write will always 
win, but this we can control with workload guidelines. We’d like to add our 
voices to this idea of having caches chained all the way back to some point, 
such that data is pulled all the way from C --> B --> A, and along the way, 
inflection points of I/O might be written and read at point C AND point B AND 
point A, such that everyone sees the same distributed, consistent data in the 
end.

We’re also working on surfacing data via object and file simultaneously for 
different needs. This is coming along relatively well, but we’re still learning 
where this does and does not make sense. A moving target, from how it all 
appears on the surface.

Some might say that is effectively asking for a globally, eventually (always) 
consistent filesystem within Scale.

Anyway – just some thoughts.

Regards,

-jc





_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss
