On 11/09/2020 15:25, Stephen Ulmer wrote:
On Sep 9, 2020, at 10:04 AM, Skylar Thompson wrote:
On Wed, Sep 09, 2020 at 12:02:53PM +0100, Jonathan Buzzard wrote:
On 08/09/2020 18:37, IBM Spectrum Scale wrote:
I think it is incorrect to assume that a command that continues
after detecting the working directory has been removed is going to
cause damage to the file system.
No, I am not assuming it will cause damage. I am making the fairly
reasonable assumption that any command which fails has an increased
probability of causing damage to the file system over one that completes
successfully.
I think there is another angle here, which is that this command's output
has the possibility of triggering an "oh ----" (fill in your preferred
colorful metaphor here) moment, followed up by a panicked Ctrl-C. That
reaction has the possibility of causing its own problems (e.g. I'm not sure
whether mmafmctl touches CCR, but aborting it midway could leave CCR
inconsistent).
I'm with Jonathan here: the command should fail with an informative
message, and the admin can correct the problem (just cd somewhere else).
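As a rough illustration of the "fail with an informative message" behaviour, here is a minimal Python sketch. It assumes a POSIX system where os.getcwd() raises FileNotFoundError once the working directory has been removed; the function names and the message wording are made up for the example and are not taken from any Spectrum Scale command.

```python
import os
import sys


def working_directory_exists() -> bool:
    """Return True if the current working directory can still be resolved."""
    try:
        os.getcwd()
    except FileNotFoundError:
        # The directory we were started in has been removed underneath us.
        return False
    return True


def require_working_directory(command_name: str) -> None:
    """Exit early with a clear message instead of continuing in a removed directory."""
    if not working_directory_exists():
        # sys.exit with a string prints it to stderr and exits with status 1.
        sys.exit(
            f"{command_name}: current working directory no longer exists; "
            "cd to an existing directory and retry"
        )
```

A command would call require_working_directory() once at startup, before it touches any state, so the admin sees one clear error rather than a warning in the middle of the run.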
I’m now (genuinely) curious as to what Spectrum Scale commands
*actually* depend on the working directory existing and why. They
shouldn’t depend on anything but existing well-known directories (logs,
SDR, /tmp, et cetera) and any files or directories passed as arguments to
the command. This is the Unix way.
It seems like the *right* solution is to armor commands against doing
something “bad” if they lose a resource required to complete their task.
If $PWD goes away because an admin’s home goes away in the middle of a
long restripe, it’s better to complete the work and let them look in the
logs. It's not Scale’s problem if something not affecting its work happens.
Maybe I’ve got a blind spot here...
This jogged my memory that best practice would be to have a call to
chdir to set the working directory to "/" very early on, before anything
critical is started.
I am 99.999% sure that it's covered in Stevens (I can't check as I am away
for the weekend), so really there is no excuse. If / goes away then
really, really bad things have happened and it all sort of becomes moot
anyway.
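For anyone who wants to see the shape of it, a minimal sketch in Python follows. It is only an illustration of the chdir-to-"/" idea; the function names are hypothetical and it says nothing about how the Scale commands are actually implemented.

```python
import os


def detach_from_cwd() -> str:
    """Switch to "/" so nothing later depends on the caller's directory."""
    os.chdir("/")
    return os.getcwd()


def run_command() -> int:
    # Do this first, before any long-running or critical work starts.
    detach_from_cwd()
    # ... the real work would go here (placeholder for this sketch) ...
    return 0
```

One consequence to keep in mind: any relative paths passed as arguments need to be resolved to absolute paths before the chdir, otherwise they would be interpreted relative to "/".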
JAB.
--
Jonathan A. Buzzard Tel: +44141-5483420
HPC System Administrator, ARCHIE-WeSt.
University of Strathclyde, John Anderson Building, Glasgow. G4 0NG
_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss