[ceph-users] Permission problem upgrading Raspi-cluster from 16.2.7 to 17.2.0

Ulrich Klein Wed, 27 Apr 2022 02:21:46 -0700

Hi,

Yesterday I upgraded my smallest test system, four Raspberry Pi 4Bs, from Pacific
16.2.7 (cephadm/containerized) to Quincy 17.2.0 using
ceph orch upgrade start --ceph-version 17.2.0


It mostly worked, but it wouldn't have finished without manual intervention.
Apparently, each time a mgr is upgraded, the process creates new
/etc/ceph/ceph.conf and /etc/ceph/ceph.client.admin.keyring files on all nodes.
To do that, it seems to first copy each file to a temporary location on the
node (e.g. /tmp/etc/ceph/ceph.conf.new), then change its owner and permissions,
and then try to move it into place. Unfortunately, it changes owner/permissions
in a way that leaves it without permission to write to and move the file,
resulting in something like this in an infinite (?) loop:

2022-04-27T09:03:45.032808+0000 mgr.ceph00.lpaijp (mgr.2314108) 605 : cephadm [ERR] executing refresh((['ceph00', 'ceph01', 'ceph02', 'ceph03'],)) failed.
Traceback (most recent call last):
  File "/usr/share/ceph/mgr/cephadm/ssh.py", line 221, in _write_remote_file
    await asyncssh.scp(f.name, (conn, tmp_path))
  File "/lib/python3.6/site-packages/asyncssh/scp.py", line 922, in scp
    await source.run(srcpath)
  File "/lib/python3.6/site-packages/asyncssh/scp.py", line 458, in run
    self.handle_error(exc)
  File "/lib/python3.6/site-packages/asyncssh/scp.py", line 307, in handle_error
    raise exc from None
  File "/lib/python3.6/site-packages/asyncssh/scp.py", line 456, in run
    await self._send_files(path, b'')
  File "/lib/python3.6/site-packages/asyncssh/scp.py", line 438, in _send_files
    self.handle_error(exc)
  File "/lib/python3.6/site-packages/asyncssh/scp.py", line 307, in handle_error
    raise exc from None
  File "/lib/python3.6/site-packages/asyncssh/scp.py", line 434, in _send_files
    await self._send_file(srcpath, dstpath, attrs)
  File "/lib/python3.6/site-packages/asyncssh/scp.py", line 365, in _send_file
    await self._make_cd_request(b'C', attrs, size, srcpath)
  File "/lib/python3.6/site-packages/asyncssh/scp.py", line 343, in _make_cd_request
    self._fs.basename(path))
  File "/lib/python3.6/site-packages/asyncssh/scp.py", line 224, in make_request
    raise exc
asyncssh.sftp.SFTPFailure: scp: /tmp/etc/ceph/ceph.conf.new: Permission denied

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/share/ceph/mgr/cephadm/utils.py", line 76, in do_work
    return f(*arg)
  File "/usr/share/ceph/mgr/cephadm/serve.py", line 265, in refresh
    self._write_client_files(client_files, host)
  File "/usr/share/ceph/mgr/cephadm/serve.py", line 1052, in _write_client_files
    self.mgr.ssh.write_remote_file(host, path, content, mode, uid, gid)
  File "/usr/share/ceph/mgr/cephadm/ssh.py", line 238, in write_remote_file
    host, path, content, mode, uid, gid, addr))
  File "/usr/share/ceph/mgr/cephadm/module.py", line 569, in wait_async
    return self.event_loop.get_result(coro)
  File "/usr/share/ceph/mgr/cephadm/ssh.py", line 48, in get_result
    return asyncio.run_coroutine_threadsafe(coro, self._loop).result()
  File "/lib64/python3.6/concurrent/futures/_base.py", line 432, in result
    return self.__get_result()
  File "/lib64/python3.6/concurrent/futures/_base.py", line 384, in __get_result
    raise self._exception
  File "/usr/share/ceph/mgr/cephadm/ssh.py", line 226, in _write_remote_file
    raise OrchestratorError(msg)
orchestrator._interface.OrchestratorError: Unable to write ceph02:/etc/ceph/ceph.conf: scp: /tmp/etc/ceph/ceph.conf.new: Permission denied
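
For reference, the general "write to a temp file, fix up permissions, then move into place" pattern described above can be sketched in plain shell. This is a minimal illustration of the technique under stated assumptions, not cephadm's actual code: `write_file_atomically` is a hypothetical helper, and the chown step is left out so the sketch runs as a normal user. What it shows is the ordering: the content is written while the temp file is still owned and writable by the writer, and a fresh, uniquely named temp file means a stale leftover from an earlier failed attempt cannot block the next write.

```shell
# Hypothetical helper (not part of cephadm): atomically replace a file.
# Usage: printf '%s\n' "some content" | write_file_atomically /path/to/file 644
write_file_atomically() {
    target=$1
    mode=$2
    dir=$(dirname "$target")
    # Fresh, uniquely named temp file in the target directory, so a stale
    # "*.new" leftover cannot get in the way and the final mv is a
    # same-filesystem rename.
    tmp=$(mktemp "$dir/.$(basename "$target").XXXXXX") || return 1
    # Write the content first, while the temp file still belongs to us ...
    if ! cat > "$tmp"; then rm -f "$tmp"; return 1; fi
    # ... and only then change the mode (a chown would also go here, as root).
    if ! chmod "$mode" "$tmp"; then rm -f "$tmp"; return 1; fi
    # rename(2) replaces the target in one step.
    mv -f "$tmp" "$target"
}
```

Doing the chmod/chown after the write, rather than before it, is what avoids locking the writer out of its own temp file.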


On each node I had to run:
cd /usr/bin
mv chmod chmod_real ; ln -s true chmod
mv chown chown_real ; ln -s true chown

And then whenever the file(s) appeared:
chmod_real 666 /tmp/etc/ceph/ceph.conf.new

to get it over that hurdle. Once the upgrade has finished, restore the original
chmod/chown binaries and the file permissions.
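
Restoring the binaries amounts to undoing the two mv/ln commands above. A minimal sketch, assuming the swap was done exactly as shown (a symlink named chmod/chown pointing at true, with the real binary saved as chmod_real/chown_real); `restore_binary` is a hypothetical helper and refuses to touch anything that does not look swapped:

```shell
# Hypothetical helper: undo "mv NAME NAME_real ; ln -s true NAME" in DIR.
# Usage (as root): restore_binary /usr/bin chmod; restore_binary /usr/bin chown
restore_binary() {
    dir=$1
    name=$2
    # Only act if NAME is a symlink and the saved NAME_real binary exists.
    if [ -L "$dir/$name" ] && [ -e "$dir/${name}_real" ]; then
        rm -f "$dir/$name"                  # remove the symlink to true
        mv "$dir/${name}_real" "$dir/$name" # put the real binary back
    else
        echo "restore_binary: $dir/$name does not look swapped, skipping" >&2
        return 1
    fi
}
```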
I wonder whether anyone else has seen this on Intel/AMD machines. It looks like
a pretty obvious problem: the process shoots itself in the permission foot, and
on bigger clusters working around it by hand would be a time-consuming pain.

Ciao, Uli

_______________________________________________
ceph-users mailing list -- [email protected]
