Thanks again, Jeremy. This is pretty strange, but here goes: SSK encryption works end to end if I ssh into the server and client nodes as root to mount. If I ssh in as another user (say, centos) and run the same commands with --skpath through `sudo` or `sudo -s`, the client mount fails.
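Comparing the two cases comes down to comparing the `keyctl show` output from the root login with the output from the `sudo` session. Below is a minimal Python sketch that pulls any `user` keys whose description starts with `lustre:` out of such a dump. It assumes the column layout of the `keyctl show` listings quoted further down in this thread. `parse_keyctl_show`, `find_lustre_keys` and the `SUDO_SESSION` sample are hypothetical helpers and data, not part of Lustre or keyutils.

```python
import re
from typing import List, NamedTuple


class KeyEntry(NamedTuple):
    serial: int
    perms: str
    uid: int
    gid: int
    key_type: str
    description: str


# One key per line: serial, permission string, uid, gid, an optional "\_"
# tree marker, then "type: description" (layout assumed from the listings
# quoted in this thread).
KEY_LINE = re.compile(
    r"^\s*(?P<serial>-?\d+)\s+(?P<perms>\S+)\s+(?P<uid>-?\d+)\s+(?P<gid>-?\d+)\s+"
    r"(?:\\_\s+)?(?P<type>[^:\s]+):\s*(?P<desc>.*?)\s*$"
)


def parse_keyctl_show(text: str) -> List[KeyEntry]:
    """Parse the key lines of a `keyctl show` dump, skipping header lines."""
    entries = []
    for line in text.splitlines():
        m = KEY_LINE.match(line)
        if m is None:
            continue  # e.g. the "Session Keyring" header
        entries.append(KeyEntry(int(m["serial"]), m["perms"], int(m["uid"]),
                                int(m["gid"]), m["type"], m["desc"]))
    return entries


def find_lustre_keys(text: str) -> List[KeyEntry]:
    """Return the `user` keys whose description starts with 'lustre:'."""
    return [e for e in parse_keyctl_show(text)
            if e.key_type == "user" and e.description.startswith("lustre:")]


# Client listing from the root login, as quoted in this thread.
ROOT_LOGIN = r"""Session Keyring
 269152212 --alswrv      0     0  keyring: _ses
1059491764 --alswrv      0 65534   \_ keyring: _uid.0
 146272009 --alswrv      0     0       \_ user: lustre:scratch
"""

# Hypothetical listing from a `sudo` session in which the key is missing.
SUDO_SESSION = r"""Session Keyring
 123456789 --alswrv      0     0  keyring: _ses
"""

if __name__ == "__main__":
    for label, dump in (("root login", ROOT_LOGIN), ("sudo session", SUDO_SESSION)):
        keys = find_lustre_keys(dump)
        print(label, "->", [(k.serial, k.description) for k in keys])
```

Feeding both captures through `find_lustre_keys` only shows *that* the key is missing from one session. It does not explain *why*; Jeremy's note about a new root session keyring speaks to that.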
So it seems like something is going on with how user and session keys are loaded into the Linux keyring and later made available, but I haven't investigated further than this. I was able to get the performance numbers I needed, which was my goal in setting up SSK encryption, so thanks again!

Mark

On Sun, Jun 24, 2018 at 6:46 PM Jeremy Filizetti <[email protected]> wrote:
> I have encountered this issue before as well. Something on the system is
> creating a new root user session keyring, and keyctl_read fails after that
> happens. For now, reloading the key into the keyring is what I have done.
> For the client, you could mount with the --skpath option so that any time it's
> mounted it reloads the key, but there is still the issue that when the session
> context expires and the keys are re-established, keyctl_read will fail again
> if a new keyring has been created. I'm not sure when I'll have time to put
> together a fix for this, but let me know if mounting with the skpath option
> works.
>
> Jeremy
>
> On Sun, Jun 24, 2018 at 4:41 PM, Mark Roper <[email protected]> wrote:
>
>> Hi Jeremy,
>>
>> Thanks for taking a look at my question. I have verified that the keys on
>> the server and the client match and that the client key has the prime
>> generated.
>>
>> When I ssh to the client node and run
>>
>> sudo mount -t lustre -o skpath=/secure_directory/scratch.client.key 172.31.46.245@tcp:/scratch /scratch
>>
>> with verbosity turned up to trace, I see the following in /var/log/messages
>> on the MDS node:
>>
>> Jun 24 20:26:41 ip-172-31-44-121 lsvcgssd[23975]: keyctl_read() failed for key 27091278: Permission denied
>> Jun 24 20:26:41 ip-172-31-44-121 lsvcgssd[23975]: Failed to create sk credentials
>>
>> As I mentioned, if I remove the option I'm able to mount the FS. I'm
>> using Lustre 2.11 servers and clients. The server kernel is
>> 3.10.0-693.21.1.el7_lustre.x86_64 and the client kernel is
>> 3.10.0-693.21.1.el7.x86_64.
>>
>> I am wondering if this has something to do with Linux keyring permissions
>> on CentOS. When I ssh to my server and client nodes as the user `centos`
>> and run `sudo lgss_sk -l /secure_directory/scratch.<server | client>.key`
>> followed by `keyctl show`, the lustre user key does not appear in the list
>> of keys. If I ssh to the client and server nodes as root and run the same
>> two commands, the lustre key shows up on the server as:
>>
>>  772711346 --alswrv      0     0  keyring: _ses
>> 1047091535 --alswrv      0 65534   \_ keyring: _uid.0
>>   27091278 --alswrv      0     0       \_ user: lustre:scratch:default
>>
>> ... and on the client as:
>>
>> Session Keyring
>>  269152212 --alswrv      0     0  keyring: _ses
>> 1059491764 --alswrv      0 65534   \_ keyring: _uid.0
>>  146272009 --alswrv      0     0       \_ user: lustre:scratch
>>
>> I'm going to try setting up a 2.10.3 server and client to see if this is
>> some kind of regression in 2.11 and not just me fat-fingering something.
>> I'm also going to dive deeper into keyring permissions and see if I can
>> find anything there. I'll update this thread for those interested if I
>> figure it out.
>>
>> Any additional thoughts would be appreciated!
>>
>> Cheers,
>>
>> Mark
>>
>>
>> On Sun, Jun 24, 2018 at 4:02 PM Jeremy Filizetti <[email protected]> wrote:
>>
>>> GSS error 0x60000 is GSS bad signature, which would mean the HMAC was
>>> invalid. Can you verify that your key files have the same shared key? Do you
>>> have any logs for the server side as well? You can increase server
>>> verbosity by adding some extra v's to LSVCGSSDARGS in
>>> /etc/sysconfig/lsvcgss.
>>>
>>> Jeremy
>>>
>>> On Fri, Jun 22, 2018 at 3:41 PM, Mark Roper <[email protected]> wrote:
>>>
>>>> Hi Lustre Admins,
>>>>
>>>> I am hoping someone can help me understand what I'm doing wrong with
>>>> my SSK setup. I have set up a Lustre 2.11 server and worked through the
>>>> steps to use shared secret keys (SSKs) to encrypt data in transit between
>>>> client nodes and the MDT and OSS.
>>>> I followed the manual instructions here:
>>>> http://doc.lustre.org/lustre_manual.xhtml#idm140687075065344
>>>>
>>>> Before enabling the encryption settings on the MDT, I can mount the FS
>>>> on the client node. After I turn on encryption, I get a "Connection
>>>> refused" error and cannot mount:
>>>>
>>>> mount.lustre: mount 172.31.46.245@tcp:/scratch at /scratch failed: Connection refused
>>>>
>>>> The keys are definitely distributed to the client and server nodes, and
>>>> the settings have all been made as instructed in the manual (I did this
>>>> a few times from scratch to make sure). I can manually load the keys into
>>>> the keyring and see them by running `keyctl show`, and I can compare the
>>>> key files on the client and server nodes with
>>>> `lgss_sk --read /secure_directory/scratch.client.key`
>>>> to validate that they all match and that the client has a prime.
>>>>
>>>> The commands I'm using to enable the encryption are:
>>>>
>>>> mdt# sudo lctl conf_param scratch.srpc.flavor.tcp.cli2mdt=skpi
>>>> mdt# sudo lctl conf_param scratch.srpc.flavor.tcp.cli2ost=skpi
>>>>
>>>> I tried tailing /var/log/messages but am not able to interpret the
>>>> output. Does anyone have a hypothesis about what might be wrong, or
>>>> instructions for debugging?
>>>>
>>>> Log output is below! Many thanks to anyone who can help!
>>>>
>>>> Mark
>>>>
>>>> Jun 22 19:22:02 ip-172-31-46-245 lgss_keyring: [22250]:TRACE:main(): start parsing parameters
>>>> Jun 22 19:22:02 ip-172-31-46-245 lgss_keyring: [22250]:INFO:main(): key 428863463, desc 0@26, ugid 0:0, sring 46159405, coinfo 38:sk:0:0:m:p:2:0x20000ac1f2109:scratch-OST1cd0-osc-MDT0000:0x20000ac1f2ef5:1
>>>> Jun 22 19:22:02 ip-172-31-46-245 lgss_keyring: [22250]:TRACE:parse_callout_info(): components: 38,sk,0,0,m,p,2,0x20000ac1f2109,scratch-OST1cd0-osc-MDT0000,0x20000ac1f2ef5,1
>>>> Jun 22 19:22:02 ip-172-31-46-245 lgss_keyring: [22250]:DEBUG:parse_callout_info(): parse call out info: secid 38, mech sk, ugid 0:0, is_root 0, is_mdt 1, is_ost 0, svc type p, svc 2, nid 0x20000ac1f2109, tgt scratch-OST1cd0-osc-MDT0000, self nid 0x20000ac1f2ef5, pid 1
>>>> Jun 22 19:22:02 ip-172-31-46-245 lgss_keyring: [22250]:TRACE:main(): parsing parameters OK
>>>> Jun 22 19:22:02 ip-172-31-46-245 lgss_keyring: [22250]:TRACE:lgss_mech_initialize(): initialize mech sk
>>>> Jun 22 19:22:02 ip-172-31-46-245 lgss_keyring: [22250]:TRACE:lgss_create_cred(): create a sk cred at 0x1ecc2e0
>>>> Jun 22 19:22:02 ip-172-31-46-245 lgss_keyring: [22250]:TRACE:main(): caller's namespace is the same
>>>> Jun 22 19:22:02 ip-172-31-46-245 lgss_keyring: [22250]:TRACE:lgss_prepare_cred(): preparing sk cred 0x1ecc2e0
>>>> Jun 22 19:22:02 ip-172-31-46-245 lgss_keyring: [22250]:INFO:sk_create_cred(): Creating credentials for target: scratch-OST1cd0-osc-MDT0000 with nodemap: (null)
>>>> Jun 22 19:22:02 ip-172-31-46-245 lgss_keyring: [22250]:INFO:sk_create_cred(): Searching for key with description: lustre:scratch
>>>> Jun 22 19:22:02 ip-172-31-46-245 lgss_keyring: [22250]:TRACE:prepare_and_instantiate(): instantiated kernel key 198fefe7
>>>> Jun 22 19:22:02 ip-172-31-46-245 lgss_keyring: [22250]:TRACE:main(): forked child 22251
>>>> Jun 22 19:22:02 ip-172-31-46-245 lgss_keyring: [22251]:TRACE:lgssc_kr_negotiate(): child start on behalf of key 198fefe7: cred 0x1ecc2e0, uid 0, svc 2, nid 20000ac1f2109, uids: 0:0/0:0
>>>> Jun 22 19:22:02 ip-172-31-46-245 lgss_keyring: [22251]:INFO:ipv4_nid2hostname(): SOCKLND: net 0x20000, addr 0x9211fac => ip-172-31-33-9.us-west-2.compute.internal
>>>> Jun 22 19:22:02 ip-172-31-46-245 lgss_keyring: [22251]:DEBUG:lgss_get_service_str(): constructed service string: [email protected]
>>>> Jun 22 19:22:02 ip-172-31-46-245 lgss_keyring: [22251]:TRACE:lgss_using_cred(): using sk cred 0x1ecc2e0
>>>> Jun 22 19:22:02 ip-172-31-46-245 lgss_keyring: [22253]:TRACE:main(): start parsing parameters
>>>> Jun 22 19:22:02 ip-172-31-46-245 lgss_keyring: [22253]:INFO:main(): key 189483693, desc 0@25, ugid 0:0, sring 46159405, coinfo 37:sk:0:0:m:p:2:0x20000ac1f2687:scratch-OST2b9d-osc-MDT0000:0x20000ac1f2ef5:1
>>>> Jun 22 19:22:02 ip-172-31-46-245 lgss_keyring: [22253]:TRACE:parse_callout_info(): components: 37,sk,0,0,m,p,2,0x20000ac1f2687,scratch-OST2b9d-osc-MDT0000,0x20000ac1f2ef5,1
>>>> Jun 22 19:22:02 ip-172-31-46-245 lgss_keyring: [22253]:DEBUG:parse_callout_info(): parse call out info: secid 37, mech sk, ugid 0:0, is_root 0, is_mdt 1, is_ost 0, svc type p, svc 2, nid 0x20000ac1f2687, tgt scratch-OST2b9d-osc-MDT0000, self nid 0x20000ac1f2ef5, pid 1
>>>> Jun 22 19:22:02 ip-172-31-46-245 lgss_keyring: [22253]:TRACE:main(): parsing parameters OK
>>>> Jun 22 19:22:02 ip-172-31-46-245 lgss_keyring: [22253]:TRACE:lgss_mech_initialize(): initialize mech sk
>>>> Jun 22 19:22:02 ip-172-31-46-245 lgss_keyring: [22253]:TRACE:lgss_create_cred(): create a sk cred at 0x21b02e0
>>>> Jun 22 19:22:02 ip-172-31-46-245 lgss_keyring: [22253]:TRACE:main(): caller's namespace is the same
>>>> Jun 22 19:22:02 ip-172-31-46-245 lgss_keyring: [22253]:TRACE:lgss_prepare_cred(): preparing sk cred 0x21b02e0
>>>> Jun 22 19:22:02 ip-172-31-46-245 lgss_keyring: [22253]:INFO:sk_create_cred(): Creating credentials for target: scratch-OST2b9d-osc-MDT0000 with nodemap: (null)
>>>> Jun 22 19:22:02 ip-172-31-46-245 lgss_keyring: [22253]:INFO:sk_create_cred(): Searching for key with description: lustre:scratch
>>>> Jun 22 19:22:02 ip-172-31-46-245 lgss_keyring: [22253]:TRACE:prepare_and_instantiate(): instantiated kernel key 0b4b4aad
>>>> Jun 22 19:22:02 ip-172-31-46-245 lgss_keyring: [22253]:TRACE:main(): forked child 22254
>>>> Jun 22 19:22:02 ip-172-31-46-245 lgss_keyring: [22254]:TRACE:lgssc_kr_negotiate(): child start on behalf of key 0b4b4aad: cred 0x21b02e0, uid 0, svc 2, nid 20000ac1f2687, uids: 0:0/0:0
>>>> Jun 22 19:22:02 ip-172-31-46-245 lgss_keyring: [22254]:INFO:ipv4_nid2hostname(): SOCKLND: net 0x20000, addr 0x87261fac => ip-172-31-38-135.us-west-2.compute.internal
>>>> Jun 22 19:22:02 ip-172-31-46-245 lgss_keyring: [22254]:DEBUG:lgss_get_service_str(): constructed service string: [email protected]
>>>> Jun 22 19:22:02 ip-172-31-46-245 lgss_keyring: [22254]:TRACE:lgss_using_cred(): using sk cred 0x21b02e0
>>>> Jun 22 19:22:02 ip-172-31-46-245 lgss_keyring: [22251]:INFO:sk_encode_netstring(): Encoded netstring of 647 bytes
>>>> Jun 22 19:22:02 ip-172-31-46-245 lgss_keyring: [22251]:INFO:lgss_sk_using_cred(): Created netstring of 647 bytes
>>>> Jun 22 19:22:02 ip-172-31-46-245 lgss_keyring: [22251]:TRACE:lgssc_negotiation_manual(): starting gss negotation
>>>> Jun 22 19:22:02 ip-172-31-46-245 lgss_keyring: [22251]:TRACE:do_nego_rpc(): start negotiation rpc
>>>> Jun 22 19:22:02 ip-172-31-46-245 lgss_keyring: [22251]:TRACE:gss_do_ioctl(): to open /proc/fs/lustre/sptlrpc/gss/init_channel
>>>> Jun 22 19:22:02 ip-172-31-46-245 lgss_keyring: [22251]:TRACE:gss_do_ioctl(): to down-write
>>>> Jun 22 19:22:02 ip-172-31-46-245 lgss_keyring: [22254]:INFO:sk_encode_netstring(): Encoded netstring of 647 bytes
>>>> Jun 22 19:22:02 ip-172-31-46-245 lgss_keyring: [22254]:INFO:lgss_sk_using_cred(): Created netstring of 647 bytes
>>>> Jun 22 19:22:02 ip-172-31-46-245 lgss_keyring: [22254]:TRACE:lgssc_negotiation_manual(): starting gss negotation
>>>> Jun 22 19:22:02 ip-172-31-46-245 lgss_keyring: [22254]:TRACE:do_nego_rpc(): start negotiation rpc
>>>> Jun 22 19:22:02 ip-172-31-46-245 lgss_keyring: [22254]:TRACE:gss_do_ioctl(): to open /proc/fs/lustre/sptlrpc/gss/init_channel
>>>> Jun 22 19:22:02 ip-172-31-46-245 lgss_keyring: [22254]:TRACE:gss_do_ioctl(): to down-write
>>>> Jun 22 19:22:02 ip-172-31-46-245 lgss_keyring: [22251]:TRACE:do_nego_rpc(): do_nego_rpc: to parse reply
>>>> Jun 22 19:22:02 ip-172-31-46-245 lgss_keyring: [22251]:DEBUG:do_nego_rpc(): do_nego_rpc: receive handle len 0, token len 0, res 0
>>>> Jun 22 19:22:02 ip-172-31-46-245 lgss_keyring: [22251]:ERROR:lgssc_negotiation_manual(): negotiation gss error 60000
>>>> Jun 22 19:22:02 ip-172-31-46-245 lgss_keyring: [22251]:ERROR:lgssc_kr_negotiate_manual(): key 198fefe7: failed to negotiate
>>>> Jun 22 19:22:02 ip-172-31-46-245 lgss_keyring: [22251]:TRACE:error_kernel_key(): revoking kernel key 198fefe7
>>>> Jun 22 19:22:02 ip-172-31-46-245 lgss_keyring: [22251]:INFO:error_kernel_key(): key 198fefe7: revoked
>>>> Jun 22 19:22:02 ip-172-31-46-245 lgss_keyring: [22251]:TRACE:lgss_release_cred(): releasing sk cred 0x1ecc2e0
>>>> Jun 22 19:22:02 ip-172-31-46-245 lgss_keyring: [22254]:TRACE:do_nego_rpc(): do_nego_rpc: to parse reply
>>>> Jun 22 19:22:02 ip-172-31-46-245 lgss_keyring: [22254]:DEBUG:do_nego_rpc(): do_nego_rpc: receive handle len 0, token len 0, res 0
>>>> Jun 22 19:22:02 ip-172-31-46-245 lgss_keyring: [22254]:ERROR:lgssc_negotiation_manual(): negotiation gss error 60000
>>>> Jun 22 19:22:02 ip-172-31-46-245 lgss_keyring: [22254]:ERROR:lgssc_kr_negotiate_manual(): key 0b4b4aad: failed to negotiate
>>>> Jun 22 19:22:02 ip-172-31-46-245 lgss_keyring: [22254]:TRACE:error_kernel_key(): revoking kernel key 0b4b4aad
>>>> Jun 22 19:22:02 ip-172-31-46-245 lgss_keyring: [22254]:INFO:error_kernel_key(): key 0b4b4aad: revoked
>>>> Jun 22 19:22:02 ip-172-31-46-245 lgss_keyring: [22254]:TRACE:lgss_release_cred(): releasing sk cred 0x21b02e0
>>>>
>>>> _______________________________________________
>>>> lustre-discuss mailing list
>>>> [email protected]
>>>> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
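Before chasing keys, it can help to decode the numbers packed into this log. The minimal Python sketch below makes several assumptions. It reads `gss error 60000` as hexadecimal (as Jeremy does) and assumes it follows the RFC 2744 GSS-API major-status layout. It also assumes a Lustre NID keeps the network in its upper 32 bits (LND type in the top 16 of those, network number in the bottom 16) and the IPv4 address in its lower 32 bits, with the socket LND numbered 2. That NID layout is checked here only against this log. `decode_gss_major` and `decode_nid` are illustrative names, not Lustre APIs.

```python
import ipaddress
from typing import Tuple

# RFC 2744 layout of a GSS-API major status word (assumed to apply here):
#   bits 24-31 calling error, bits 16-23 routine error, bits 0-15 supplementary info.
ROUTINE_ERRORS = {
    0: "none",
    1: "GSS_S_BAD_MECH", 2: "GSS_S_BAD_NAME", 3: "GSS_S_BAD_NAMETYPE",
    4: "GSS_S_BAD_BINDINGS", 5: "GSS_S_BAD_STATUS",
    6: "GSS_S_BAD_SIG (GSS_S_BAD_MIC)", 7: "GSS_S_NO_CRED",
    8: "GSS_S_NO_CONTEXT", 9: "GSS_S_DEFECTIVE_TOKEN",
    10: "GSS_S_DEFECTIVE_CREDENTIAL", 11: "GSS_S_CREDENTIALS_EXPIRED",
    12: "GSS_S_CONTEXT_EXPIRED", 13: "GSS_S_FAILURE", 14: "GSS_S_BAD_QOP",
    15: "GSS_S_UNAUTHORIZED", 16: "GSS_S_UNAVAILABLE",
    17: "GSS_S_DUPLICATE_ELEMENT", 18: "GSS_S_NAME_NOT_MN",
}


def decode_gss_major(status: int) -> Tuple[int, str, int]:
    """Split a major status into (calling error, routine error name, supplementary bits)."""
    calling = (status >> 24) & 0xFF
    routine = (status >> 16) & 0xFF
    supplementary = status & 0xFFFF
    name = ROUTINE_ERRORS.get(routine, f"unknown routine error {routine}")
    return calling, name, supplementary


# Only the socket LND seen in this log; its type number (2) is an assumption.
LND_NAMES = {2: "tcp"}


def decode_nid(nid: int) -> str:
    """Render a 64-bit NID as addr@net (upper 32 bits network, lower 32 bits IPv4)."""
    net = (nid >> 32) & 0xFFFFFFFF
    addr = nid & 0xFFFFFFFF
    lnd_type, net_num = (net >> 16) & 0xFFFF, net & 0xFFFF
    lnd = LND_NAMES.get(lnd_type, f"lnd{lnd_type}")
    # Network number 0 is written without a suffix ("tcp" rather than "tcp0").
    net_name = lnd if net_num == 0 else f"{lnd}{net_num}"
    return f"{ipaddress.IPv4Address(addr)}@{net_name}"


if __name__ == "__main__":
    # "negotiation gss error 60000", read as hex
    print(decode_gss_major(0x60000))  # (0, 'GSS_S_BAD_SIG (GSS_S_BAD_MIC)', 0)
    # peer NIDs and the logging node's own "self nid" from the callout lines
    for nid in (0x20000ac1f2109, 0x20000ac1f2687, 0x20000ac1f2ef5):
        print(hex(nid), "->", decode_nid(nid))
```

Against this log, the three NIDs come out as 172.31.33.9@tcp, 172.31.38.135@tcp and 172.31.46.245@tcp. These match the hostnames lgss_keyring resolved and the address used in the mount command. The `addr 0x9211fac` in the ipv4_nid2hostname lines is the same 0xac1f2109 address with its bytes reversed. The decoded status agrees with Jeremy's "bad signature" reading.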
