Granted that I'm not an LNet expert, but "errno: -1 descr: cannot parse net
'<255:65535>' " doesn't immediately lead me to the same conclusion as if
"unknown interface 'ib0' " were printed as the error message. Also "errno:
-1" is "-EPERM = Operation not permitted", and doesn't give the same
information as "-ENXIO = No such device or address" or even "-EINVAL = Invalid
argument" would.
That said, I can't even offer a patch for this myself, since that exact error
message is used in a few different places, though I suspect it is coming from
lustre_lnet_config_ni().
Looking further into this, now that I've found where (I think) the error
message is generated, it seems that "errno: -1" is not "-EPERM" but rather
"LUSTRE_CFG_RC_BAD_PARAM". IMHO it is a travesty to use different error
numbers (and then print them after "errno:") instead of existing POSIX error
codes that could fill the same role (with some creative mapping):
#define LUSTRE_CFG_RC_NO_ERR 0 => fine
#define LUSTRE_CFG_RC_BAD_PARAM -1 => -EINVAL
#define LUSTRE_CFG_RC_MISSING_PARAM -2 => -EFAULT
#define LUSTRE_CFG_RC_OUT_OF_RANGE_PARAM -3 => -ERANGE
#define LUSTRE_CFG_RC_OUT_OF_MEM -4 => -ENOMEM
#define LUSTRE_CFG_RC_GENERIC_ERR -5 => -ENODATA
#define LUSTRE_CFG_RC_NO_MATCH -6 => -ENOMSG
#define LUSTRE_CFG_RC_MATCH -7 => -EXFULL
#define LUSTRE_CFG_RC_SKIP -8 => -EBADSLT
#define LUSTRE_CFG_RC_LAST_ELEM -9 => -ECHRNG
#define LUSTRE_CFG_RC_MARSHAL_FAIL -10 => -ENOSTR
I don't think "overloading" the POSIX error codes to mean something similar is
worse than using random numbers to report errors. Also, in some cases (even in
lustre_lnet_config_ni()) the code uses "rc = -errno", so the LUSTRE_CFG_RC_*
errors *already* conflict with POSIX error numbers, and it is impossible to
distinguish between them...
The main question is whether changing these numbers will break a user->kernel
interface, or whether these definitions are only used in userspace. It looks
like lnetctl.c only ever checks "!= LUSTRE_CFG_RC_NO_ERR", so maybe it is fine?
None of the current LUSTRE_CFG_RC_* values (-1 to -10) overlap with the
proposed errno values, so it would be possible to start accepting either value
for each return code in the user tools, and then at some point in the future
start actually returning the new ones... Something for the LNet folks to
figure out.
Cheers, Andreas
On Jan 10, 2024, at 13:29, Jeff Johnson <[email protected]> wrote:
A LU ticket and patch for lnetctl or for me being an under-caffeinated
idiot? ;-)
On Wed, Jan 10, 2024 at 12:06 PM Andreas Dilger <[email protected]> wrote:
It would seem that the error message could be improved in this case? Could you
file an LU ticket for that with the reproducer below, and ideally along with a
patch?
Cheers, Andreas
_______________________________________________
lustre-discuss mailing list
[email protected]
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
Cheers, Andreas
--
Andreas Dilger
Lustre Principal Architect
Whamcloud