On 5/16/23 14:17, Richard W.M. Jones wrote: > On Mon, May 15, 2023 at 07:49:24PM +0200, Laszlo Ersek wrote: >> Introduce unescape_device_mapper_lvm() for turning >> >> /dev/mapper/VG-LV >> >> key IDs into >> >> /dev/VG/LV >> >> ones, unescaping doubled hyphens to single hyphens in both VG and LV in >> the process. Libguestfs uses the /dev/VG/LV format internally, for >> identifying devices, but the user might want to specify the >> /dev/mapper/VG-LV ID format with the "--key ID:..." options. >> >> Call unescape_device_mapper_lvm() from key_store_import_key(). That is, >> translate the ID as soon as the "--key" option is processed -- let the >> keystore only know about the usual /dev/VG/LV format, for later matching. >> >> Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2168506 >> Signed-off-by: Laszlo Ersek <ler...@redhat.com> >> --- >> >> Notes: >> regcomp() must definitely allocate memory dynamically, and we >> (intentionally) never free that -- we never call regfree(). I assume >> valgrind will catch this as a leak, so we might have to extend >> "valgrind-suppressions" in each dependent superproject. However, I'm >> unable to run "make check-valgrind" successfully in e.g. virt-v2v even >> before these patches; see also >> <https://listman.redhat.com/archives/libguestfs/2023-May/031496.html>. >> >> options/keys.c | 86 ++++++++++++++++++++ >> 1 file changed, 86 insertions(+) >> >> diff --git a/options/keys.c b/options/keys.c >> index bc7459749276..f0164a6ed987 100644 >> --- a/options/keys.c >> +++ b/options/keys.c >> @@ -27,6 +27,7 @@ >> #include <errno.h> >> #include <error.h> >> #include <assert.h> >> +#include <regex.h> >> >> #include "guestfs.h" >> >> @@ -260,6 +261,90 @@ key_store_add_from_selector (struct key_store *ks, >> const char *selector) >> return key_store_import_key (ks, &key); >> } >> >> +/* A /dev/mapper/ escaped character is: >> + * - either any character except a slash or a hyphen, >> + * - or a double hyphen. >> + * >> + * Note that parens are deliberately not included here -- they will be >> included >> + * in the full pattern, for making them more easily countable. >> + * >> + * Also note that the bracket expression below does not contain a range >> + * expression, therefore it should not be locale-sensitive. >> + */ >> +#define ESC_CH "[^-/]|--" >> + >> +/* Turn /dev/mapper/VG-LV into /dev/VG/LV. */ >> +static void >> +unescape_device_mapper_lvm (char *id) >> +{ >> + static bool compiled; >> + int rc; >> + static regex_t regex; >> + regmatch_t match[5]; >> + char *output; >> + const char *vg_start, *vg_end, *lv_start, *lv_end; >> + const char *input; >> + >> + if (!compiled) { >> + /* Recognize /dev/mapper/VG-LV pathnames, where VG and LV don't contain >> + * slashes, plus any original hyphens in them are doubled for escaping. >> + */ >> + static const char pattern[] = "^/dev/(" >> + "mapper/" >> + "((" ESC_CH ")+)" >> + "-" >> + "((" ESC_CH ")+)" >> + ")$"; >> + >> + rc = regcomp (®ex, pattern, REG_EXTENDED); >> + if (rc != 0) { >> + char errbuf[256]; >> + >> + /* When regcomp() fails, the contents of "regex" are undefined, so we >> + * cannot pass "regex" to regerror(). >> + */ >> + (void)regerror (rc, NULL, errbuf, sizeof errbuf); >> + error (EXIT_FAILURE, 0, >> + _("%s: Failed to compile pattern (%s). Please report a bug >> for " >> + "libguestfs -- refer to guestfs(3) section BUGS."), >> + __func__, errbuf); >> + } >> + compiled = true; >> + } > > Any reason not to use PCRE and the COMPILE_REGEXP macro?
I didn't even think of PCRE -- regexes are standard (i.e., in POSIX). This is a relatively simple regex at that. > We also have macros like CLEANUP_PCRE2_MATCH_DATA_FREE to free match > data automaticaly. That's different, I think. If I understand "THE MATCH DATA BLOCK" section <https://www.pcre.org/current/doc/html/pcre2api.html#SEC26> correctly, it implies that dynamic memory needs to be allocated for each separate matching attempt, not just for compiling the pattern. Here we only allocate memory when compiling the pattern; regexec() needs no dynamic memory. (POSIX says that if it fails, it can only fail with REG_NOMATCH, and not (for example) with REG_ESPACE.) Furthermore, the offsets of the subexpressions are placed in the local variable "match", which doesn't even provide enough slots for all the subexpressions -- we don't care for the last subexpression. > In general PCRE regexps are a lot more usable than the POSIX stuff. I think PCRE is much more capable than POSIX regexes, regarding parsing power / expressivity. On the other hand, I find the PCRE API way more complex than the POSIX one -- I have never *not* struggled with the PCRE documentation. POSIX is just one specification page: https://pubs.opengroup.org/onlinepubs/9699919799/functions/regcomp.html and for this RE, it is sufficient. If we're supposed to use pcre2 in common/ for consistency with libguestfs, that sounds like a valid argument (consistency is important). However, in that case, I'm tempted to rewrite this without regular expressions in the first place. That's possible; I just found regexes more comfortable, specifically due to having been used to the POSIX RE API. The argument went like, "POSIX regex is easy, and then the loops become even easier, relying on the regex-pre-check". However, the PCRE2 API is something I have no experience with, and I find the documentation baroque... So that kind of defeats the entire original argument. After some digging in libguestfs... my point is illustrated by the fact that we have the macro COMPILE_REGEXP(), and functions like guestfs_int_match2() :( Let me take a stab at a rewrite without a regular expression. I wonder if I'll dislike that enough to sweeten pcre2 to me. Thanks Laszlo > > Rich. > >> + rc = regexec (®ex, id, 5, match, 0); >> + >> + /* If there's no match, just leave "id" alone. */ >> + if (rc != 0) >> + return; >> + >> + /* Start outputting after "/dev/". */ >> + output = id + match[1].rm_so; >> + vg_start = id + match[2].rm_so; >> + vg_end = id + match[2].rm_eo; >> + lv_start = id + match[4].rm_so; >> + lv_end = id + match[4].rm_eo; >> + >> + /* Each of the following two loops produces at most as many bytes as it >> + * consumes, so we unescape "id" in-place. >> + */ >> + input = vg_start; >> + while (input < vg_end) >> + if ((*output++ = *input++) == '-') >> + input++; >> + >> + *output++ = '/'; >> + >> + input = lv_start; >> + while (input < lv_end) >> + if ((*output++ = *input++) == '-') >> + input++; >> + >> + *output++ = '\0'; >> +} >> + >> +#undef ESC_CH >> + >> struct key_store * >> key_store_import_key (struct key_store *ks, struct key_store_key *key) >> { >> @@ -278,6 +363,7 @@ key_store_import_key (struct key_store *ks, struct >> key_store_key *key) >> error (EXIT_FAILURE, errno, "realloc"); >> >> ks->keys = new_keys; >> + unescape_device_mapper_lvm (key->id); >> ks->keys[ks->nr_keys] = *key; >> ++ks->nr_keys; >> >> _______________________________________________ >> Libguestfs mailing list >> Libguestfs@redhat.com >> https://listman.redhat.com/mailman/listinfo/libguestfs > _______________________________________________ Libguestfs mailing list Libguestfs@redhat.com https://listman.redhat.com/mailman/listinfo/libguestfs