Hi, Den fre 8 dec. 2023 kl 05:40 skrev Yasuhito FUTATSUKI < futat...@yf.bsdclub.org>:
> > > On 2023/12/07 19:33, Daniel Sahlberg wrote: > > Den tors 7 dec. 2023 kl 09:51 skrev Ruediger Pluem <rpl...@apache.org>: > > > >> I stumbled accross a Python 3 compatibility issue in > >> tools/hook-scripts/mailer/mailer.py. > >> > >> The call of self.cfg.which_groups in line 565 passes an empty string as > >> first parameter. > >> In which_groups this empty string is passed to to_str in line 1489. > >> In line 88 to_str does x.decode('utf-8') > >> In Python 2 str objects have a decode method at least in later Python 3 > >> versions they have not. Hence I propose the following patch which fixes > the > >> issue for me: > <snip> > > > There was some work done on mailer.py by Greg Stein, started here: > > https://lists.apache.org/thread/v5bh1j8qz8kk0jv1tlctxqt1k454tz0h > > However I don't think that touched these parts of the code. > > Yes, I also think the issue has existed before it. > > > I think there is more to this than that simple fix, but I'm no expert in > > Python or in the Python bindings. There are two more calls to > > which_groups(): > > > > Line 491, called with the return from > > svn.repos.ChangeCollector(...).get_changes().items() (first argument in a > > tuple). I don't know which encoding this uses. > > We decided that for all C APIs which returns char * values, their Python 3 > wrapper functions return them as bytes object, and also we don't convert > those values into str within helper functions in svn modules. Thus their > path elements should be bytes object, and as they are internal paths, > their encoding is UTF-8, regardless of locale. > > > Line 663, called with the return from sys.stdin.buffer.readlines(), > already > > processed by to_str() once. > > > > In particular the call on 663 should be double decoded unless I'm missing > > something. > > Yes, there is inconsistency here. > > > So depending on what get_changes() return, maybe we should remove the > > to_str() from which_groups() instead? > > As Config.which_group() call from Commit.__init__() has worked correctly, > the pathes as matching pattern in which_group() are expected as str. > So if remove to_str() from which_groups(), path argment for > Config.which_group() should be a str object. > > > Note that we are still somewhat supporting Python 2 so we need to make > sure > > any changes does work on Python 2 as well. > > Then, how about this patch? It at least mailer-t1.sh passed both > on python=python2.7 and on python=python3.9. > > [[[ > Fix inconsistency in path argment on Config.which_group() > > Previously, we call Config.which_group() with path as bytes in > Commit.__init__() and with path as str in Lock.__init__() and > in PropChange.__init__(), but Config.which_group handled path > argment as bytes. To fix it we only use str as path argment on > Config.wich_group(). > > * tools/hook-scripts/mailer/mailer.py > (Config.which_groups): Treat path as str. > (Commit.__init__): convert bytes path into str before calling above. > > Found by: Ruediger Pluem (rpluem {_AT_} apache.org) > > Index: tools/hook-scripts/mailer/mailer.py > =================================================================== > --- tools/hook-scripts/mailer/mailer.py (revision 1913728) > +++ tools/hook-scripts/mailer/mailer.py (working copy) > @@ -488,7 +488,7 @@ > # collect the set of groups and the unique sets of params for the > options > self.groups = { } > for path, change in self.changelist: > - for (group, params) in self.cfg.which_groups(path, log): > + for (group, params) in self.cfg.which_groups(to_str(path), log): > # turn the params into a hashable object and stash it away > param_list = sorted(params.items()) > # collect the set of paths belonging to this group > @@ -1486,9 +1486,9 @@ > "Return the path's associated groups." > groups = [] > for group, pattern, exclude_pattern, repos_params, search_logmsg_re > in self._group_re: > - match = pattern.match(to_str(path)) > + match = pattern.match(path) > if match: > - if exclude_pattern and exclude_pattern.match(to_str(path)): > + if exclude_pattern and exclude_pattern.match(path): > continue > params = repos_params.copy() > params.update(match.groupdict()) > ]]] > > Cheers, > -- > Yasuhito FUTATSUKI <futat...@yf.bsdclub.org> > This looks good to me! Thanks for the detailed explaination! Kind regards, Daniel Sahlberg