On Sun, May 31, 2020 at 3:36 PM Zac Medico <zmed...@gentoo.org> wrote: > > On 5/31/20 12:21 PM, Mike Gilbert wrote: > > On Sun, May 31, 2020 at 2:01 PM Zac Medico <zmed...@gentoo.org> wrote: > >> > >> On 5/31/20 6:17 AM, Mike Gilbert wrote: > >>> Unquote the URL basename when fetching from upstream. > >>> Quote the filename when fetching from mirrors. > >>> > >>> Bug: https://bugs.gentoo.org/719810 > >>> Signed-off-by: Mike Gilbert <flop...@gentoo.org> > >>> --- > >>> lib/portage/dbapi/porttree.py | 7 ++++++- > >>> lib/portage/package/ebuild/fetch.py | 9 +++++++-- > >>> 2 files changed, 13 insertions(+), 3 deletions(-) > >>> > >>> diff --git a/lib/portage/dbapi/porttree.py b/lib/portage/dbapi/porttree.py > >>> index 08af17bcd..984263039 100644 > >>> --- a/lib/portage/dbapi/porttree.py > >>> +++ b/lib/portage/dbapi/porttree.py > >>> @@ -55,6 +55,11 @@ try: > >>> except ImportError: > >>> from urlparse import urlparse > >>> > >>> +try: > >>> + from urllib.parse import unquote as urlunquote > >>> +except ImportError: > >>> + from urllib import unquote as urlunquote > >>> + > >>> if sys.hexversion >= 0x3000000: > >>> # pylint: disable=W0622 > >>> basestring = str > >>> @@ -1547,7 +1552,7 @@ def _parse_uri_map(cpv, metadata, use=None): > >>> myuris.pop() > >>> distfile = myuris.pop() > >>> else: > >>> - distfile = os.path.basename(uri) > >>> + distfile = urlunquote(os.path.basename(uri)) > >>> if not distfile: > >>> raise portage.exception.InvalidDependString( > >>> ("getFetchMap(): '%s' SRC_URI has > >>> no file " + \ > >>> diff --git a/lib/portage/package/ebuild/fetch.py > >>> b/lib/portage/package/ebuild/fetch.py > >>> index 28e7caf53..47c3ad28f 100644 > >>> --- a/lib/portage/package/ebuild/fetch.py > >>> +++ b/lib/portage/package/ebuild/fetch.py > >>> @@ -26,6 +26,11 @@ try: > >>> except ImportError: > >>> from urlparse import urlparse > >>> > >>> +try: > >>> + from urllib.parse import quote as urlquote > >>> +except ImportError: > >>> + from urllib import quote as urlquote > >>> + > >>> import portage > >>> portage.proxy.lazyimport.lazyimport(globals(), > >>> 'portage.package.ebuild.config:check_config_instance,config', > >>> @@ -351,7 +356,7 @@ _size_suffix_map = { > >>> > >>> class FlatLayout(object): > >>> def get_path(self, filename): > >>> - return filename > >>> + return urlquote(filename) > >>> > >>> def get_filenames(self, distdir): > >>> for dirpath, dirnames, filenames in os.walk(distdir, > >>> @@ -382,7 +387,7 @@ class FilenameHashLayout(object): > >>> c = c // 4 > >>> ret += fnhash[:c] + '/' > >>> fnhash = fnhash[c:] > >>> - return ret + filename > >>> + return ret + urlquote(filename) > >>> > >>> def get_filenames(self, distdir): > >>> pattern = '' > >>> > >> > >> We've also got these other basename calls in fetch.py: > >> > >>> diff --git a/lib/portage/package/ebuild/fetch.py > >>> b/lib/portage/package/ebuild/fetch.py > >>> index 28e7caf53..56b375d58 100644 > >>> --- a/lib/portage/package/ebuild/fetch.py > >>> +++ b/lib/portage/package/ebuild/fetch.py > >>> @@ -730,9 +730,9 @@ def fetch(myuris, mysettings, listonly=0, fetchonly=0, > >>> else: > >>> for myuri in myuris: > >>> if urlparse(myuri).scheme: > >>> - > >>> file_uri_tuples.append((os.path.basename(myuri), myuri)) > >>> + > >>> file_uri_tuples.append((urlunquote(os.path.basename(myuri)), myuri)) > >>> else: > >>> - > >>> file_uri_tuples.append((os.path.basename(myuri), None)) > >>> + > >>> file_uri_tuples.append((urlunquote(os.path.basename(myuri)), None)) > >> > >> The case with no scheme is not a URI, so we need to decide whether > >> or not to unquote, and we should make _parse_uri_map behavior > >> consistent for this case. > > > > Backing up, I think unquoting the basename is really a separate issue > > from quoting the filename in Layout.get_path(). The latter fixes a > > known bug when fetching from mirrors, whereas the former seems like a > > cosmetic issue. I don't think we should mix these two up into the same > > commit. > > > > Could I get your approval to push my original patch (Escape > > percent-signs in filename when fetching from mirrors)? > > Separate patches are fine, but both patches really should be merged at > the same time since the quoting and unquoting are inherently coupled.
I think that they are not actually coupled at all. It's two sets of code used in different contexts. My original patch could and should be pushed independently of any subsequent changes. The unquoting code in this subsequent discussion only affects how a URL is translated to a filename on disk when fetching from upstream and generating the Manifest. Ebuild developers can always override this translation with an arrow operator in SRC_URI, as happens in go-module.eclass. The quoting code deals with escaping whatever on-disk filename portage happens to be using (based on the URL basename or the right-hand side of the SRC_URI arrow) so that it can be fetched from a mirror. It doesn't matter if that disk filename has some ugly percent signs or not, so long as we can send an appropriate request to the mirror web server. If you have a counter-example where things will fail with my original patch, please elaborate.