Script 'mail_helper' called by obssrc
Hello community,
here is the log from the commit of package python-tldextract for
openSUSE:Factory checked in at 2022-11-10 14:23:22
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Comparing /work/SRC/openSUSE:Factory/python-tldextract (Old)
and /work/SRC/openSUSE:Factory/.python-tldextract.new.1597 (New)
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Package is "python-tldextract"
Thu Nov 10 14:23:22 2022 rev:16 rq:1035015 version:3.4.0
Changes:
--------
--- /work/SRC/openSUSE:Factory/python-tldextract/python-tldextract.changes 2022-07-26 19:44:36.114159443 +0200
+++ /work/SRC/openSUSE:Factory/.python-tldextract.new.1597/python-tldextract.changes 2022-11-10 14:24:27.091085600 +0100
@@ -1,0 +2,12 @@
+Thu Nov 10 09:04:18 UTC 2022 - Mia Herkt <[email protected]>
+
+- Update to 3.4.0
+Features
+ * Add method extract_urllib to extract from a
+ urllib.parse.{ParseResult,SplitResult}
+ #gh/john-kurkowski/tldextract#274
+Bugfixes
+ * Fix internal type-var error, in newer versions of mypy
+ #gh/john-kurkowski/tldextract#275
+
+-------------------------------------------------------------------
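The headline feature in 3.4.0 is the new `extract_urllib` method. A minimal sketch of the intended call pattern, inferred from the changelog entry above (the URL is only an example):

```py
import urllib.parse

import tldextract

# Parse once with the standard library, then hand the SplitResult (or
# ParseResult) to tldextract instead of re-parsing the raw string.
extractor = tldextract.TLDExtract()
split = urllib.parse.urlsplit("http://forums.bbc.co.uk/")
print(extractor.extract_urllib(split))
# ExtractResult(subdomain='forums', domain='bbc', suffix='co.uk')
```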
Old:
----
tldextract-3.3.1.tar.gz
New:
----
tldextract-3.4.0.tar.gz
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Other differences:
------------------
++++++ python-tldextract.spec ++++++
--- /var/tmp/diff_new_pack.AktiKe/_old 2022-11-10 14:24:27.555088226 +0100
+++ /var/tmp/diff_new_pack.AktiKe/_new 2022-11-10 14:24:27.559088249 +0100
@@ -20,7 +20,7 @@
%define skip_python2 1
%define oldpython python
Name: python-tldextract
-Version: 3.3.1
+Version: 3.4.0
Release: 0
Summary: Python module to separate the TLD of a URL
License: BSD-3-Clause
++++++ tldextract-3.3.1.tar.gz -> tldextract-3.4.0.tar.gz ++++++
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/tldextract-3.3.1/CHANGELOG.md new/tldextract-3.4.0/CHANGELOG.md
--- old/tldextract-3.3.1/CHANGELOG.md 2022-07-08 20:56:46.000000000 +0200
+++ new/tldextract-3.4.0/CHANGELOG.md 2022-10-04 22:20:45.000000000 +0200
@@ -3,6 +3,13 @@
After upgrading, update your cache file by deleting it or via `tldextract --update`.
+## 3.4.0 (2022-10-04)
+
+* Features
+    * Add method `extract_urllib` to extract from a `urllib.parse.{ParseResult,SplitResult}` ([#274](https://github.com/john-kurkowski/tldextract/issues/274))
+* Bugfixes
+    * Fix internal type-var error, in newer versions of mypy ([#275](https://github.com/john-kurkowski/tldextract/issues/275))
+
## 3.3.1 (2022-07-08)
* Bugfixes
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/tldextract-3.3.1/PKG-INFO new/tldextract-3.4.0/PKG-INFO
--- old/tldextract-3.3.1/PKG-INFO 2022-07-08 20:58:25.247439100 +0200
+++ new/tldextract-3.4.0/PKG-INFO 2022-10-04 22:21:16.650842200 +0200
@@ -1,6 +1,6 @@
Metadata-Version: 2.1
Name: tldextract
-Version: 3.3.1
+Version: 3.4.0
Summary: Accurately separates a URL's subdomain, domain, and public suffix, using the Public Suffix List (PSL). By default, this includes the public ICANN TLDs and their exceptions. You can optionally support the Public Suffix List's private domains as well.
Home-page: https://github.com/john-kurkowski/tldextract
Author: John Kurkowski
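For orientation, the Summary above corresponds to this basic usage, a sketch matching the doctests in the README diff below:

```py
import tldextract

# Split a URL into the three components named in the Summary:
# subdomain, domain, and public suffix (per the PSL).
ext = tldextract.extract("http://forums.news.cnn.com/")
print(ext.subdomain, ext.domain, ext.suffix)
# forums.news cnn com
```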
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/tldextract-3.3.1/README.md new/tldextract-3.4.0/README.md
--- old/tldextract-3.3.1/README.md 2022-06-06 01:36:58.000000000 +0200
+++ new/tldextract-3.4.0/README.md 2022-10-04 22:01:32.000000000 +0200
@@ -95,7 +95,7 @@
pip install -e 'git://github.com/john-kurkowski/tldextract.git#egg=tldextract'
```
-Command-line usage, splits the url components by space:
+Command-line usage, splits the URL components by space:
```zsh
tldextract http://forums.bbc.co.uk
@@ -179,9 +179,9 @@
```
The thinking behind the default is, it's the more common case when people
-mentally parse a URL. It doesn't assume familiarity with the PSL nor that the
-PSL makes such a distinction. Note this may run counter to the default parsing
-behavior of other, PSL-based libraries.
+mentally parse a domain name. It doesn't assume familiarity with the PSL nor
+that the PSL makes a public/private distinction. Note this default may run
+counter to the default parsing behavior of other, PSL-based libraries.
### Specifying your own URL or file for Public Suffix List data
@@ -211,7 +211,7 @@
Use an absolute path when specifying the `suffix_list_urls` keyword argument.
`os.path` is your friend.
-The command line update command can be used with a url or local file you specify:
+The command line update command can be used with a URL or local file you specify:
```zsh
tldextract --update --suffix_list_url "http://foo.bar.baz"
@@ -237,10 +237,21 @@
URL validators out there, this library is very lenient with input. If valid
URLs are important to you, validate them before calling `tldextract`.
-This lenient stance lowers the learning curve of using the library, at the cost
-of desensitizing users to the nuances of URLs. Who knows how much. But in the
-future, I would consider an overhaul. For example, users could opt into
-validation, either receiving exceptions or error metadata on results.
+To avoid parsing a string twice, you can pass `tldextract` the output of
+[`urllib.parse`](https://docs.python.org/3/library/urllib.parse.html) methods.
+For example:
+
+```py
+extractor = TLDExtract()
+split_url = urllib.parse.urlsplit("https://foo.bar.com:8080")
+split_suffix = extractor.extract_urllib(split_url)
+url_to_crawl = f"{split_url.scheme}://{split_suffix.registered_domain}:{split_url.port}"
+```
+
+`tldextract`'s lenient string parsing stance lowers the learning curve of using
+the library, at the cost of desensitizing users to the nuances of URLs. This
+could be overhauled. For example, users could opt into validation, either
+receiving exceptions or error metadata on results.
## Contribute
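Made self-contained, the README's new example runs like this (note `split_url.port` is None when the URL carries no explicit port, so the input here includes one):

```py
import urllib.parse

from tldextract import TLDExtract

extractor = TLDExtract()
split_url = urllib.parse.urlsplit("https://foo.bar.com:8080")
split_suffix = extractor.extract_urllib(split_url)

# registered_domain joins domain and suffix, here 'bar.com'.
url_to_crawl = f"{split_url.scheme}://{split_suffix.registered_domain}:{split_url.port}"
print(url_to_crawl)  # https://bar.com:8080
```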
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/tldextract-3.3.1/tldextract/_version.py new/tldextract-3.4.0/tldextract/_version.py
--- old/tldextract-3.3.1/tldextract/_version.py 2022-07-08 20:58:25.000000000 +0200
+++ new/tldextract-3.4.0/tldextract/_version.py 2022-10-04 22:21:16.000000000 +0200
@@ -1,5 +1,5 @@
# coding: utf-8
# file generated by setuptools_scm
# don't change, don't track in version control
-__version__ = version = '3.3.1'
-__version_tuple__ = version_tuple = (3, 3, 1)
+__version__ = version = '3.4.0'
+__version_tuple__ = version_tuple = (3, 4, 0)
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/tldextract-3.3.1/tldextract/cache.py new/tldextract-3.4.0/tldextract/cache.py
--- old/tldextract-3.3.1/tldextract/cache.py 2022-06-06 01:30:19.000000000 +0200
+++ new/tldextract-3.4.0/tldextract/cache.py 2022-10-04 21:21:53.000000000 +0200
@@ -7,7 +7,16 @@
import os.path
import sys
from hashlib import md5
-from typing import Callable, Dict, Hashable, Iterable, Optional, TypeVar, Union
+from typing import (
+ Callable,
+ Dict,
+ Hashable,
+ Iterable,
+ Optional,
+ TypeVar,
+ Union,
+ cast,
+)
from filelock import FileLock
import requests
@@ -87,7 +96,7 @@
# combined with a call to `.clear()` wont wipe someones hard drive
self.file_ext = ".tldextract.json"
- def get(self, namespace: str, key: Union[str, Dict[str, Hashable]]) -> T:
+    def get(self, namespace: str, key: Union[str, Dict[str, Hashable]]) -> object:
"""Retrieve a value from the disk cache"""
if not self.enabled:
raise KeyError("Cache is disabled")
@@ -121,12 +130,10 @@
global _DID_LOG_UNABLE_TO_CACHE # pylint: disable=global-statement
if not _DID_LOG_UNABLE_TO_CACHE:
LOG.warning(
- (
- "unable to cache %s.%s in %s. This could refresh the "
- "Public Suffix List over HTTP every app startup. "
-                    "Construct your `TLDExtract` with a writable `cache_dir` or "
- "set `cache_dir=None` to silence this warning. %s"
- ),
+ "unable to cache %s.%s in %s. This could refresh the "
+ "Public Suffix List over HTTP every app startup. "
+                "Construct your `TLDExtract` with a writable `cache_dir` or "
+ "set `cache_dir=None` to silence this warning. %s",
namespace,
key,
cache_filepath,
@@ -181,12 +188,10 @@
global _DID_LOG_UNABLE_TO_CACHE # pylint: disable=global-statement
if not _DID_LOG_UNABLE_TO_CACHE:
LOG.warning(
- (
- "unable to cache %s.%s in %s. This could refresh the "
- "Public Suffix List over HTTP every app startup. "
-                    "Construct your `TLDExtract` with a writable `cache_dir` or "
- "set `cache_dir=None` to silence this warning. %s"
- ),
+ "unable to cache %s.%s in %s. This could refresh the "
+ "Public Suffix List over HTTP every app startup. "
+                "Construct your `TLDExtract` with a writable `cache_dir` or "
+ "set `cache_dir=None` to silence this warning. %s",
namespace,
key_args,
cache_filepath,
@@ -200,7 +205,7 @@
# pylint: disable-next=abstract-class-instantiated
with FileLock(lock_path, timeout=self.lock_timeout):
try:
- result: T = self.get(namespace=namespace, key=key_args)
+ result = cast(T, self.get(namespace=namespace, key=key_args))
except KeyError:
result = func(**kwargs)
self.set(namespace=namespace, key=key_args, value=result)
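The cache.py hunks above narrow `DiskCache.get` to return `object` and move the type decision to the call site with `typing.cast`, sidestepping the type-var error newer mypy raised. A toy sketch of that pattern, with illustrative names that are not the library's:

```py
from typing import Callable, Dict, TypeVar, cast

T = TypeVar("T")

_store: Dict[str, object] = {}  # hypothetical stand-in for the disk cache


def cache_get(key: str) -> object:
    """Untyped lookup, mirroring the new `-> object` signature of get()."""
    return _store[key]  # raises KeyError on a cache miss


def run_and_cache(key: str, func: Callable[[], T]) -> T:
    """The generic caller restores the type with cast(), a runtime no-op."""
    try:
        result = cast(T, cache_get(key))
    except KeyError:
        result = func()
        _store[key] = result
    return result


print(run_and_cache("answer", lambda: 42))  # computes 42, then caches it
```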
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/tldextract-3.3.1/tldextract/remote.py new/tldextract-3.4.0/tldextract/remote.py
--- old/tldextract-3.3.1/tldextract/remote.py 2022-06-06 01:36:58.000000000 +0200
+++ new/tldextract-3.4.0/tldextract/remote.py 2022-10-03 20:20:48.000000000 +0200
@@ -12,6 +12,23 @@
SCHEME_RE = re.compile(r"^([" + scheme_chars + "]+:)?//")
+def lenient_netloc(url: str) -> str:
+    """Extract the netloc of a URL-like string, similar to the netloc attribute
+    returned by urllib.parse.{urlparse,urlsplit}, but extract more leniently,
+    without raising errors."""
+
+ return (
+ SCHEME_RE.sub("", url)
+ .partition("/")[0]
+ .partition("?")[0]
+ .partition("#")[0]
+ .split("@")[-1]
+ .partition(":")[0]
+ .strip()
+ .rstrip(".")
+ )
+
+
def looks_like_ip(maybe_ip: str) -> bool:
"""Does the given str look like an IP address?"""
if not maybe_ip[0].isdigit():
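Tracing the chain in `lenient_netloc` above: the scheme, path, query, fragment, userinfo, and port are stripped in that order, then surrounding whitespace and a trailing dot. For example, with 3.4.0 installed:

```py
from tldextract.remote import lenient_netloc

# Each step of the chain removes one kind of decoration around the host.
print(lenient_netloc("https://user:[email protected].:8080/path?q=1#frag"))
# -> 'www.example.co.uk'
```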
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/tldextract-3.3.1/tldextract/tldextract.py new/tldextract-3.4.0/tldextract/tldextract.py
--- old/tldextract-3.3.1/tldextract/tldextract.py 2022-07-08 20:51:36.000000000 +0200
+++ new/tldextract-3.4.0/tldextract/tldextract.py 2022-10-03 20:30:43.000000000 +0200
@@ -53,11 +53,12 @@
import re
from functools import wraps
from typing import FrozenSet, List, NamedTuple, Optional, Sequence, Union
+import urllib.parse
import idna
from .cache import DiskCache, get_cache_dir
-from .remote import IP_RE, SCHEME_RE, looks_like_ip
+from .remote import IP_RE, lenient_netloc, looks_like_ip
from .suffix_list import get_suffix_lists
LOG = logging.getLogger("tldextract")
@@ -208,28 +209,48 @@
def __call__(
self, url: str, include_psl_private_domains: Optional[bool] = None
) -> ExtractResult:
+ """Alias for `extract_str`."""
+ return self.extract_str(url, include_psl_private_domains)
+
+ def extract_str(
+ self, url: str, include_psl_private_domains: Optional[bool] = None
+ ) -> ExtractResult:
"""
Takes a string URL and splits it into its subdomain, domain, and
- suffix (effective TLD, gTLD, ccTLD, etc.) component.
+ suffix (effective TLD, gTLD, ccTLD, etc.) components.
- >>> extract = TLDExtract()
- >>> extract('http://forums.news.cnn.com/')
+ >>> extractor = TLDExtract()
+ >>> extractor.extract_str('http://forums.news.cnn.com/')
ExtractResult(subdomain='forums.news', domain='cnn', suffix='com')
- >>> extract('http://forums.bbc.co.uk/')
+ >>> extractor.extract_str('http://forums.bbc.co.uk/')
ExtractResult(subdomain='forums', domain='bbc', suffix='co.uk')
"""
+        return self._extract_netloc(lenient_netloc(url), include_psl_private_domains)
- netloc = (
- SCHEME_RE.sub("", url)
- .partition("/")[0]
- .partition("?")[0]
- .partition("#")[0]
- .split("@")[-1]
- .partition(":")[0]
- .strip()
- .rstrip(".")
- )
+ def extract_urllib(
+ self,
+ url: Union[urllib.parse.ParseResult, urllib.parse.SplitResult],
+ include_psl_private_domains: Optional[bool] = None,
+ ) -> ExtractResult:
+ """
+ Takes the output of urllib.parse URL parsing methods and further splits
+ the parsed URL into its subdomain, domain, and suffix (effective TLD,
+ gTLD, ccTLD, etc.) components.
+ This method is like `extract_str` but faster, as the string's domain
+ name has already been parsed.
+
+ >>> extractor = TLDExtract()
+        >>> extractor.extract_urllib(urllib.parse.urlsplit('http://forums.news.cnn.com/'))
+ ExtractResult(subdomain='forums.news', domain='cnn', suffix='com')
+        >>> extractor.extract_urllib(urllib.parse.urlsplit('http://forums.bbc.co.uk/'))
+ ExtractResult(subdomain='forums', domain='bbc', suffix='co.uk')
+ """
+ return self._extract_netloc(url.netloc, include_psl_private_domains)
+
+ def _extract_netloc(
+ self, netloc: str, include_psl_private_domains: Optional[bool]
+ ) -> ExtractResult:
labels = _UNICODE_DOTS_RE.split(netloc)
translations = [_decode_punycode(label) for label in labels]
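After this refactor, `__call__`, `extract_str`, and `extract_urllib` all funnel into the shared `_extract_netloc` helper, so the three entry points stay in agreement; a quick check under that assumption:

```py
import urllib.parse

from tldextract import TLDExtract

extractor = TLDExtract()
url = "http://forums.news.cnn.com/"

# All three public entry points reduce to _extract_netloc on the same netloc.
assert extractor(url) == extractor.extract_str(url)
assert extractor(url) == extractor.extract_urllib(urllib.parse.urlsplit(url))
```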
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/tldextract-3.3.1/tldextract.egg-info/PKG-INFO new/tldextract-3.4.0/tldextract.egg-info/PKG-INFO
--- old/tldextract-3.3.1/tldextract.egg-info/PKG-INFO 2022-07-08 20:58:25.000000000 +0200
+++ new/tldextract-3.4.0/tldextract.egg-info/PKG-INFO 2022-10-04 22:21:16.000000000 +0200
@@ -1,6 +1,6 @@
Metadata-Version: 2.1
Name: tldextract
-Version: 3.3.1
+Version: 3.4.0
Summary: Accurately separates a URL's subdomain, domain, and public suffix, using the Public Suffix List (PSL). By default, this includes the public ICANN TLDs and their exceptions. You can optionally support the Public Suffix List's private domains as well.
Home-page: https://github.com/john-kurkowski/tldextract
Author: John Kurkowski