Script 'mail_helper' called by obssrc
Hello community,
here is the log from the commit of package python-tldextract for
openSUSE:Factory checked in at 2022-11-10 14:23:22
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Comparing /work/SRC/openSUSE:Factory/python-tldextract (Old)
and /work/SRC/openSUSE:Factory/.python-tldextract.new.1597 (New)
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Package is "python-tldextract"
Thu Nov 10 14:23:22 2022 rev:16 rq:1035015 version:3.4.0
Changes:
--------
--- /work/SRC/openSUSE:Factory/python-tldextract/python-tldextract.changes 2022-07-26 19:44:36.114159443 +0200
+++ /work/SRC/openSUSE:Factory/.python-tldextract.new.1597/python-tldextract.changes 2022-11-10 14:24:27.091085600 +0100
@@ -1,0 +2,12 @@
+Thu Nov 10 09:04:18 UTC 2022 - Mia Herkt <[email protected]>
+
+- Update to 3.4.0
+Features
+ * Add method extract_urllib to extract from a
+ urllib.parse.{ParseResult,SplitResult}
+ #gh/john-kurkowski/tldextract#274
+Bugfixes
+ * Fix internal type-var error, in newer versions of mypy
+ #gh/john-kurkowski/tldextract#275
+
+-------------------------------------------------------------------
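The headline feature in 3.4.0 is the new `extract_urllib` method. A minimal sketch of the intended call pattern, inferred from the changelog entry above (the URL is only an example):

```py
import urllib.parse

import tldextract

# Parse once with the standard library, then hand the SplitResult (or
# ParseResult) to tldextract instead of re-parsing the raw string.
extractor = tldextract.TLDExtract()
split = urllib.parse.urlsplit("http://forums.bbc.co.uk/")
print(extractor.extract_urllib(split))
# ExtractResult(subdomain='forums', domain='bbc', suffix='co.uk')
```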
Old:
----
tldextract-3.3.1.tar.gz
New:
----
tldextract-3.4.0.tar.gz
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Other differences:
------------------
++++++ python-tldextract.spec ++++++
--- /var/tmp/diff_new_pack.AktiKe/_old 2022-11-10 14:24:27.555088226 +0100
+++ /var/tmp/diff_new_pack.AktiKe/_new 2022-11-10 14:24:27.559088249 +0100
@@ -20,7 +20,7 @@
%define skip_python2 1
%define oldpython python
Name: python-tldextract
-Version: 3.3.1
+Version: 3.4.0
Release: 0
Summary: Python module to separate the TLD of a URL
License: BSD-3-Clause
++++++ tldextract-3.3.1.tar.gz -> tldextract-3.4.0.tar.gz ++++++
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/tldextract-3.3.1/CHANGELOG.md new/tldextract-3.4.0/CHANGELOG.md
--- old/tldextract-3.3.1/CHANGELOG.md 2022-07-08 20:56:46.000000000 +0200
+++ new/tldextract-3.4.0/CHANGELOG.md 2022-10-04 22:20:45.000000000 +0200
@@ -3,6 +3,13 @@
After upgrading, update your cache file by deleting it or via `tldextract --update`.
+## 3.4.0 (2022-10-04)
+
+* Features
+    * Add method `extract_urllib` to extract from a `urllib.parse.{ParseResult,SplitResult}` ([#274](https://github.com/john-kurkowski/tldextract/issues/274))
+* Bugfixes
+    * Fix internal type-var error, in newer versions of mypy ([#275](https://github.com/john-kurkowski/tldextract/issues/275))
+
## 3.3.1 (2022-07-08)
* Bugfixes
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/tldextract-3.3.1/PKG-INFO new/tldextract-3.4.0/PKG-INFO
--- old/tldextract-3.3.1/PKG-INFO 2022-07-08 20:58:25.247439100 +0200
+++ new/tldextract-3.4.0/PKG-INFO 2022-10-04 22:21:16.650842200 +0200
@@ -1,6 +1,6 @@
Metadata-Version: 2.1
Name: tldextract
-Version: 3.3.1
+Version: 3.4.0
Summary: Accurately separates a URL's subdomain, domain, and public suffix, using the Public Suffix List (PSL). By default, this includes the public ICANN TLDs and their exceptions. You can optionally support the Public Suffix List's private domains as well.
Home-page: https://github.com/john-kurkowski/tldextract
Author: John Kurkowski
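For orientation, the Summary above corresponds to this basic usage, a sketch matching the doctests in the README diff below:

```py
import tldextract

# Split a URL into the three components named in the Summary:
# subdomain, domain, and public suffix (per the PSL).
ext = tldextract.extract("http://forums.news.cnn.com/")
print(ext.subdomain, ext.domain, ext.suffix)
# forums.news cnn com
```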
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/tldextract-3.3.1/README.md new/tldextract-3.4.0/README.md
--- old/tldextract-3.3.1/README.md 2022-06-06 01:36:58.000000000 +0200
+++ new/tldextract-3.4.0/README.md 2022-10-04 22:01:32.000000000 +0200
@@ -95,7 +95,7 @@
pip install -e 'git://github.com/john-kurkowski/tldextract.git#egg=tldextract'
```
-Command-line usage, splits the url components by space:
+Command-line usage, splits the URL components by space:
```zsh
tldextract http://forums.bbc.co.uk
@@ -179,9 +179,9 @@
```
The thinking behind the default is, it's the more common case when people
-mentally parse a URL. It doesn't assume familiarity with the PSL nor that the
-PSL makes such a distinction. Note this may run counter to the default parsing
-behavior of other, PSL-based libraries.
+mentally parse a domain name. It doesn't assume familiarity with the PSL nor
+that the PSL makes a public/private distinction. Note this default may run
+counter to the default parsing behavior of other, PSL-based libraries.
### Specifying your own URL or file for Public Suffix List data
@@ -211,7 +211,7 @@
Use an absolute path when specifying the `suffix_list_urls` keyword argument.
`os.path` is your friend.
-The command line update command can be used with a url or local file you specify:
+The command line update command can be used with a URL or local file you specify:
```zsh
tldextract --update --suffix_list_url "http://foo.bar.baz"
@@ -237,10 +237,21 @@
URL validators out there, this library is very lenient with input. If valid
URLs are important to you, validate them before calling `tldextract`.
-This lenient stance lowers the learning curve of using the library, at the cost
-of desensitizing users to the nuances of URLs. Who knows how much. But in the
-future, I would consider an overhaul. For example, users could opt into
-validation, either receiving exceptions or error metadata on results.
+To avoid parsing a string twice, you can pass `tldextract` the output of
+[`urllib.parse`](https://docs.python.org/3/library/urllib.parse.html) methods.
+For example:
+
+```py
+extractor = TLDExtract()
+split_url = urllib.parse.urlsplit("https://foo.bar.com:8080")
+split_suffix = extractor.extract_urllib(split_url)
+url_to_crawl = f"{split_url.scheme}://{split_suffix.registered_domain}:{split_url.port}"
+```
+
+`tldextract`'s lenient string parsing stance lowers the learning curve of using
+the library, at the cost of desensitizing users to the nuances of URLs. This
+could be overhauled. For example, users could opt into validation, either
+receiving exceptions or error metadata on results.
## Contribute
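Made self-contained, the README's new example runs like this (note `split_url.port` is None when the URL carries no explicit port, so the input here includes one):

```py
import urllib.parse

from tldextract import TLDExtract

extractor = TLDExtract()
split_url = urllib.parse.urlsplit("https://foo.bar.com:8080")
split_suffix = extractor.extract_urllib(split_url)

# registered_domain joins domain and suffix, here 'bar.com'.
url_to_crawl = f"{split_url.scheme}://{split_suffix.registered_domain}:{split_url.port}"
print(url_to_crawl)  # https://bar.com:8080
```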
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/tldextract-3.3.1/tldextract/_version.py new/tldextract-3.4.0/tldextract/_version.py
--- old/tldextract-3.3.1/tldextract/_version.py 2022-07-08 20:58:25.000000000 +0200
+++ new/tldextract-3.4.0/tldextract/_version.py 2022-10-04 22:21:16.000000000 +0200
@@ -1,5 +1,5 @@
# coding: utf-8
# file generated by setuptools_scm
# don't change, don't track in version control
-__version__ = version = '3.3.1'
-__version_tuple__ = version_tuple = (3, 3, 1)
+__version__ = version = '3.4.0'
+__version_tuple__ = version_tuple = (3, 4, 0)
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/tldextract-3.3.1/tldextract/cache.py new/tldextract-3.4.0/tldextract/cache.py
--- old/tldextract-3.3.1/tldextract/cache.py 2022-06-06 01:30:19.000000000 +0200
+++ new/tldextract-3.4.0/tldextract/cache.py 2022-10-04 21:21:53.000000000 +0200
@@ -7,7 +7,16 @@
import os.path
import sys
from hashlib import md5
-from typing import Callable, Dict, Hashable, Iterable, Optional, TypeVar, Union
+from typing import (
+ Callable,
+ Dict,
+ Hashable,
+ Iterable,
+ Optional,
+ TypeVar,
+ Union,
+ cast,
+)
from filelock import FileLock
import requests
@@ -87,7 +96,7 @@
# combined with a call to `.clear()` wont wipe someones hard drive
self.file_ext = ".tldextract.json"
- def get(self, namespace: str, key: Union[str, Dict[str, Hashable]]) -> T:
+    def get(self, namespace: str, key: Union[str, Dict[str, Hashable]]) -> object:
"""Retrieve a value from the disk cache"""
if not self.enabled:
raise KeyError("Cache is disabled")
@@ -121,12 +130,10 @@
global _DID_LOG_UNABLE_TO_CACHE # pylint: disable=global-statement
if not _DID_LOG_UNABLE_TO_CACHE:
LOG.warning(
- (
- "unable to cache %s.%s in %s. This could refresh the "
- "Public Suffix List over HTTP every app startup. "
-                    "Construct your `TLDExtract` with a writable `cache_dir` or "
- "set `cache_dir=None` to silence this warning. %s"
- ),
+ "unable to cache %s.%s in %s. This could refresh the "
+ "Public Suffix List over HTTP every app startup. "
+                "Construct your `TLDExtract` with a writable `cache_dir` or "
+ "set `cache_dir=None` to silence this warning. %s",
namespace,
key,
cache_filepath,
@@ -181,12 +188,10 @@
global _DID_LOG_UNABLE_TO_CACHE # pylint: disable=global-statement
if not _DID_LOG_UNABLE_TO_CACHE:
LOG.warning(
- (
- "unable to cache %s.%s in %s. This could refresh the "
- "Public Suffix List over HTTP every app startup. "
-                    "Construct your `TLDExtract` with a writable `cache_dir` or "
- "set `cache_dir=None` to silence this warning. %s"
- ),
+ "unable to cache %s.%s in %s. This could refresh the "
+ "Public Suffix List over HTTP every app startup. "
+                "Construct your `TLDExtract` with a writable `cache_dir` or "
+ "set `cache_dir=None` to silence this warning. %s",
namespace,
key_args,
cache_filepath,
@@ -200,7 +205,7 @@
# pylint: disable-next=abstract-class-instantiated
with FileLock(lock_path, timeout=self.lock_timeout):
try:
- result: T = self.get(namespace=namespace, key=key_args)
+ result = cast(T, self.get(namespace=namespace, key=key_args))
except KeyError:
result = func(**kwargs)
self.set(namespace=namespace, key=key_args, value=result)
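The cache.py hunks above narrow `DiskCache.get` to return `object` and move the type decision to the call site with `typing.cast`, sidestepping the type-var error newer mypy raised. A toy sketch of that pattern, with illustrative names that are not the library's:

```py
from typing import Callable, Dict, TypeVar, cast

T = TypeVar("T")

_store: Dict[str, object] = {}  # hypothetical stand-in for the disk cache


def cache_get(key: str) -> object:
    """Untyped lookup, mirroring the new `-> object` signature of get()."""
    return _store[key]  # raises KeyError on a cache miss


def run_and_cache(key: str, func: Callable[[], T]) -> T:
    """The generic caller restores the type with cast(), a runtime no-op."""
    try:
        result = cast(T, cache_get(key))
    except KeyError:
        result = func()
        _store[key] = result
    return result


print(run_and_cache("answer", lambda: 42))  # computes 42, then caches it
```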
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/tldextract-3.3.1/tldextract/remote.py new/tldextract-3.4.0/tldextract/remote.py
--- old/tldextract-3.3.1/tldextract/remote.py 2022-06-06 01:36:58.000000000 +0200
+++ new/tldextract-3.4.0/tldextract/remote.py 2022-10-03 20:20:48.000000000 +0200
@@ -12,6 +12,23 @@
SCHEME_RE = re.compile(r"^([" + scheme_chars + "]+:)?//")
+def lenient_netloc(url: str) -> str:
+    """Extract the netloc of a URL-like string, similar to the netloc attribute
+    returned by urllib.parse.{urlparse,urlsplit}, but extract more leniently,
+    without raising errors."""
+
+ return (
+ SCHEME_RE.sub("", url)
+ .partition("/")[0]
+ .partition("?")[0]
+ .partition("#")[0]
+ .split("@")[-1]
+ .partition(":")[0]
+ .strip()
+ .rstrip(".")
+ )
+
+
def looks_like_ip(maybe_ip: str) -> bool:
"""Does the given str look like an IP address?"""
if not maybe_ip[0].isdigit():
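Tracing the chain in `lenient_netloc` above: the scheme, path, query, fragment, userinfo, and port are stripped in that order, then surrounding whitespace and a trailing dot. For example, with 3.4.0 installed:

```py
from tldextract.remote import lenient_netloc

# Each step of the chain removes one kind of decoration around the host.
print(lenient_netloc("https://user:[email protected].:8080/path?q=1#frag"))
# -> 'www.example.co.uk'
```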
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/tldextract-3.3.1/tldextract/tldextract.py new/tldextract-3.4.0/tldextract/tldextract.py
--- old/tldextract-3.3.1/tldextract/tldextract.py 2022-07-08 20:51:36.000000000 +0200
+++ new/tldextract-3.4.0/tldextract/tldextract.py 2022-10-03 20:30:43.000000000 +0200
@@ -53,11 +53,12 @@
import re
from functools import wraps
from typing import FrozenSet, List, NamedTuple, Optional, Sequence, Union
+import urllib.parse
import idna
from .cache import DiskCache, get_cache_dir
-from .remote import IP_RE, SCHEME_RE, looks_like_ip
+from .remote import IP_RE, lenient_netloc, looks_like_ip
from .suffix_list import get_suffix_lists
LOG = logging.getLogger("tldextract")
@@ -208,28 +209,48 @@
def __call__(
self, url: str, include_psl_private_domains: Optional[bool] = None
) -> ExtractResult:
+ """Alias for `extract_str`."""
+ return self.extract_str(url, include_psl_private_domains)
+
+ def extract_str(
+ self, url: str, include_psl_private_domains: Optional[bool] = None
+ ) -> ExtractResult:
"""
Takes a string URL and splits it into its subdomain, domain, and
- suffix (effective TLD, gTLD, ccTLD, etc.) component.
+ suffix (effective TLD, gTLD, ccTLD, etc.) components.
- >>> extract = TLDExtract()
- >>> extract('http://forums.news.cnn.com/')
+ >>> extractor = TLDExtract()
+ >>> extractor.extract_str('http://forums.news.cnn.com/')
ExtractResult(subdomain='forums.news', domain='cnn', suffix='com')
- >>> extract('http://forums.bbc.co.uk/')
+ >>> extractor.extract_str('http://forums.bbc.co.uk/')
ExtractResult(subdomain='forums', domain='bbc', suffix='co.uk')
"""
+        return self._extract_netloc(lenient_netloc(url), include_psl_private_domains)
- netloc = (
- SCHEME_RE.sub("", url)
- .partition("/")[0]
- .partition("?")[0]
- .partition("#")[0]
- .split("@")[-1]
- .partition(":")[0]
- .strip()
- .rstrip(".")
- )
+ def extract_urllib(
+ self,
+ url: Union[urllib.parse.ParseResult, urllib.parse.SplitResult],
+ include_psl_private_domains: Optional[bool] = None,
+ ) -> ExtractResult:
+ """
+ Takes the output of urllib.parse URL parsing methods and further splits
+ the parsed URL into its subdomain, domain, and suffix (effective TLD,
+ gTLD, ccTLD, etc.) components.
+ This method is like `extract_str` but faster, as the string's domain
+ name has already been parsed.
+
+ >>> extractor = TLDExtract()
+        >>> extractor.extract_urllib(urllib.parse.urlsplit('http://forums.news.cnn.com/'))
+ ExtractResult(subdomain='forums.news', domain='cnn', suffix='com')
+        >>> extractor.extract_urllib(urllib.parse.urlsplit('http://forums.bbc.co.uk/'))
+ ExtractResult(subdomain='forums', domain='bbc', suffix='co.uk')
+ """
+ return self._extract_netloc(url.netloc, include_psl_private_domains)
+
+ def _extract_netloc(
+ self, netloc: str, include_psl_private_domains: Optional[bool]
+ ) -> ExtractResult:
labels = _UNICODE_DOTS_RE.split(netloc)
translations = [_decode_punycode(label) for label in labels]
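After this refactor, `__call__`, `extract_str`, and `extract_urllib` all funnel into the shared `_extract_netloc` helper, so the three entry points stay in agreement; a quick check under that assumption:

```py
import urllib.parse

from tldextract import TLDExtract

extractor = TLDExtract()
url = "http://forums.news.cnn.com/"

# All three public entry points reduce to _extract_netloc on the same netloc.
assert extractor(url) == extractor.extract_str(url)
assert extractor(url) == extractor.extract_urllib(urllib.parse.urlsplit(url))
```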
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/tldextract-3.3.1/tldextract.egg-info/PKG-INFO new/tldextract-3.4.0/tldextract.egg-info/PKG-INFO
--- old/tldextract-3.3.1/tldextract.egg-info/PKG-INFO 2022-07-08 20:58:25.000000000 +0200
+++ new/tldextract-3.4.0/tldextract.egg-info/PKG-INFO 2022-10-04 22:21:16.000000000 +0200
@@ -1,6 +1,6 @@
Metadata-Version: 2.1
Name: tldextract
-Version: 3.3.1
+Version: 3.4.0
Summary: Accurately separates a URL's subdomain, domain, and public suffix, using the Public Suffix List (PSL). By default, this includes the public ICANN TLDs and their exceptions. You can optionally support the Public Suffix List's private domains as well.
Home-page: https://github.com/john-kurkowski/tldextract
Author: John Kurkowski