Script 'mail_helper' called by obssrc
Hello community,
here is the log from the commit of package python-charset-normalizer for
openSUSE:Factory checked in at 2023-03-29 23:26:15
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Comparing /work/SRC/openSUSE:Factory/python-charset-normalizer (Old)
and /work/SRC/openSUSE:Factory/.python-charset-normalizer.new.31432 (New)
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Package is "python-charset-normalizer"
Wed Mar 29 23:26:15 2023 rev:18 rq:1074517 version:3.1.0
Changes:
--------
---
/work/SRC/openSUSE:Factory/python-charset-normalizer/python-charset-normalizer.changes
2022-12-04 14:57:55.260120466 +0100
+++
/work/SRC/openSUSE:Factory/.python-charset-normalizer.new.31432/python-charset-normalizer.changes
2023-03-29 23:26:22.471223639 +0200
@@ -1,0 +2,9 @@
+Sun Mar 26 20:04:17 UTC 2023 - Dirk Müller <[email protected]>
+
+- update to 3.1.0:
+  * Add argument `should_rename_legacy` to legacy function `detect`;
+    unknown extra arguments are now disregarded without errors (PR #262)
+ * Removed Support for Python 3.6 (PR #260)
+ * Optional speedup provided by mypy/c 1.0.1
+
+-------------------------------------------------------------------
Old:
----
charset_normalizer-3.0.1.tar.gz
New:
----
charset_normalizer-3.1.0.tar.gz
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Other differences:
------------------
++++++ python-charset-normalizer.spec ++++++
--- /var/tmp/diff_new_pack.OmyiLS/_old 2023-03-29 23:26:23.671229276 +0200
+++ /var/tmp/diff_new_pack.OmyiLS/_new 2023-03-29 23:26:23.675229295 +0200
@@ -1,7 +1,7 @@
#
# spec file for package python-charset-normalizer
#
-# Copyright (c) 2022 SUSE LLC
+# Copyright (c) 2023 SUSE LLC
#
# All modifications and additions to the file contributed by third parties
# remain the property of their copyright owners, unless otherwise agreed
@@ -19,12 +19,13 @@
%{?!python_module:%define python_module() python3-%{**}}
%define skip_python2 1
Name: python-charset-normalizer
-Version: 3.0.1
+Version: 3.1.0
Release: 0
Summary: Python Universal Charset detector
License: MIT
URL: https://github.com/ousret/charset_normalizer
Source:
https://github.com/Ousret/charset_normalizer/archive/refs/tags/%{version}.tar.gz#/charset_normalizer-%{version}.tar.gz
+BuildRequires: %{python_module base >= 3.7}
BuildRequires: %{python_module setuptools}
BuildRequires: fdupes
BuildRequires: python-rpm-macros
++++++ charset_normalizer-3.0.1.tar.gz -> charset_normalizer-3.1.0.tar.gz ++++++
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn'
'--exclude=.svnignore'
old/charset_normalizer-3.0.1/.github/workflows/mypyc-verify.yml
new/charset_normalizer-3.1.0/.github/workflows/mypyc-verify.yml
--- old/charset_normalizer-3.0.1/.github/workflows/mypyc-verify.yml
2022-11-18 06:44:30.000000000 +0100
+++ new/charset_normalizer-3.1.0/.github/workflows/mypyc-verify.yml
2023-03-06 07:46:55.000000000 +0100
@@ -9,7 +9,7 @@
strategy:
fail-fast: false
matrix:
- python-version: [3.6, 3.7, 3.8, 3.9, "3.10"]
+ python-version: [3.7, 3.8, 3.9, "3.10", "3.11"]
os: [ubuntu-latest]
steps:
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn'
'--exclude=.svnignore'
old/charset_normalizer-3.0.1/.github/workflows/python-publish.yml
new/charset_normalizer-3.1.0/.github/workflows/python-publish.yml
--- old/charset_normalizer-3.0.1/.github/workflows/python-publish.yml
2022-11-18 06:44:30.000000000 +0100
+++ new/charset_normalizer-3.1.0/.github/workflows/python-publish.yml
2023-03-06 07:46:55.000000000 +0100
@@ -52,7 +52,7 @@
strategy:
fail-fast: false
matrix:
- python-version: [ 3.6, 3.7, 3.8, 3.9, "3.10", "3.11" ]
+ python-version: [ 3.7, 3.8, 3.9, "3.10", "3.11" ]
os: [ ubuntu-latest ]
steps:
@@ -215,7 +215,7 @@
run: |
python -m pip install -U pip wheel setuptools build twine
- name: Build wheels
- uses: pypa/[email protected]
+ uses: pypa/[email protected]
env:
#CIBW_BUILD_FRONTEND: "build"
CIBW_ARCHS_MACOS: x86_64 arm64 universal2
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn'
'--exclude=.svnignore'
old/charset_normalizer-3.0.1/.github/workflows/run-tests.yml
new/charset_normalizer-3.1.0/.github/workflows/run-tests.yml
--- old/charset_normalizer-3.0.1/.github/workflows/run-tests.yml
2022-11-18 06:44:30.000000000 +0100
+++ new/charset_normalizer-3.1.0/.github/workflows/run-tests.yml
2023-03-06 07:46:55.000000000 +0100
@@ -9,7 +9,7 @@
strategy:
fail-fast: false
matrix:
- python-version: [3.6, 3.7, 3.8, 3.9, "3.10", "3.11", "3.12-dev"]
+ python-version: [3.7, 3.8, 3.9, "3.10", "3.11", "3.12-dev"]
os: [ubuntu-latest]
steps:
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn'
'--exclude=.svnignore' old/charset_normalizer-3.0.1/CHANGELOG.md
new/charset_normalizer-3.1.0/CHANGELOG.md
--- old/charset_normalizer-3.0.1/CHANGELOG.md 2022-11-18 06:44:30.000000000
+0100
+++ new/charset_normalizer-3.1.0/CHANGELOG.md 2023-03-06 07:46:55.000000000
+0100
@@ -2,6 +2,17 @@
All notable changes to charset-normalizer will be documented in this file.
This project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/).
+## [3.1.0](https://github.com/Ousret/charset_normalizer/compare/3.0.1...3.1.0) (2023-03-06)
+
+### Added
+- Argument `should_rename_legacy` for legacy function `detect` and disregard any new arguments without errors (PR #262)
+
+### Removed
+- Support for Python 3.6 (PR #260)
+
+### Changed
+- Optional speedup provided by mypy/c 1.0.1
+
## [3.0.1](https://github.com/Ousret/charset_normalizer/compare/3.0.0...3.0.1) (2022-11-18)
### Fixed
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn'
'--exclude=.svnignore' old/charset_normalizer-3.0.1/README.md
new/charset_normalizer-3.1.0/README.md
--- old/charset_normalizer-3.0.1/README.md 2022-11-18 06:44:30.000000000
+0100
+++ new/charset_normalizer-3.1.0/README.md 2023-03-06 07:46:55.000000000
+0100
@@ -23,18 +23,18 @@
This project offers you an alternative to **Universal Charset Encoding Detector**, also known as **Chardet**.

-| Feature | [Chardet](https://github.com/chardet/chardet) | Charset Normalizer | [cChardet](https://github.com/PyYoshi/cChardet) |
-| ------------- | :-------------: | :------------------: | :------------------: |
-| `Fast` | ❌<br> | ✅<br> | ✅<br> |
-| `Universal**` | ❌ | ✅ | ❌ |
-| `Reliable` **without** distinguishable standards | ❌ | ✅ | ✅ |
-| `Reliable` **with** distinguishable standards | ✅ | ✅ | ✅ |
-| `License` | LGPL-2.1<br>_restrictive_ | MIT | MPL-1.1<br>_restrictive_ |
-| `Native Python` | ✅ | ✅ | ❌ |
-| `Detect spoken language` | ❌ | ✅ | N/A |
-| `UnicodeDecodeError Safety` | ❌ | ✅ | ❌ |
-| `Whl Size` | 193.6 kB | 39.5 kB | ~200 kB |
-| `Supported Encoding` | 33 | :tada: [90](https://charset-normalizer.readthedocs.io/en/latest/user/support.html#supported-encodings) | 40
+| Feature | [Chardet](https://github.com/chardet/chardet) | Charset Normalizer | [cChardet](https://github.com/PyYoshi/cChardet) |
+|--------------------------------------------------|:---------------------------------------------:|:------------------:|:-----------------------------------------------:|
+| `Fast` | ❌<br> | ✅<br> | ✅<br> |
+| `Universal**` | ❌ | ✅ | ❌ |
+| `Reliable` **without** distinguishable standards | ❌ | ✅ | ✅ |
+| `Reliable` **with** distinguishable standards | ✅ | ✅ | ✅ |
+| `License` | LGPL-2.1<br>_restrictive_ | MIT | MPL-1.1<br>_restrictive_ |
+| `Native Python` | ✅ | ✅ | ❌ |
+| `Detect spoken language` | ❌ | ✅ | N/A |
+| `UnicodeDecodeError Safety` | ❌ | ✅ | ❌ |
+| `Whl Size` | 193.6 kB | 39.5 kB | ~200 kB |
+| `Supported Encoding` | 33 | :tada: [90](https://charset-normalizer.readthedocs.io/en/latest/user/support.html#supported-encodings) | 40 |
<p align="center">
<img src="https://i.imgflip.com/373iay.gif" alt="Reading Normalized Text"
width="226"/><img
src="https://media.tenor.com/images/c0180f70732a18b4965448d33adba3d0/tenor.gif"
alt="Cat Reading Text" width="200"/>
@@ -50,15 +50,15 @@
This package offer better performance than its counterpart Chardet. Here are some numbers.

-| Package | Accuracy | Mean per file (ms) | File per sec (est) |
-| ------------- | :-------------: | :------------------: | :------------------: |
-| [chardet](https://github.com/chardet/chardet) | 86 % | 200 ms | 5 file/sec |
-| charset-normalizer | **98 %** | **10 ms** | 100 file/sec |
-
-| Package | 99th percentile | 95th percentile | 50th percentile |
-| ------------- | :-------------: | :------------------: | :------------------: |
-| [chardet](https://github.com/chardet/chardet) | 1200 ms | 287 ms | 23 ms |
-| charset-normalizer | 100 ms | 50 ms | 5 ms |
+| Package | Accuracy | Mean per file (ms) | File per sec (est) |
+|-----------------------------------------------|:--------:|:------------------:|:------------------:|
+| [chardet](https://github.com/chardet/chardet) | 86 % | 200 ms | 5 file/sec |
+| charset-normalizer | **98 %** | **10 ms** | 100 file/sec |
+
+| Package | 99th percentile | 95th percentile | 50th percentile |
+|-----------------------------------------------|:---------------:|:---------------:|:---------------:|
+| [chardet](https://github.com/chardet/chardet) | 1200 ms | 287 ms | 23 ms |
+| charset-normalizer | 100 ms | 50 ms | 5 ms |

Chardet's performance on larger file (1MB+) are very poor. Expect huge difference on large payload.
@@ -185,15 +185,15 @@
## 🍰 How

 - Discard all charset encoding table that could not fit the binary content.
- - Measure chaos, or the mess once opened (by chunks) with a corresponding charset encoding.
+ - Measure noise, or the mess once opened (by chunks) with a corresponding charset encoding.
 - Extract matches with the lowest mess detected.
 - Additionally, we measure coherence / probe for a language.

-**Wait a minute**, what is chaos/mess and coherence according to **YOU ?**
+**Wait a minute**, what is noise/mess and coherence according to **YOU ?**

-*Chaos :* I opened hundred of text files, **written by humans**, with the wrong encoding table. **I observed**, then
+*Noise :* I opened hundred of text files, **written by humans**, with the wrong encoding table. **I observed**, then
 **I established** some ground rules about **what is obvious** when **it seems like** a mess.
- I know that my interpretation of what is chaotic is very subjective, feel free to contribute in order to
+ I know that my interpretation of what is noise is probably incomplete, feel free to contribute in order to
 improve or rewrite it.

*Coherence :* For each language there is on earth, we have computed ranked letter appearance occurrences (the best we can). So I thought
@@ -204,6 +204,16 @@
- Language detection is unreliable when text contains two or more languages sharing identical letters. (eg. HTML (english tags) + Turkish content (Sharing Latin characters))
- Every charset detector heavily depends on sufficient content. In common cases, do not bother run detection on very tiny content.
+
+## ⚠️ About Python EOLs
+
+**If you are running:**
+
+- Python >=2.7,<3.5: Unsupported
+- Python 3.5: charset-normalizer < 2.1
+- Python 3.6: charset-normalizer < 3.1
+
+Upgrade your Python interpreter as soon as possible.
+
## 🤝 Contributing
Contributions, issues and feature requests are very much welcome.<br />
@@ -211,7 +221,17 @@
## 📝 License

-Copyright © 2019 [Ahmed TAHRI @Ousret](https://github.com/Ousret).<br />
+Copyright © [Ahmed TAHRI @Ousret](https://github.com/Ousret).<br />
This project is [MIT](https://github.com/Ousret/charset_normalizer/blob/master/LICENSE) licensed.

Characters frequencies used in this project © 2012 [Denny Vrandečić](http://simia.net/letters/)
+
+## 💼 For Enterprise
+
+Professional support for charset-normalizer is available as part of the [Tidelift
+Subscription][1]. Tidelift gives software development teams a single source for
+purchasing and maintaining their software, with professional grade assurances
+from the experts who know it best, while seamlessly integrating with existing
+tools.
+
+[1]: https://tidelift.com/subscription/pkg/pypi-charset-normalizer?utm_source=pypi-charset-normalizer&utm_medium=readme
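The "About Python EOLs" table added to the README above amounts to a version-to-constraint mapping; a minimal sketch (the function name is illustrative, the constraints are taken from that table):

```python
def newest_supported(py: tuple) -> str:
    """Map an interpreter version to the newest usable charset-normalizer
    constraint, per the EOL table in the README diff above."""
    if py < (3, 5):
        return "unsupported"
    if py < (3, 6):
        return "charset-normalizer<2.1"
    if py < (3, 7):
        return "charset-normalizer<3.1"
    return "charset-normalizer"  # 3.7+ can track current releases
```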
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn'
'--exclude=.svnignore' old/charset_normalizer-3.0.1/bin/run_autofix.sh
new/charset_normalizer-3.1.0/bin/run_autofix.sh
--- old/charset_normalizer-3.0.1/bin/run_autofix.sh 2022-11-18
06:44:30.000000000 +0100
+++ new/charset_normalizer-3.1.0/bin/run_autofix.sh 2023-03-06
07:46:55.000000000 +0100
@@ -7,5 +7,5 @@
set -x
-${PREFIX}black --target-version=py36 charset_normalizer
+${PREFIX}black --target-version=py37 charset_normalizer
${PREFIX}isort charset_normalizer
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn'
'--exclude=.svnignore' old/charset_normalizer-3.0.1/bin/run_checks.sh
new/charset_normalizer-3.1.0/bin/run_checks.sh
--- old/charset_normalizer-3.0.1/bin/run_checks.sh 2022-11-18
06:44:30.000000000 +0100
+++ new/charset_normalizer-3.1.0/bin/run_checks.sh 2023-03-06
07:46:55.000000000 +0100
@@ -8,7 +8,7 @@
set -x
${PREFIX}pytest
-${PREFIX}black --check --diff --target-version=py36 charset_normalizer
+${PREFIX}black --check --diff --target-version=py37 charset_normalizer
${PREFIX}flake8 charset_normalizer
${PREFIX}mypy charset_normalizer
${PREFIX}isort --check --diff charset_normalizer
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn'
'--exclude=.svnignore' old/charset_normalizer-3.0.1/build-requirements.txt
new/charset_normalizer-3.1.0/build-requirements.txt
--- old/charset_normalizer-3.0.1/build-requirements.txt 2022-11-18
06:44:30.000000000 +0100
+++ new/charset_normalizer-3.1.0/build-requirements.txt 2023-03-06
07:46:55.000000000 +0100
@@ -1,7 +1,5 @@
# in the meantime we migrate to pyproject.toml
# this represent the minimum requirement to build (for the optional speedup)
-mypy==0.990; python_version >= "3.7"
-mypy==0.971; python_version < "3.7"
-build==0.9.0
-wheel==0.38.4; python_version >= "3.7"
-wheel==0.37.1; python_version < "3.7"
+mypy==1.0.1
+build==0.10.0
+wheel==0.38.4
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn'
'--exclude=.svnignore' old/charset_normalizer-3.0.1/charset_normalizer/api.py
new/charset_normalizer-3.1.0/charset_normalizer/api.py
--- old/charset_normalizer-3.0.1/charset_normalizer/api.py 2022-11-18
06:44:30.000000000 +0100
+++ new/charset_normalizer-3.1.0/charset_normalizer/api.py 2023-03-06
07:46:55.000000000 +0100
@@ -175,7 +175,6 @@
prioritized_encodings.append("utf_8")
for encoding_iana in prioritized_encodings + IANA_SUPPORTED:
-
if cp_isolation and encoding_iana not in cp_isolation:
continue
@@ -318,7 +317,9 @@
bom_or_sig_available and strip_sig_or_bom is False
):
break
- except UnicodeDecodeError as e: # Lazy str loading may have missed
something there
+ except (
+ UnicodeDecodeError
+ ) as e: # Lazy str loading may have missed something there
logger.log(
TRACE,
"LazyStr Loading: After MD chunk decode, code page %s does not
fit given bytes sequence at ALL. %s",
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn'
'--exclude=.svnignore' old/charset_normalizer-3.0.1/charset_normalizer/cd.py
new/charset_normalizer-3.1.0/charset_normalizer/cd.py
--- old/charset_normalizer-3.0.1/charset_normalizer/cd.py 2022-11-18
06:44:30.000000000 +0100
+++ new/charset_normalizer-3.1.0/charset_normalizer/cd.py 2023-03-06
07:46:55.000000000 +0100
@@ -140,7 +140,6 @@
source_have_accents = any(is_accentuated(character) for character in
characters)
for language, language_characters in FREQUENCIES.items():
-
target_have_accents, target_pure_latin = get_target_features(language)
if ignore_non_latin and target_pure_latin is False:
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn'
'--exclude=.svnignore'
old/charset_normalizer-3.0.1/charset_normalizer/cli/normalizer.py
new/charset_normalizer-3.1.0/charset_normalizer/cli/normalizer.py
--- old/charset_normalizer-3.0.1/charset_normalizer/cli/normalizer.py
2022-11-18 06:44:30.000000000 +0100
+++ new/charset_normalizer-3.1.0/charset_normalizer/cli/normalizer.py
2023-03-06 07:46:55.000000000 +0100
@@ -147,7 +147,6 @@
x_ = []
for my_file in args.files:
-
matches = from_fp(my_file, threshold=args.threshold,
explain=args.verbose)
best_guess = matches.best()
@@ -222,7 +221,6 @@
)
if args.normalize is True:
-
if best_guess.encoding.startswith("utf") is True:
print(
'"{}" file does not need to be normalized, as it
already came from unicode.'.format(
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn'
'--exclude=.svnignore'
old/charset_normalizer-3.0.1/charset_normalizer/legacy.py
new/charset_normalizer-3.1.0/charset_normalizer/legacy.py
--- old/charset_normalizer-3.0.1/charset_normalizer/legacy.py 2022-11-18
06:44:30.000000000 +0100
+++ new/charset_normalizer-3.1.0/charset_normalizer/legacy.py 2023-03-06
07:46:55.000000000 +0100
@@ -1,10 +1,13 @@
-from typing import Dict, Optional, Union
+from typing import Any, Dict, Optional, Union
+from warnings import warn
from .api import from_bytes
from .constant import CHARDET_CORRESPONDENCE
-def detect(byte_str: bytes) -> Dict[str, Optional[Union[str, float]]]:
+def detect(
+ byte_str: bytes, should_rename_legacy: bool = False, **kwargs: Any
+) -> Dict[str, Optional[Union[str, float]]]:
"""
chardet legacy method
Detect the encoding of the given byte string. It should be mostly
backward-compatible.
@@ -13,7 +16,14 @@
further information. Not planned for removal.
:param byte_str: The byte sequence to examine.
+    :param should_rename_legacy: Should we rename legacy encodings
+        to their more modern equivalents?
     """
+    if len(kwargs):
+        warn(
+            f"charset-normalizer disregard arguments '{','.join(list(kwargs.keys()))}' in legacy function detect()"
+        )
+
if not isinstance(byte_str, (bytearray, bytes)):
raise TypeError( # pragma: nocover
"Expected object of type bytes or bytearray, got: "
@@ -34,10 +44,11 @@
if r is not None and encoding == "utf_8" and r.bom:
encoding += "_sig"
+ if should_rename_legacy is False and encoding in CHARDET_CORRESPONDENCE:
+ encoding = CHARDET_CORRESPONDENCE[encoding]
+
return {
- "encoding": encoding
- if encoding not in CHARDET_CORRESPONDENCE
- else CHARDET_CORRESPONDENCE[encoding],
+ "encoding": encoding,
"language": language,
"confidence": confidence,
}
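The legacy-shim pattern added in the hunk above can be sketched in isolation. The `CORRESPONDENCE` table and the hard-coded detection result below are placeholders (not the library's real data); the sketch only illustrates the 3.1.0 behaviour of warning on unknown keyword arguments and honouring `should_rename_legacy`:

```python
from typing import Any, Dict, Optional, Union
from warnings import warn

# Placeholder stand-in for charset_normalizer.constant.CHARDET_CORRESPONDENCE
CORRESPONDENCE = {"cp1252": "Windows-1252"}


def detect(
    byte_str: bytes, should_rename_legacy: bool = False, **kwargs: Any
) -> Dict[str, Optional[Union[str, float]]]:
    """Sketch of the kwargs-tolerant legacy detect() introduced in 3.1.0."""
    if kwargs:
        # Unknown arguments are disregarded with a warning instead of a TypeError
        warn(f"disregarding arguments '{','.join(kwargs)}' in legacy detect()")
    encoding = "cp1252"  # stand-in for the real detection result
    # Default: map to the chardet-style name; True keeps the modern codec name
    if should_rename_legacy is False and encoding in CORRESPONDENCE:
        encoding = CORRESPONDENCE[encoding]
    return {"encoding": encoding, "language": "", "confidence": 0.99}
```

With this shape, old chardet-style call sites keep working even if they pass extra arguments the new API never accepted.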
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn'
'--exclude=.svnignore' old/charset_normalizer-3.0.1/charset_normalizer/utils.py
new/charset_normalizer-3.1.0/charset_normalizer/utils.py
--- old/charset_normalizer-3.0.1/charset_normalizer/utils.py 2022-11-18
06:44:30.000000000 +0100
+++ new/charset_normalizer-3.1.0/charset_normalizer/utils.py 2023-03-06
07:46:55.000000000 +0100
@@ -311,7 +311,6 @@
def cp_similarity(iana_name_a: str, iana_name_b: str) -> float:
-
if is_multi_byte_encoding(iana_name_a) or
is_multi_byte_encoding(iana_name_b):
return 0.0
@@ -351,7 +350,6 @@
level: int = logging.INFO,
format_string: str = "%(asctime)s | %(levelname)s | %(message)s",
) -> None:
-
logger = logging.getLogger(name)
logger.setLevel(level)
@@ -371,7 +369,6 @@
is_multi_byte_decoder: bool,
decoded_payload: Optional[str] = None,
) -> Generator[str, None, None]:
-
if decoded_payload and is_multi_byte_decoder is False:
for i in offsets:
chunk = decoded_payload[i : i + chunk_size]
@@ -397,7 +394,6 @@
# multi-byte bad cutting detector and adjustment
# not the cleanest way to perform that fix but clever enough for
now.
if is_multi_byte_decoder and i > 0:
-
chunk_partial_size_chk: int = min(chunk_size, 16)
if (
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn'
'--exclude=.svnignore'
old/charset_normalizer-3.0.1/charset_normalizer/version.py
new/charset_normalizer-3.1.0/charset_normalizer/version.py
--- old/charset_normalizer-3.0.1/charset_normalizer/version.py 2022-11-18
06:44:30.000000000 +0100
+++ new/charset_normalizer-3.1.0/charset_normalizer/version.py 2023-03-06
07:46:55.000000000 +0100
@@ -2,5 +2,5 @@
Expose version
"""
-__version__ = "3.0.1"
+__version__ = "3.1.0"
VERSION = __version__.split(".")
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn'
'--exclude=.svnignore' old/charset_normalizer-3.0.1/dev-requirements.txt
new/charset_normalizer-3.1.0/dev-requirements.txt
--- old/charset_normalizer-3.0.1/dev-requirements.txt 2022-11-18
06:44:30.000000000 +0100
+++ new/charset_normalizer-3.1.0/dev-requirements.txt 2023-03-06
07:46:55.000000000 +0100
@@ -1,26 +1,13 @@
flake8==5.0.4
-chardet==5.0.0
-isort==5.10.1
+chardet==5.1.0
+isort==5.11.4
codecov==2.1.12
pytest-cov==4.0.0
-build==0.9.0
+build==0.10.0
+wheel==0.38.4
-# The vast majority of project dropped Python 3.6
-# This is to ensure build are reproducible >=3.6
-black==22.8.0; python_version < "3.7"
-black==22.10.0; python_version >= "3.7"
-
-mypy==0.990; python_version >= "3.7"
-mypy==0.971; python_version < "3.7"
-
-Flask==2.2.2; python_version >= "3.7"
-Flask==2.0.3; python_version < "3.7"
-
-pytest==7.0.0; python_version < "3.7"
-pytest==7.2.0; python_version >= "3.7"
-
-requests==2.27.1; python_version < "3.7"
-requests==2.28.1; python_version >= "3.7"
-
-wheel==0.38.4; python_version >= "3.7"
-wheel==0.37.1; python_version < "3.7"
+black==23.1.0
+mypy==1.0.1
+Flask==2.2.3
+pytest==7.2.1
+requests==2.28.2
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn'
'--exclude=.svnignore' old/charset_normalizer-3.0.1/docs/community/faq.rst
new/charset_normalizer-3.1.0/docs/community/faq.rst
--- old/charset_normalizer-3.0.1/docs/community/faq.rst 2022-11-18
06:44:30.000000000 +0100
+++ new/charset_normalizer-3.1.0/docs/community/faq.rst 2023-03-06
07:46:55.000000000 +0100
@@ -40,7 +40,7 @@
Then this change is mostly backward-compatible, exception of a thing:
- This new library support way more code pages (x3) than its counterpart
Chardet.
- - Based on the 30-ich charsets that Chardet support, expect roughly 85% BC
results
https://github.com/Ousret/charset_normalizer/pull/77/checks?check_run_id=3244585065
+- Based on the 30-ich charsets that Chardet support, expect roughly 80% BC
results
We do not guarantee this BC exact percentage through time. May vary but not by
much.
@@ -56,3 +56,20 @@
Any code page supported by your cPython is supported by charset-normalizer! It
is that simple, no need to update the
library. It is as generic as we could do.
+
+I can't build standalone executable
+-----------------------------------
+
+If you are using ``pyinstaller``, ``py2exe`` or alike, you may be encountering this or close to:
+
+    ModuleNotFoundError: No module named 'charset_normalizer.md__mypyc'
+
+Why?
+
+- Your package manager picked up a optimized (for speed purposes) wheel that match your architecture and operating system.
+- Finally, the module ``charset_normalizer.md__mypyc`` is imported via binaries and can't be seen using your tool.
+
+How to remedy?
+
+If your bundler program support it, set up a hook that implicitly import the hidden module.
+Otherwise, follow the guide on how to install the vanilla version of this package. (Section: *Optional speedup extension*)
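For the PyInstaller case, the hook the FAQ answer describes can be a one-line file; the filename follows PyInstaller's usual `hook-<package>.py` convention and is an assumption here, not the project's official hook:

```python
# hook-charset_normalizer.py -- sketch of a PyInstaller hook file, placed in a
# directory passed via --additional-hooks-dir; declaring the compiled module as
# a hidden import makes PyInstaller bundle the binary-only md__mypyc extension.
hiddenimports = ["charset_normalizer.md__mypyc"]
```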
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn'
'--exclude=.svnignore' old/charset_normalizer-3.0.1/docs/user/cli.rst
new/charset_normalizer-3.1.0/docs/user/cli.rst
--- old/charset_normalizer-3.0.1/docs/user/cli.rst 2022-11-18
06:44:30.000000000 +0100
+++ new/charset_normalizer-3.1.0/docs/user/cli.rst 2023-03-06
07:46:55.000000000 +0100
@@ -5,6 +5,7 @@
This is a great tool to fully exploit the detector capabilities without having
to write Python code.
Possible use cases:
+
#. Quickly discover probable originating charset from a file.
#. I want to quickly convert a non Unicode file to Unicode.
#. Debug the charset-detector.
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn'
'--exclude=.svnignore' old/charset_normalizer-3.0.1/docs/user/support.rst
new/charset_normalizer-3.1.0/docs/user/support.rst
--- old/charset_normalizer-3.0.1/docs/user/support.rst 2022-11-18
06:44:30.000000000 +0100
+++ new/charset_normalizer-3.1.0/docs/user/support.rst 2023-03-06
07:46:55.000000000 +0100
@@ -2,13 +2,21 @@
Support
=================
-Here are a list of supported encoding and supported language with latest
update. Also this list
-may change depending of your python version.
+**If you are running:**
+
+- Python >=2.7,<3.5: Unsupported
+- Python 3.5: charset-normalizer < 2.1
+- Python 3.6: charset-normalizer < 3.1
+
+Upgrade your Python interpreter as soon as possible.
-------------------
Supported Encodings
-------------------
+Here are a list of supported encoding and supported language with latest
update. Also this list
+may change depending of your python version.
+
Charset Normalizer is able to detect any of those encoding. This list is NOT
static and depends heavily on what your
current cPython version is shipped with. See
https://docs.python.org/3/library/codecs.html#standard-encodings
@@ -116,41 +124,51 @@
Those language can be detected inside your content. All of these are specified
in ./charset_normalizer/assets/__init__.py .
-English,
-German,
-French,
-Dutch,
-Italian,
-Polish,
-Spanish,
-Russian,
-Japanese,
-Portuguese,
-Swedish,
-Chinese,
-Ukrainian,
-Norwegian,
-Finnish,
-Vietnamese,
-Czech,
-Hungarian,
-Korean,
-Indonesian,
-Turkish,
-Romanian,
-Farsi,
-Arabic,
-Danish,
-Serbian,
-Lithuanian,
-Slovene,
-Slovak,
-Malay,
-Hebrew,
-Bulgarian,
-Croatian,
-Hindi,
-Estonian,
-Thai,
-Greek,
-Tamil.
+| English,
+| German,
+| French,
+| Dutch,
+| Italian,
+| Polish,
+| Spanish,
+| Russian,
+| Japanese,
+| Portuguese,
+| Swedish,
+| Chinese,
+| Ukrainian,
+| Norwegian,
+| Finnish,
+| Vietnamese,
+| Czech,
+| Hungarian,
+| Korean,
+| Indonesian,
+| Turkish,
+| Romanian,
+| Farsi,
+| Arabic,
+| Danish,
+| Serbian,
+| Lithuanian,
+| Slovene,
+| Slovak,
+| Malay,
+| Hebrew,
+| Bulgarian,
+| Croatian,
+| Hindi,
+| Estonian,
+| Thai,
+| Greek,
+| Tamil.
+
+----------------------------
+Incomplete Sequence / Stream
+----------------------------
+
+It is not (yet) officially supported. If you feed an incomplete byte sequence
(eg. truncated multi-byte sequence) the detector will
+most likely fail to return a proper result.
+If you are purposely feeding part of your payload for performance concerns,
you may stop doing it as this package is fairly optimized.
+
+We are working on a dedicated way to handle streams.
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn'
'--exclude=.svnignore' old/charset_normalizer-3.0.1/setup.cfg
new/charset_normalizer-3.1.0/setup.cfg
--- old/charset_normalizer-3.0.1/setup.cfg 2022-11-18 06:44:30.000000000
+0100
+++ new/charset_normalizer-3.1.0/setup.cfg 2023-03-06 07:46:55.000000000
+0100
@@ -8,7 +8,7 @@
license = MIT
author_email = [email protected]
author = Ahmed TAHRI
-python_requires = >=3.6.0
+python_requires = >=3.7.0
project_urls =
Bug Reports = https://github.com/Ousret/charset_normalizer/issues
Documentation = https://charset-normalizer.readthedocs.io/en/latest
@@ -20,7 +20,6 @@
Operating System :: OS Independent
Programming Language :: Python
Programming Language :: Python :: 3
- Programming Language :: Python :: 3.6
Programming Language :: Python :: 3.7
Programming Language :: Python :: 3.8
Programming Language :: Python :: 3.9
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn'
'--exclude=.svnignore' old/charset_normalizer-3.0.1/tests/test_logging.py
new/charset_normalizer-3.1.0/tests/test_logging.py
--- old/charset_normalizer-3.0.1/tests/test_logging.py 2022-11-18
06:44:30.000000000 +0100
+++ new/charset_normalizer-3.1.0/tests/test_logging.py 2023-03-06
07:46:55.000000000 +0100
@@ -7,7 +7,7 @@
class TestLogBehaviorClass:
- def setup(self):
+ def setup_method(self):
self.logger = logging.getLogger("charset_normalizer")
self.logger.handlers.clear()
self.logger.addHandler(logging.NullHandler())
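The `setup` → `setup_method` rename in the test diff above tracks pytest's deprecation of the nose-style `setup()` spelling; a minimal sketch of the new spelling (class and test names are illustrative):

```python
import logging


class TestLogBehavior:
    # pytest invokes setup_method() before each test method; the bare
    # nose-style setup() spelling is deprecated in recent pytest releases.
    def setup_method(self):
        self.logger = logging.getLogger("charset_normalizer")
        self.logger.handlers.clear()
        self.logger.addHandler(logging.NullHandler())

    def test_single_null_handler(self):
        # setup_method ran first, so exactly one NullHandler is attached
        assert len(self.logger.handlers) == 1
        assert isinstance(self.logger.handlers[0], logging.NullHandler)
```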