Script 'mail_helper' called by obssrc
Hello community,
here is the log from the commit of package python-charset-normalizer for openSUSE:Factory checked in at 2022-02-17 00:29:57
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Comparing /work/SRC/openSUSE:Factory/python-charset-normalizer (Old)
and /work/SRC/openSUSE:Factory/.python-charset-normalizer.new.1956 (New)
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Package is "python-charset-normalizer"
Thu Feb 17 00:29:57 2022 rev:12 rq:954654 version:2.0.12
Changes:
--------
--- /work/SRC/openSUSE:Factory/python-charset-normalizer/python-charset-normalizer.changes  2022-01-11 21:20:37.289015879 +0100
+++ /work/SRC/openSUSE:Factory/.python-charset-normalizer.new.1956/python-charset-normalizer.changes  2022-02-17 00:30:05.709437803 +0100
@@ -1,0 +2,9 @@
+Tue Feb 15 08:42:30 UTC 2022 - Dirk Müller <[email protected]>
+
+- update to 2.0.12:
+  * Fix ASCII mis-detection in rare cases (PR #170)
+  * Explicit support for Python 3.11 (PR #164)
+  * The logging behavior has been completely reviewed and now uses only the TRACE and DEBUG levels
+
+-------------------------------------------------------------------
Old:
----
charset_normalizer-2.0.10.tar.gz
New:
----
charset_normalizer-2.0.12.tar.gz
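
The headline change in 2.0.12 is the logging rework. A minimal sketch of how it behaves from the caller's side, assuming the public `from_bytes` API of charset-normalizer 2.0.x (the byte payload here is illustrative):

    import logging

    from charset_normalizer import from_bytes

    # explain=True temporarily attaches a stream handler and lowers the
    # package logger to TRACE, so every probing step becomes visible.
    matches = from_bytes(b"Bonjour tout le monde", explain=True)
    best = matches.best()
    if best is not None:
        print(best.encoding)  # e.g. 'ascii'

    # Without explain=True the records are still emitted, but only reach a
    # handler if you opt in on the package logger yourself:
    logging.getLogger("charset_normalizer").setLevel(5)  # 5 == TRACE, unregistered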
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Other differences:
------------------
++++++ python-charset-normalizer.spec ++++++
--- /var/tmp/diff_new_pack.UHOZtE/_old 2022-02-17 00:30:06.421437680 +0100
+++ /var/tmp/diff_new_pack.UHOZtE/_new 2022-02-17 00:30:06.425437679 +0100
@@ -19,7 +19,7 @@
%{?!python_module:%define python_module() python-%{**} python3-%{**}}
%define skip_python2 1
Name: python-charset-normalizer
-Version: 2.0.10
+Version: 2.0.12
Release: 0
Summary: Python Universal Charset detector
License: MIT
++++++ charset_normalizer-2.0.10.tar.gz -> charset_normalizer-2.0.12.tar.gz
++++++
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/charset_normalizer-2.0.10/.github/workflows/detector-coverage.yml new/charset_normalizer-2.0.12/.github/workflows/detector-coverage.yml
--- old/charset_normalizer-2.0.10/.github/workflows/detector-coverage.yml  2022-01-04 21:14:06.000000000 +0100
+++ new/charset_normalizer-2.0.12/.github/workflows/detector-coverage.yml  2022-02-12 15:24:47.000000000 +0100
@@ -31,7 +31,7 @@
git clone https://github.com/Ousret/char-dataset.git
- name: Coverage WITH preemptive
run: |
- python ./bin/coverage.py --coverage 98 --with-preemptive
+ python ./bin/coverage.py --coverage 97 --with-preemptive
- name: Coverage WITHOUT preemptive
run: |
python ./bin/coverage.py --coverage 95
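
The workflow tweak above lowers the preemptive-detection accuracy gate from 98 % to 97 %. The real gate lives in bin/coverage.py (diffed further down); a condensed, hypothetical sketch of the pattern, with the success test simplified to a charset-membership check:

    import sys
    from glob import glob
    from os import sep

    from charset_normalizer import from_path

    def coverage_gate(minimum: float) -> None:
        success_count = total_count = 0
        # char-dataset stores each file under a directory named after its true encoding
        for tbt_path in sorted(glob("./char-dataset/**/*.*")):
            expected_encoding = tbt_path.split(sep)[-2]
            total_count += 1
            best_guess = from_path(tbt_path).best()
            if best_guess is not None and expected_encoding in best_guess.could_be_from_charset:
                success_count += 1
        ratio = success_count / total_count * 100
        print("detection coverage: %.2f %% (gate: %.2f %%)" % (ratio, minimum))
        sys.exit(0 if ratio >= minimum else 1)

    if __name__ == "__main__":
        coverage_gate(97.0)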
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/charset_normalizer-2.0.10/.github/workflows/python-publish.yml new/charset_normalizer-2.0.12/.github/workflows/python-publish.yml
--- old/charset_normalizer-2.0.10/.github/workflows/python-publish.yml  2022-01-04 21:14:06.000000000 +0100
+++ new/charset_normalizer-2.0.12/.github/workflows/python-publish.yml  2022-02-12 15:24:47.000000000 +0100
@@ -101,7 +101,7 @@
git clone https://github.com/Ousret/char-dataset.git
- name: Coverage WITH preemptive
run: |
- python ./bin/coverage.py --coverage 98 --with-preemptive
+ python ./bin/coverage.py --coverage 97 --with-preemptive
- name: Coverage WITHOUT preemptive
run: |
python ./bin/coverage.py --coverage 95
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/charset_normalizer-2.0.10/.github/workflows/run-tests.yml new/charset_normalizer-2.0.12/.github/workflows/run-tests.yml
--- old/charset_normalizer-2.0.10/.github/workflows/run-tests.yml  2022-01-04 21:14:06.000000000 +0100
+++ new/charset_normalizer-2.0.12/.github/workflows/run-tests.yml  2022-02-12 15:24:47.000000000 +0100
@@ -9,7 +9,7 @@
strategy:
fail-fast: false
matrix:
- python-version: [3.5, 3.6, 3.7, 3.8, 3.9, "3.10"]
+ python-version: [3.5, 3.6, 3.7, 3.8, 3.9, "3.10", "3.11.0-alpha.4"]
os: [ubuntu-latest]
steps:
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/charset_normalizer-2.0.10/CHANGELOG.md new/charset_normalizer-2.0.12/CHANGELOG.md
--- old/charset_normalizer-2.0.10/CHANGELOG.md  2022-01-04 21:14:06.000000000 +0100
+++ new/charset_normalizer-2.0.12/CHANGELOG.md  2022-02-12 15:24:47.000000000 +0100
@@ -2,6 +2,19 @@
All notable changes to charset-normalizer will be documented in this file.
This project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/).
+## [2.0.12](https://github.com/Ousret/charset_normalizer/compare/2.0.11...2.0.12) (2022-02-12)
+
+### Fixed
+- ASCII mis-detection in rare cases (PR #170)
+
+## [2.0.11](https://github.com/Ousret/charset_normalizer/compare/2.0.10...2.0.11) (2022-01-30)
+
+### Added
+- Explicit support for Python 3.11 (PR #164)
+
+### Changed
+- The logging behavior has been completely reviewed and now uses only the TRACE and DEBUG levels (PR #163 #165)
+
## [2.0.10](https://github.com/Ousret/charset_normalizer/compare/2.0.9...2.0.10) (2022-01-04)
### Fixed
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/charset_normalizer-2.0.10/README.md new/charset_normalizer-2.0.12/README.md
--- old/charset_normalizer-2.0.10/README.md  2022-01-04 21:14:06.000000000 +0100
+++ new/charset_normalizer-2.0.12/README.md  2022-02-12 15:24:47.000000000 +0100
@@ -33,12 +33,13 @@
| `License` | LGPL-2.1 | MIT | MPL-1.1 |
| `Native Python` | :heavy_check_mark: | :heavy_check_mark: | ❌ |
| `Detect spoken language` | ❌ | :heavy_check_mark: | N/A |
-| `Supported Encoding` | 30 | :tada: [93](https://charset-normalizer.readthedocs.io/en/latest/support.html) | 40 |
+| `Supported Encoding` | 30 | :tada: [93](https://charset-normalizer.readthedocs.io/en/latest/user/support.html#supported-encodings) | 40 |
<p align="center">
<img src="https://i.imgflip.com/373iay.gif" alt="Reading Normalized Text" width="226"/><img src="https://media.tenor.com/images/c0180f70732a18b4965448d33adba3d0/tenor.gif" alt="Cat Reading Text" width="200"/>
*\*\* : They are clearly using specific code for a specific encoding even if it covers most of the encodings in use*<br>
+Did you get there because of the logs? See [https://charset-normalizer.readthedocs.io/en/latest/user/miscellaneous.html](https://charset-normalizer.readthedocs.io/en/latest/user/miscellaneous.html)
## Your support
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/charset_normalizer-2.0.10/bin/bc.py new/charset_normalizer-2.0.12/bin/bc.py
--- old/charset_normalizer-2.0.10/bin/bc.py  2022-01-04 21:14:06.000000000 +0100
+++ new/charset_normalizer-2.0.12/bin/bc.py  2022-02-12 15:24:47.000000000 +0100
@@ -43,7 +43,7 @@
success_count = 0
total_count = 0
- for tbt_path in glob("./char-dataset/**/*.*"):
+ for tbt_path in sorted(glob("./char-dataset/**/*.*")):
total_count += 1
with open(tbt_path, "rb") as fp:
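
The sorted() wrapper added here (and in the three scripts below) is about determinism: glob() yields paths in whatever order the OS returns directory entries, which varies between filesystems and runs. A quick illustration, assuming the dataset path used by these scripts:

    from glob import glob

    # Two runs over unsorted glob() output could walk the dataset in
    # different orders; sorted() pins a stable lexicographic order, making
    # benchmark results and logs comparable run-to-run.
    unordered = glob("./char-dataset/**/*.*")
    ordered = sorted(glob("./char-dataset/**/*.*"))
    assert sorted(unordered) == ordered  # same files, reproducible order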
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/charset_normalizer-2.0.10/bin/coverage.py new/charset_normalizer-2.0.12/bin/coverage.py
--- old/charset_normalizer-2.0.10/bin/coverage.py  2022-01-04 21:14:06.000000000 +0100
+++ new/charset_normalizer-2.0.12/bin/coverage.py  2022-02-12 15:24:47.000000000 +0100
@@ -43,7 +43,7 @@
success_count = 0
total_count = 0
- for tbt_path in glob("./char-dataset/**/*.*"):
+ for tbt_path in sorted(glob("./char-dataset/**/*.*")):
expected_encoding = tbt_path.split(sep)[-2]
total_count += 1
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/charset_normalizer-2.0.10/bin/performance.py new/charset_normalizer-2.0.12/bin/performance.py
--- old/charset_normalizer-2.0.10/bin/performance.py  2022-01-04 21:14:06.000000000 +0100
+++ new/charset_normalizer-2.0.12/bin/performance.py  2022-02-12 15:24:47.000000000 +0100
@@ -37,7 +37,7 @@
chardet_results = []
charset_normalizer_results = []
- for tbt_path in glob("./char-dataset/**/*.*"):
+ for tbt_path in sorted(glob("./char-dataset/**/*.*")):
print(tbt_path)
# Read Bin file
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/charset_normalizer-2.0.10/bin/serve.py new/charset_normalizer-2.0.12/bin/serve.py
--- old/charset_normalizer-2.0.10/bin/serve.py  2022-01-04 21:14:06.000000000 +0100
+++ new/charset_normalizer-2.0.12/bin/serve.py  2022-02-12 15:24:47.000000000 +0100
@@ -13,7 +13,7 @@
def read_targets():
return jsonify(
[
- el.replace("./char-dataset", "/raw").replace("\\", "/") for el in
glob("./char-dataset/**/*")
+ el.replace("./char-dataset", "/raw").replace("\\", "/") for el in
sorted(glob("./char-dataset/**/*"))
]
)
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/charset_normalizer-2.0.10/charset_normalizer/api.py new/charset_normalizer-2.0.12/charset_normalizer/api.py
--- old/charset_normalizer-2.0.10/charset_normalizer/api.py  2022-01-04 21:14:06.000000000 +0100
+++ new/charset_normalizer-2.0.12/charset_normalizer/api.py  2022-02-12 15:24:47.000000000 +0100
@@ -13,7 +13,7 @@
mb_encoding_languages,
merge_coherence_ratios,
)
-from .constant import IANA_SUPPORTED, TOO_BIG_SEQUENCE, TOO_SMALL_SEQUENCE
+from .constant import IANA_SUPPORTED, TOO_BIG_SEQUENCE, TOO_SMALL_SEQUENCE, TRACE
from .md import mess_ratio
from .models import CharsetMatch, CharsetMatches
from .utils import (
@@ -25,6 +25,8 @@
should_strip_sig_or_bom,
)
+# Will most likely be controversial
+# logging.addLevelName(TRACE, "TRACE")
logger = logging.getLogger("charset_normalizer")
explain_handler = logging.StreamHandler()
explain_handler.setFormatter(
@@ -70,19 +72,20 @@
if explain:
previous_logger_level = logger.level # type: int
logger.addHandler(explain_handler)
- logger.setLevel(logging.DEBUG)
+ logger.setLevel(TRACE)
length = len(sequences) # type: int
if length == 0:
- logger.warning("Encoding detection on empty bytes, assuming utf_8
intention.")
+ logger.debug("Encoding detection on empty bytes, assuming utf_8
intention.")
if explain:
logger.removeHandler(explain_handler)
logger.setLevel(previous_logger_level or logging.WARNING)
return CharsetMatches([CharsetMatch(sequences, "utf_8", 0.0, False, [], "")])
if cp_isolation is not None:
- logger.debug(
+ logger.log(
+ TRACE,
"cp_isolation is set. use this flag for debugging purpose. "
"limited list of encoding allowed : %s.",
", ".join(cp_isolation),
@@ -92,7 +95,8 @@
cp_isolation = []
if cp_exclusion is not None:
- logger.debug(
+ logger.log(
+ TRACE,
"cp_exclusion is set. use this flag for debugging purpose. "
"limited list of encoding excluded : %s.",
", ".join(cp_exclusion),
@@ -102,7 +106,8 @@
cp_exclusion = []
if length <= (chunk_size * steps):
- logger.debug(
+ logger.log(
+ TRACE,
"override steps (%i) and chunk_size (%i) as content does not fit
(%i byte(s) given) parameters.",
steps,
chunk_size,
@@ -118,16 +123,18 @@
is_too_large_sequence = len(sequences) >= TOO_BIG_SEQUENCE # type: bool
if is_too_small_sequence:
- logger.warning(
+ logger.log(
+ TRACE,
"Trying to detect encoding from a tiny portion of ({})
byte(s).".format(
length
- )
+ ),
)
elif is_too_large_sequence:
- logger.info(
+ logger.log(
+ TRACE,
"Using lazy str decoding because the payload is quite large, ({})
byte(s).".format(
length
- )
+ ),
)
prioritized_encodings = [] # type: List[str]
@@ -138,7 +145,8 @@
if specified_encoding is not None:
prioritized_encodings.append(specified_encoding)
- logger.info(
+ logger.log(
+ TRACE,
"Detected declarative mark in sequence. Priority +1 given for %s.",
specified_encoding,
)
@@ -157,7 +165,8 @@
if sig_encoding is not None:
prioritized_encodings.append(sig_encoding)
- logger.info(
+ logger.log(
+ TRACE,
"Detected a SIG or BOM mark on first %i byte(s). Priority +1 given
for %s.",
len(sig_payload),
sig_encoding,
@@ -188,7 +197,8 @@
) # type: bool
if encoding_iana in {"utf_16", "utf_32"} and not bom_or_sig_available:
- logger.debug(
+ logger.log(
+ TRACE,
"Encoding %s wont be tested as-is because it require a BOM.
Will try some sub-encoder LE/BE.",
encoding_iana,
)
@@ -197,8 +207,10 @@
try:
is_multi_byte_decoder = is_multi_byte_encoding(encoding_iana)  # type: bool
except (ModuleNotFoundError, ImportError):
- logger.debug(
- "Encoding %s does not provide an IncrementalDecoder",
encoding_iana
+ logger.log(
+ TRACE,
+ "Encoding %s does not provide an IncrementalDecoder",
+ encoding_iana,
)
continue
@@ -219,7 +231,8 @@
)
except (UnicodeDecodeError, LookupError) as e:
if not isinstance(e, LookupError):
- logger.debug(
+ logger.log(
+ TRACE,
"Code page %s does not fit given bytes sequence at ALL.
%s",
encoding_iana,
str(e),
@@ -235,7 +248,8 @@
break
if similar_soft_failure_test:
- logger.debug(
+ logger.log(
+ TRACE,
"%s is deemed too similar to code page %s and was consider
unsuited already. Continuing!",
encoding_iana,
encoding_soft_failed,
@@ -255,7 +269,8 @@
) # type: bool
if multi_byte_bonus:
- logger.debug(
+ logger.log(
+ TRACE,
"Code page %s is a multi byte encoding table and it appear
that at least one character "
"was encoded using n-bytes.",
encoding_iana,
@@ -285,7 +300,8 @@
errors="ignore" if is_multi_byte_decoder else "strict",
) # type: str
except UnicodeDecodeError as e:  # Lazy str loading may have missed something there
- logger.debug(
+ logger.log(
+ TRACE,
"LazyStr Loading: After MD chunk decode, code page %s does
not fit given bytes sequence at ALL. %s",
encoding_iana,
str(e),
@@ -337,7 +353,8 @@
try:
sequences[int(50e3) :].decode(encoding_iana, errors="strict")
except UnicodeDecodeError as e:
- logger.debug(
+ logger.log(
+ TRACE,
"LazyStr Loading: After final lookup, code page %s does
not fit given bytes sequence at ALL. %s",
encoding_iana,
str(e),
@@ -350,7 +367,8 @@
) # type: float
if mean_mess_ratio >= threshold or early_stop_count >= max_chunk_gave_up:
tested_but_soft_failure.append(encoding_iana)
- logger.info(
+ logger.log(
+ TRACE,
"%s was excluded because of initial chaos probing. Gave up %i
time(s). "
"Computed mean chaos is %f %%.",
encoding_iana,
@@ -373,7 +391,8 @@
fallback_u8 = fallback_entry
continue
- logger.info(
+ logger.log(
+ TRACE,
"%s passed initial chaos probing. Mean measured chaos is %f %%",
encoding_iana,
round(mean_mess_ratio * 100, ndigits=3),
@@ -385,10 +404,11 @@
target_languages = mb_encoding_languages(encoding_iana)
if target_languages:
- logger.debug(
+ logger.log(
+ TRACE,
"{} should target any language(s) of {}".format(
encoding_iana, str(target_languages)
- )
+ ),
)
cd_ratios = []
@@ -406,10 +426,11 @@
cd_ratios_merged = merge_coherence_ratios(cd_ratios)
if cd_ratios_merged:
- logger.info(
+ logger.log(
+ TRACE,
"We detected language {} using {}".format(
cd_ratios_merged, encoding_iana
- )
+ ),
)
results.append(
@@ -427,8 +448,8 @@
encoding_iana in [specified_encoding, "ascii", "utf_8"]
and mean_mess_ratio < 0.1
):
- logger.info(
- "%s is most likely the one. Stopping the process.",
encoding_iana
+ logger.debug(
+ "Encoding detection: %s is most likely the one.", encoding_iana
)
if explain:
logger.removeHandler(explain_handler)
@@ -436,8 +457,9 @@
return CharsetMatches([results[encoding_iana]])
if encoding_iana == sig_encoding:
- logger.info(
- "%s is most likely the one as we detected a BOM or SIG within
the beginning of the sequence.",
+ logger.debug(
+ "Encoding detection: %s is most likely the one as we detected
a BOM or SIG within "
+ "the beginning of the sequence.",
encoding_iana,
)
if explain:
@@ -447,13 +469,15 @@
if len(results) == 0:
if fallback_u8 or fallback_ascii or fallback_specified:
- logger.debug(
- "Nothing got out of the detection process. Using
ASCII/UTF-8/Specified fallback."
+ logger.log(
+ TRACE,
+ "Nothing got out of the detection process. Using
ASCII/UTF-8/Specified fallback.",
)
if fallback_specified:
logger.debug(
- "%s will be used as a fallback match",
fallback_specified.encoding
+ "Encoding detection: %s will be used as a fallback match",
+ fallback_specified.encoding,
)
results.append(fallback_specified)
elif (
@@ -465,12 +489,21 @@
)
or (fallback_u8 is not None)
):
- logger.warning("utf_8 will be used as a fallback match")
+ logger.debug("Encoding detection: utf_8 will be used as a fallback
match")
results.append(fallback_u8)
elif fallback_ascii:
- logger.warning("ascii will be used as a fallback match")
+ logger.debug("Encoding detection: ascii will be used as a fallback
match")
results.append(fallback_ascii)
+ if results:
+ logger.debug(
+ "Encoding detection: Found %s as plausible (best-candidate) for
content. With %i alternatives.",
+ results.best().encoding, # type: ignore
+ len(results) - 1,
+ )
+ else:
+ logger.debug("Encoding detection: Unable to determine any suitable
charset.")
+
if explain:
logger.removeHandler(explain_handler)
logger.setLevel(previous_logger_level)
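
The recurring pattern in this diff is Logger.log(TRACE, ...) with a custom numeric level below DEBUG, so the fine-grained records exist but stay invisible at every standard level. A self-contained sketch of the mechanism (the logger name "demo" is illustrative):

    import logging

    TRACE = 5  # one notch below logging.DEBUG (10), mirroring constant.py further down

    logger = logging.getLogger("demo")
    handler = logging.StreamHandler()
    handler.setFormatter(logging.Formatter("%(levelname)s | %(message)s"))
    logger.addHandler(handler)

    # At DEBUG, the TRACE records are filtered out; only the result shows.
    logger.setLevel(logging.DEBUG)
    logger.log(TRACE, "probing detail - filtered out at DEBUG")
    logger.debug("detection result - visible")

    # Lowering the level (what explain=True effectively does) reveals them.
    logger.setLevel(TRACE)
    logger.log(TRACE, "probing detail - now visible")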
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/charset_normalizer-2.0.10/charset_normalizer/constant.py new/charset_normalizer-2.0.12/charset_normalizer/constant.py
--- old/charset_normalizer-2.0.10/charset_normalizer/constant.py  2022-01-04 21:14:06.000000000 +0100
+++ new/charset_normalizer-2.0.12/charset_normalizer/constant.py  2022-02-12 15:24:47.000000000 +0100
@@ -498,3 +498,6 @@
NOT_PRINTABLE_PATTERN = re_compile(r"[0-9\W\n\r\t]+")
LANGUAGE_SUPPORTED_COUNT = len(FREQUENCIES) # type: int
+
+# Logging LEVEL below DEBUG
+TRACE = 5 # type: int
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/charset_normalizer-2.0.10/charset_normalizer/md.py new/charset_normalizer-2.0.12/charset_normalizer/md.py
--- old/charset_normalizer-2.0.10/charset_normalizer/md.py  2022-01-04 21:14:06.000000000 +0100
+++ new/charset_normalizer-2.0.12/charset_normalizer/md.py  2022-02-12 15:24:47.000000000 +0100
@@ -314,7 +314,7 @@
self._buffer = ""
self._buffer_accent_count = 0
elif (
- character not in {"<", ">", "-", "="}
+ character not in {"<", ">", "-", "=", "~", "|", "_"}
and character.isdigit() is False
and is_symbol(character)
):
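
This md.py change widens the set of characters the mess detector tolerates: '~', '|' and '_' are common in plain-text tables, URLs and identifiers. A toy approximation of the check, using unicodedata as a stand-in for the library's is_symbol() helper (the function name below is hypothetical):

    import unicodedata

    ALLOWED = {"<", ">", "-", "=", "~", "|", "_"}  # the widened 2.0.12 set

    def looks_suspicious(character: str) -> bool:
        # Roughly what is_symbol() checks: punctuation/symbol Unicode
        # categories, excluding digits and the explicit allow-list.
        category = unicodedata.category(character)
        return (
            character not in ALLOWED
            and not character.isdigit()
            and category.startswith(("P", "S"))
        )

    print([c for c in "col_1|col_2 ~ok €" if looks_suspicious(c)])  # -> ['€']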
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/charset_normalizer-2.0.10/charset_normalizer/version.py new/charset_normalizer-2.0.12/charset_normalizer/version.py
--- old/charset_normalizer-2.0.10/charset_normalizer/version.py  2022-01-04 21:14:06.000000000 +0100
+++ new/charset_normalizer-2.0.12/charset_normalizer/version.py  2022-02-12 15:24:47.000000000 +0100
@@ -2,5 +2,5 @@
Expose version
"""
-__version__ = "2.0.10"
+__version__ = "2.0.12"
VERSION = __version__.split(".")
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/charset_normalizer-2.0.10/dev-requirements.txt new/charset_normalizer-2.0.12/dev-requirements.txt
--- old/charset_normalizer-2.0.10/dev-requirements.txt  2022-01-04 21:14:06.000000000 +0100
+++ new/charset_normalizer-2.0.12/dev-requirements.txt  2022-02-12 15:24:47.000000000 +0100
@@ -4,7 +4,7 @@
chardet==4.0.*
Flask>=2.0,<3.0; python_version >= '3.6'
requests>=2.26,<3.0; python_version >= '3.6'
-black==21.12b0; python_version >= '3.6'
+black==22.1.0; python_version >= '3.6'
flake8==4.0.1; python_version >= '3.6'
-mypy==0.930; python_version >= '3.6'
+mypy==0.931; python_version >= '3.6'
isort; python_version >= '3.6'
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/charset_normalizer-2.0.10/docs/user/miscellaneous.rst new/charset_normalizer-2.0.12/docs/user/miscellaneous.rst
--- old/charset_normalizer-2.0.10/docs/user/miscellaneous.rst  2022-01-04 21:14:06.000000000 +0100
+++ new/charset_normalizer-2.0.12/docs/user/miscellaneous.rst  2022-02-12 15:24:47.000000000 +0100
@@ -18,3 +18,29 @@
# This should print '????????????????????????????????????????????????'
print(str(result))
+
+
+Logging
+-------
+
+Prior to version 2.0.10 you may have encountered some unexpected logs in your streams.
+Something along the lines of:
+
+    ::
+
+        ... | WARNING | override steps (5) and chunk_size (512) as content does not fit (465 byte(s) given) parameters.
+        ... | INFO | ascii passed initial chaos probing. Mean measured chaos is 0.000000 %
+        ... | INFO | ascii should target any language(s) of ['Latin Based']
+
+
+This most likely happens because you altered the root getLogger instance. The package has its own logic behind logging and why it is useful. See https://docs.python.org/3/howto/logging.html to learn the basics.
+
+If you are looking to silence or drastically reduce the amount of logs, please upgrade to the latest version of `charset-normalizer` using your package manager or via `pip install charset-normalizer -U`.
+
+The latest version will no longer produce any entry greater than `DEBUG`.
+At `DEBUG` only one entry will be observed, and it concerns the detection result.
+
+All other log entries are pushed at `Level 5`, commonly known as the TRACE level, which is not registered globally.
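
Since the package logs under the fixed name "charset_normalizer" (visible in api.py above), a consumer who wants this quieter behavior today can simply adjust that logger; a two-line sketch:

    import logging

    # Silence everything the detector logs below WARNING:
    logging.getLogger("charset_normalizer").setLevel(logging.WARNING)

    # Or surface even the unregistered TRACE records while debugging:
    logging.getLogger("charset_normalizer").setLevel(5)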
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/charset_normalizer-2.0.10/setup.py new/charset_normalizer-2.0.12/setup.py
--- old/charset_normalizer-2.0.10/setup.py  2022-01-04 21:14:06.000000000 +0100
+++ new/charset_normalizer-2.0.12/setup.py  2022-02-12 15:24:47.000000000 +0100
@@ -73,6 +73,7 @@
'Programming Language :: Python :: 3.8',
'Programming Language :: Python :: 3.9',
'Programming Language :: Python :: 3.10',
+ 'Programming Language :: Python :: 3.11',
'Topic :: Text Processing :: Linguistic',
'Topic :: Utilities',
'Programming Language :: Python :: Implementation :: PyPy',
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/charset_normalizer-2.0.10/tests/test_logging.py new/charset_normalizer-2.0.12/tests/test_logging.py
--- old/charset_normalizer-2.0.10/tests/test_logging.py  2022-01-04 21:14:06.000000000 +0100
+++ new/charset_normalizer-2.0.12/tests/test_logging.py  2022-02-12 15:24:47.000000000 +0100
@@ -3,6 +3,7 @@
from charset_normalizer.utils import set_logging_handler
from charset_normalizer.api import from_bytes, explain_handler
+from charset_normalizer.constant import TRACE
class TestLogBehaviorClass:
@@ -17,16 +18,16 @@
from_bytes(test_sequence, steps=1, chunk_size=50, explain=True)
assert explain_handler not in self.logger.handlers
for record in caplog.records:
- assert record.levelname in ["INFO", "DEBUG"]
+ assert record.levelname in ["Level 5", "DEBUG"]
def test_explain_false_handler_set_behavior(self, caplog):
test_sequence = b'This is a test sequence of bytes that should be sufficient'
- set_logging_handler(level=logging.INFO, format_string="%(message)s")
+ set_logging_handler(level=TRACE, format_string="%(message)s")
from_bytes(test_sequence, steps=1, chunk_size=50, explain=False)
assert any(isinstance(hdl, logging.StreamHandler) for hdl in self.logger.handlers)
for record in caplog.records:
- assert record.levelname in ["INFO", "DEBUG"]
- assert "ascii is most likely the one. Stopping the process." in
caplog.text
+ assert record.levelname in ["Level 5", "DEBUG"]
+ assert "Encoding detection: ascii is most likely the one." in
caplog.text
def test_set_stream_handler(self, caplog):
set_logging_handler(
@@ -34,7 +35,7 @@
)
self.logger.debug("log content should log with default format")
for record in caplog.records:
- assert record.levelname in ["INFO", "DEBUG"]
+ assert record.levelname in ["Level 5", "DEBUG"]
assert "log content should log with default format" in caplog.text
def test_set_stream_handler_format(self, caplog):
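
The assertions above expect the literal levelname "Level 5" because the TRACE level is used without being registered; the standard library falls back to a generic name. For reference:

    import logging

    print(logging.getLevelName(5))   # -> 'Level 5' (unregistered)

    # Registering a name is a one-liner, but it mutates global logging state,
    # which is exactly what the commented-out addLevelName call in api.py avoids:
    logging.addLevelName(5, "TRACE")
    print(logging.getLevelName(5))   # -> 'TRACE'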