Script 'mail_helper' called by obssrc

Hello community,

here is the log from the commit of package python-chardet for
openSUSE:Factory checked in at 2026-03-14 22:20:12

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Comparing /work/SRC/openSUSE:Factory/python-chardet (Old)
 and      /work/SRC/openSUSE:Factory/.python-chardet.new.8177 (New)
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Package is "python-chardet"

Sat Mar 14 22:20:12 2026 rev:36 rq:1337270 version:6.0.0

Changes:
--------
--- /work/SRC/openSUSE:Factory/python-chardet/python-chardet.changes	2023-09-06 18:56:57.880930889 +0200
+++ /work/SRC/openSUSE:Factory/.python-chardet.new.8177/python-chardet.changes	2026-03-14 22:20:13.791751842 +0100
@@ -1,0 +2,131 @@
+Fri Mar 6 07:41:56 UTC 2026 - Matej Cepl <[email protected]>
+
+- update to 6.0.0 (the last version before the infringement;
+  DON’T UPGRADE UNTIL gh#chardet/chardet#327 IS RESOLVED):
+  - Features
+    - Unified single-byte charset detection: Instead of only
+      having trained language models for a handful of languages
+      (Bulgarian, Greek, Hebrew, Hungarian, Russian, Thai,
+      Turkish) and relying on special-case Latin1Prober and
+      MacRomanProber heuristics for Western encodings, chardet
+      now treats all single-byte charsets the same way: every
+      encoding gets proper language-specific bigram models
+      trained on CulturaX corpus data. This means chardet can now
+      accurately detect both the encoding and the language for
+      all supported single-byte encodings.
+    - 38 new languages: Arabic, Belarusian, Breton, Croatian,
+      Czech, Danish, Dutch, English, Esperanto, Estonian, Farsi,
+      Finnish, French, German, Icelandic, Indonesian, Irish,
+      Italian, Kazakh, Latvian, Lithuanian, Macedonian, Malay,
+      Maltese, Norwegian, Polish, Portuguese, Romanian, Scottish
+      Gaelic, Serbian, Slovak, Slovene, Spanish, Swedish, Tajik,
+      Ukrainian, Vietnamese, and Welsh. Existing models for
+      Bulgarian, Greek, Hebrew, Hungarian, Russian, Thai, and
+      Turkish were also retrained with the new pipeline.
+    - EncodingEra filtering: The new encoding_era parameter to
+      detect() accepts an EncodingEra flag enum (MODERN_WEB,
+      LEGACY_ISO, LEGACY_MAC, LEGACY_REGIONAL, DOS, MAINFRAME,
+      ALL) and allows callers to restrict detection to encodings
+      from a specific era. detect() and detect_all() default to
+      MODERN_WEB.
+      The new MODERN_WEB default should drastically
+      improve accuracy for users who are not working with legacy
+      data. The tiers are:
+        MODERN_WEB: UTF-8/16/32, Windows-125x, CP874, CJK
+          multi-byte (widely used on the web)
+        LEGACY_ISO: ISO-8859-x, KOI8-R/U (legacy but well-known
+          standards)
+        LEGACY_MAC: Mac-specific encodings (MacRoman,
+          MacCyrillic, etc.)
+        LEGACY_REGIONAL: Uncommon regional/national encodings
+          (KOI8-T, KZ1048, CP1006, etc.)
+        DOS: DOS/OEM code pages (CP437, CP850, CP866, etc.)
+        MAINFRAME: EBCDIC variants (CP037, CP500, etc.)
+    - --encoding-era CLI flag: The chardetect CLI now accepts
+      -e/--encoding-era to control which encoding eras are
+      considered during detection.
+    - max_bytes and chunk_size parameters: detect(),
+      detect_all(), and UniversalDetector now accept max_bytes
+      (default 200KB) and chunk_size (default 64KB) parameters
+      for controlling how much data is examined. (#314, @bysiber)
+    - Encoding era preference tie-breaking: When multiple
+      encodings have very close confidence scores, the detector
+      now prefers more modern/Unicode encodings over legacy ones.
+    - Charset metadata registry: New chardet.metadata.charsets
+      module provides structured metadata about all supported
+      encodings, including their era classification and language
+      filter.
+    - should_rename_legacy now defaults intelligently: When set
+      to None (the new default), legacy renaming is automatically
+      enabled when encoding_era is MODERN_WEB.
+    - Direct GB18030 support: Replaced the redundant GB2312
+      prober with a proper GB18030 prober.
+    - EBCDIC detection: Added CP037 and CP500 EBCDIC model
+      registrations for mainframe encoding detection.
+    - Binary file detection: Added basic binary file detection to
+      abort analysis earlier on non-text files.
+    - Python 3.12, 3.13, and 3.14 support (#283, @hugovk; #311)
+    - GitHub Codespace support (#312, @oxygen-dioxide)
+  - Fixes
+    - Fix CP949 state machine: Corrected the state machine for
+      Korean CP949 encoding detection.
+      (#268, @nenw)
+    - Fix SJIS distribution analysis: Fixed
+      SJISDistributionAnalysis discarding valid second-byte range
+      >= 0x80. (#315, @bysiber)
+    - Fix UTF-16/32 detection for non-ASCII-heavy text: Improved
+      detection of UTF-16/32 encoded CJK and other non-ASCII text
+      by adding a MIN_RATIO threshold alongside the existing
+      EXPECTED_RATIO.
+    - Fix get_charset crash: Resolved a crash when looking up
+      unknown charset names.
+    - Fix GB18030 char_len_table: Corrected the character length
+      table for GB18030 multi-byte sequences.
+    - Fix UTF-8 state machine: Updated to be more spec-compliant.
+    - Fix detect_all() returning inactive probers: Results from
+      probers that determined "definitely not this encoding" are
+      now excluded.
+    - Fix early cutoff bug: Resolved an issue where detection
+      could terminate prematurely.
+    - Default UTF-8 fallback: If UTF-8 has not been ruled out and
+      nothing else is above the minimum threshold, UTF-8 is now
+      returned as the default.
+  - Breaking changes
+    - Dropped Python 3.7, 3.8, and 3.9 support: Now requires
+      Python 3.10+. (#283, @hugovk)
+    - Removed Latin1Prober and MacRomanProber: These special-case
+      probers have been replaced by the unified model-based
+      approach described above. Latin-1, MacRoman, and all other
+      single-byte encodings are now detected by
+      SingleByteCharSetProber with trained language models,
+      giving better accuracy and language identification.
+    - Removed EUC-TW support: EUC-TW encoding detection has been
+      removed as it is extremely rare in practice.
+    - LanguageFilter.NONE removed: Use specific language filters
+      or LanguageFilter.ALL instead.
+    - Enum types changed: InputState, ProbingState, MachineState,
+      SequenceLikelihood, and CharacterCategory are now IntEnum
+      (previously plain classes or Enum). LanguageFilter values
+      changed from hardcoded hex to auto().
+    - detect() default behavior change: detect() now defaults to
+      encoding_era=EncodingEra.MODERN_WEB and
+      should_rename_legacy=None (auto-enabled for MODERN_WEB),
+      whereas previously it defaulted to considering all
+      encodings with no legacy renaming.
+  - Misc changes
+    - Switched from Poetry/setuptools to uv + hatchling: Build
+      system modernized with hatch-vcs for version management.
+    - License text updated: Updated LGPLv2.1 license text and FSF
+      notices to use URL instead of mailing address. (#304, #307,
+      @musicinmybrain)
+    - CulturaX-based model training: The create_language_model.py
+      training script was rewritten to use the CulturaX
+      multilingual corpus instead of Wikipedia, producing higher
+      quality bigram frequency models.
+    - Language class converted to frozen dataclass: The language
+      metadata class now uses @dataclass(frozen=True) with
+      num_training_docs and num_training_chars fields replacing
+      wiki_start_pages.
+    - Test infrastructure: Added pytest-timeout and pytest-xdist
+      for faster parallel test execution. Reorganized test data
+      directories.
+
+-------------------------------------------------------------------

Old:
----
  chardet-5.2.0.tar.gz

New:
----
  chardet-6.0.0.tar.gz

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Other differences:
------------------
++++++ python-chardet.spec ++++++
--- /var/tmp/diff_new_pack.W16lil/_old	2026-03-14 22:20:14.691789101 +0100
+++ /var/tmp/diff_new_pack.W16lil/_new	2026-03-14 22:20:14.691789101 +0100
@@ -1,7 +1,7 @@
 #
-# spec file
+# spec file for package python-chardet
 #
-# Copyright (c) 2023 SUSE LLC
+# Copyright (c) 2026 SUSE LLC and contributors
 #
 # All modifications and additions to the file contributed by third parties
 # remain the property of their copyright owners, unless otherwise agreed
@@ -16,12 +16,6 @@
 #
-%if 0%{?suse_version} > 1500
-%bcond_without libalternatives
-%else
-%bcond_with libalternatives
-%endif
-
 %global flavor @BUILD_FLAVOR@%{nil}
 %if "%{flavor}" == "test"
 %define psuffix -test
@@ -30,34 +24,38 @@
 %define psuffix %{nil}
 %bcond_with test
 %endif
-
-%{?!python_module:%define python_module() python3-%{**}}
-%define skip_python2 1
 %define skip_python36 1
+%if 0%{?suse_version} > 1500
+%bcond_without libalternatives
+%else
+%bcond_with libalternatives
+%endif
 %{?sle15_python_module_pythons}
 Name:           python-chardet%{psuffix}
-Version:        5.2.0
+Version:        6.0.0
 Release:        0
 Summary:        Universal encoding detector
 License:        LGPL-2.1-or-later
 URL:            https://github.com/chardet/chardet
 Source0:        https://files.pythonhosted.org/packages/source/c/chardet/chardet-%{version}.tar.gz
 BuildRequires:  %{python_module base >= 3.7}
+BuildRequires:  %{python_module hatchling}
 BuildRequires:  %{python_module pip}
-BuildRequires:  %{python_module setuptools}
 BuildRequires:  %{python_module wheel}
 BuildRequires:  fdupes
 BuildRequires:  python-rpm-macros >= 20210929
+BuildArch:      noarch
 %if %{with libalternatives}
 BuildRequires:  alts
 Requires:       alts
 %else
 Requires(post): update-alternatives
-Requires(postun):update-alternatives
 %endif
-BuildArch:      noarch
 %if %{with test}
-BuildRequires:  %{python_module hypothesis}
+BuildRequires:  %{python_module hypothesis >= 6.0.0}
+BuildRequires:  %{python_module pytest-timeout}
+BuildRequires:  %{python_module pytest-xdist}
 BuildRequires:  %{python_module pytest}
 %endif
 %python_subpackages

++++++ chardet-5.2.0.tar.gz -> chardet-6.0.0.tar.gz ++++++
/work/SRC/openSUSE:Factory/python-chardet/chardet-5.2.0.tar.gz /work/SRC/openSUSE:Factory/.python-chardet.new.8177/chardet-6.0.0.tar.gz differ: char 4, line 1
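Editor's note: the EncodingEra filtering described in the changelog above amounts to a flag-enum mask checked against a per-charset era registry. The following is a minimal standalone sketch of that idea, not chardet's actual source; the member names mirror the changelog, but the CHARSET_ERAS table and the allowed() helper are hypothetical illustrations.

```python
from enum import Flag, auto

class EncodingEra(Flag):
    # Era tiers named in the chardet 6.0.0 changelog.
    MODERN_WEB = auto()
    LEGACY_ISO = auto()
    LEGACY_MAC = auto()
    LEGACY_REGIONAL = auto()
    DOS = auto()
    MAINFRAME = auto()
    ALL = MODERN_WEB | LEGACY_ISO | LEGACY_MAC | LEGACY_REGIONAL | DOS | MAINFRAME

# Toy era registry for a few encodings (illustrative only).
CHARSET_ERAS = {
    "utf-8": EncodingEra.MODERN_WEB,
    "windows-1251": EncodingEra.MODERN_WEB,
    "iso-8859-2": EncodingEra.LEGACY_ISO,
    "macroman": EncodingEra.LEGACY_MAC,
    "cp437": EncodingEra.DOS,
    "cp037": EncodingEra.MAINFRAME,
}

def allowed(charset: str, era: EncodingEra) -> bool:
    # A charset passes the filter when its era bit is in the requested mask.
    return bool(CHARSET_ERAS[charset] & era)
```

With a mask like EncodingEra.DOS | EncodingEra.MAINFRAME, only DOS and EBCDIC charsets survive the filter, which is how a MODERN_WEB default can exclude legacy candidates outright.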
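Editor's note: the max_bytes/chunk_size parameters described above bound how much input the detector examines. A minimal sketch of that budgeted, chunked reading pattern, using the changelog's defaults of 200KB and 64KB; feed_limited is a hypothetical helper, not a chardet API.

```python
import io

def feed_limited(stream, feed, max_bytes=200 * 1024, chunk_size=64 * 1024):
    """Hand at most max_bytes from `stream` to `feed`, in chunk_size pieces.

    Stops early when the stream is exhausted, so short inputs are unaffected.
    """
    remaining = max_bytes
    while remaining > 0:
        chunk = stream.read(min(chunk_size, remaining))
        if not chunk:
            break
        feed(chunk)
        remaining -= len(chunk)

# Example: a 300 KB stream is cut off once the 200 KB budget is spent.
chunks = []
feed_limited(io.BytesIO(b"x" * 300_000), chunks.append)
```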
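Editor's note: the "encoding era preference tie-breaking" entry above can be pictured as a two-step selection: keep candidates whose confidence is within a small margin of the best score, then prefer the most modern era among them. This is a hypothetical sketch of that logic (pick_best, the era_rank field, and the margin value are all assumptions, not chardet internals).

```python
def pick_best(candidates, margin=0.01):
    """Choose among (encoding, confidence, era_rank) tuples.

    Lower era_rank means more modern. When confidences are within
    `margin` of the best score, the most modern encoding wins;
    otherwise the highest-confidence encoding wins outright.
    """
    best = max(conf for _, conf, _ in candidates)
    near_ties = [c for c in candidates if best - c[1] <= margin]
    return min(near_ties, key=lambda c: c[2])[0]
```

So a windows-1252 result at 0.895 would beat an iso-8859-1 result at 0.900, while a clearly dominant legacy score would still win.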
