This is an automated email from the ASF dual-hosted git repository. chaokunyang pushed a commit to branch main in repository https://gitbox.apache.org/repos/asf/fory.git
The following commit(s) were added to refs/heads/main by this push: new b0d38cdd refactor(python): replace Black and Flake8 with Ruff (#2388) b0d38cdd is described below commit b0d38cdd7837ed64aa09cbf4693f2725b71fa753 Author: Emre Şafak <3928300+esa...@users.noreply.github.com> AuthorDate: Sun Jul 6 04:55:10 2025 -0400 refactor(python): replace Black and Flake8 with Ruff (#2388) Contribution Checklist ## Why? I was running the existing linting script on my last PR and it was failing because it was incompatible with the new version of `importlib-metadata` I had installed. First I considered unpinning or updating the versions of the linting libraries, then I realized it would be better to simply get rid of them in favor of [ruff](https://astral.sh/ruff), a modern Rust-based linter and formatter which aims to replace flake8 and black. Note that this PR includes a few ruff reformatted python files. ## What does this PR do? - Removed Black and Flake8 configurations and dependencies. - Integrated Ruff for both formatting and linting, updating scripts and configuration files accordingly. - Adjusted documentation and examples in `README.md` and `CONTRIBUTING.md` to reflect the switch to Ruff. - Improved simplicity and consistency in formatting validation steps. - Migrates linter configuration into `pyproject.toml` ## Does this PR introduce any user-facing change? No ## Benchmark Ruff is faster :) ## Notes The following [pycodestyle ignores](https://pycodestyle.pycqa.org/en/latest/intro.html#error-codes) are dropped support for: - E121: continuation line under-indented for hanging indent - E123: closing bracket does not match indentation of opening bracket’s line - E126: continuation line with same indent as next logical line - E704: multiple statements on one line (def) - W503: line break before binary operator - W504: line break after binary operator And [this bugbear warning](https://github.com/PyCQA/flake8-bugbear?tab=readme-ov-file#list-of-warnings): - B001: Do not use bare except:, it also catches unexpected events like memory errors, interrupts, system exit, and so on. Prefer except --------- Co-authored-by: Emre Şafak <esa...@users.noreply.github.com> --- .flake8 | 58 ---------------- CONTRIBUTING.md | 6 +- ci/format.sh | 97 ++++++--------------------- ci/run_ci.sh | 2 +- python/README.md | 8 +-- python/pyfory/format/infer.py | 16 ++--- python/pyfory/format/tests/test_infer.py | 12 +--- python/pyfory/format/tests/test_vectorized.py | 4 +- python/pyfory/meta/metastring.py | 45 +++---------- python/pyproject.toml | 28 ++++++++ 10 files changed, 73 insertions(+), 203 deletions(-) diff --git a/.flake8 b/.flake8 deleted file mode 100644 index 9b8b4314..00000000 --- a/.flake8 +++ /dev/null @@ -1,58 +0,0 @@ -# Licensed to the Apache Software Foundation (ASF) under one -# or more contributor license agreements. See the NOTICE file -# distributed with this work for additional information -# regarding copyright ownership. The ASF licenses this file -# to you under the Apache License, Version 2.0 (the -# "License"); you may not use this file except in compliance -# with the License. You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, -# software distributed under the License is distributed on an -# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -# KIND, either express or implied. See the License for the -# specific language governing permissions and limitations -# under the License. - -[flake8] -exclude = - python/build/ - python/.eggs/ -max-line-length = 150 -inline-quotes = " -ignore = - C408 - E121 - E123 - E126 - E203 - E225 - E226 - E227 - E402 - E999 - E24 - E704 - W503 - W504 - W605 - I - N - B001 - B002 - B003 - B004 - B005 - B007 - B008 - B009 - B010 - B011 - B012 - B013 - B014 - B015 - B016 - B017 -avoid-escape = no diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md index 4126e8d3..9ad69bbe 100644 --- a/CONTRIBUTING.md +++ b/CONTRIBUTING.md @@ -96,10 +96,10 @@ mvn checkstyle:check ```bash cd python -# install dependencies fro styling -pip install black==22.1.0 flake8==3.9.1 flake8-quotes flake8-bugbear click==8.0.2 +# install dependencies for formatting +pip install ruff # format python code -black pyfory +ruff format python ``` ### C++ diff --git a/ci/format.sh b/ci/format.sh index b66652bd..56fe26aa 100755 --- a/ci/format.sh +++ b/ci/format.sh @@ -2,14 +2,12 @@ # This script is derived from https://github.com/ray-project/ray/blob/5ce25a57a0949673d17f3a8784f05b2d65290524/ci/lint/format.sh. -# Black + Clang formatter (if installed). This script formats all changed files from the last mergebase. +# Ruff formatter (if installed). This script formats all changed files from the last mergebase. # You are encouraged to run this locally before pushing changes for review. # Cause the script to exit if a single command fails set -euox pipefail -FLAKE8_VERSION_REQUIRED="3.9.1" -BLACK_VERSION_REQUIRED="22.1.0" SHELLCHECK_VERSION_REQUIRED="0.7.1" install_nodejs() { @@ -26,27 +24,11 @@ install_nodejs() { npm -v } -check_python_command_exist() { - VERSION="" - case "$1" in - black) - VERSION=$BLACK_VERSION_REQUIRED - ;; - flake8) - VERSION=$FLAKE8_VERSION_REQUIRED - ;; - *) - echo "$1 is not a required dependency" - exit 1 - esac - if ! [ -x "$(command -v "$1")" ]; then - echo "$1 not installed. Install the python package with: pip install $1==$VERSION" - exit 1 - fi -} - -check_python_command_exist black -check_python_command_exist flake8 +# Check for ruff +if ! [ -x "$(command -v ruff)" ]; then + echo "ruff not installed. Install with: pip install ruff" + exit 1 +fi # this stops git rev-parse from failing if we run this from the .git directory builtin cd "$(dirname "${BASH_SOURCE:-$0}")" @@ -54,9 +36,6 @@ builtin cd "$(dirname "${BASH_SOURCE:-$0}")" ROOT="$(git rev-parse --show-toplevel)" builtin cd "$ROOT" || exit 1 -FLAKE8_VERSION=$(flake8 --version | head -n 1 | awk '{print $1}') -BLACK_VERSION=$(black --version | awk '{print $2}') - # params: tool name, tool version, required version tool_version_check() { if [ "$2" != "$3" ]; then @@ -64,9 +43,6 @@ tool_version_check() { fi } -tool_version_check "flake8" "$FLAKE8_VERSION" "$FLAKE8_VERSION_REQUIRED" -tool_version_check "black" "$BLACK_VERSION" "$BLACK_VERSION_REQUIRED" - if command -v shellcheck >/dev/null; then SHELLCHECK_VERSION=$(shellcheck --version | awk '/^version:/ {print $2}') tool_version_check "shellcheck" "$SHELLCHECK_VERSION" "$SHELLCHECK_VERSION_REQUIRED" @@ -100,33 +76,16 @@ else echo "WARNING:java is not installed, skip format java files!" fi -if [[ $(flake8 --version) != *"flake8_quotes"* ]]; then - echo "WARNING: Fory uses flake8 with flake8_quotes. Might error without it. Install with: pip install flake8-quotes" -fi - -if [[ $(flake8 --version) != *"flake8-bugbear"* ]]; then - echo "WARNING: Fory uses flake8 with flake8-bugbear. Might error without it. Install with: pip install flake8-bugbear" -fi - SHELLCHECK_FLAGS=( --exclude=1090 # "Can't follow non-constant source. Use a directive to specify location." --exclude=1091 # "Not following {file} due to some error" --exclude=2207 # "Prefer mapfile or read -a to split command output (or quote to avoid splitting)." -- these aren't compatible with macOS's old Bash ) -BLACK_EXCLUDES=( - '--extend-exclude' 'python/build/*|examples/*' -) - GIT_LS_EXCLUDES=( ':(exclude)src/thirdparty/' ) -# TODO(barakmich): This should be cleaned up. I've at least excised the copies -# of these arguments to this location, but the long-term answer is to actually -# make a flake8 config file -FLAKE8_PYX_IGNORES="--ignore=C408,E121,E123,E126,E211,E225,E226,E227,E24,E402,E704,E999,W503,W504,W605" - # Format specified files format_files() { local shell_files=() python_files=() bazel_files=() @@ -159,24 +118,19 @@ format_files() { done if [ 0 -lt "${#python_files[@]}" ]; then - black "${python_files[@]}" + ruff format "${python_files[@]}" + ruff check --fix "${python_files[@]}" fi } format_all_scripts() { - command -v flake8 &> /dev/null; - HAS_FLAKE8=$? - - echo "$(date)" "Black...." - git ls-files -- '*.py' "${GIT_LS_EXCLUDES[@]}" | xargs -P 10 \ - black "${BLACK_EXCLUDES[@]}" - if [ $HAS_FLAKE8 ]; then - echo "$(date)" "Flake8...." - git ls-files -- '*.py' "${GIT_LS_EXCLUDES[@]}" | xargs \ - flake8 --config=.flake8 - git ls-files -- '*.pyx' '*.pxd' '*.pxi' "${GIT_LS_EXCLUDES[@]}" | xargs \ - flake8 --config=.flake8 "$FLAKE8_PYX_IGNORES" - fi + echo "$(date)" "Ruff format...." + git ls-files -- '*.py' '*.pyx' '*.pxd' '*.pxi' "${GIT_LS_EXCLUDES[@]}" | xargs -P 10 \ + ruff format + + echo "$(date)" "Ruff check...." + git ls-files -- '*.py' '*.pyx' '*.pxd' '*.pxi' "${GIT_LS_EXCLUDES[@]}" | xargs \ + ruff check --fix } format_java() { @@ -233,26 +187,17 @@ format_all() { format_changed() { # The `if` guard ensures that the list of filenames is not empty, which # could cause the formatter to receive 0 positional arguments, making - # Black error. + # it error. # # `diff-filter=ACRM` and $MERGEBASE is to ensure we only format files that # exist on both branches. MERGEBASE="$(git merge-base origin/main HEAD)" - if ! git diff --diff-filter=ACRM --quiet --exit-code "$MERGEBASE" -- '*.py' &>/dev/null; then - git diff --name-only --diff-filter=ACRM "$MERGEBASE" -- '*.py' | xargs -P 5 \ - black "${BLACK_EXCLUDES[@]}" - if which flake8 >/dev/null; then - git diff --name-only --diff-filter=ACRM "$MERGEBASE" -- '*.py' | xargs -P 5 \ - flake8 --config=.flake8 - fi - fi - - if ! git diff --diff-filter=ACRM --quiet --exit-code "$MERGEBASE" -- '*.pyx' '*.pxd' '*.pxi' &>/dev/null; then - if which flake8 >/dev/null; then - git diff --name-only --diff-filter=ACRM "$MERGEBASE" -- '*.pyx' '*.pxd' '*.pxi' | xargs -P 5 \ - flake8 --config=.flake8 "$FLAKE8_PYX_IGNORES" - fi + if ! git diff --diff-filter=ACRM --quiet --exit-code "$MERGEBASE" -- '*.py' '*.pyx' '*.pxd' '*.pxi' &>/dev/null; then + git diff --name-only --diff-filter=ACRM "$MERGEBASE" -- '*.py' '*.pyx' '*.pxd' '*.pxi' | xargs -P 5 \ + ruff format + git diff --name-only --diff-filter=ACRM "$MERGEBASE" -- '*.py' '*.pyx' '*.pxd' '*.pxi' | xargs -P 5 \ + ruff check --fix fi if which clang-format >/dev/null; then diff --git a/ci/run_ci.sh b/ci/run_ci.sh index ee296002..bfe9481f 100755 --- a/ci/run_ci.sh +++ b/ci/run_ci.sh @@ -333,7 +333,7 @@ case $1 in ;; format) echo "Install format tools" - pip install black==22.1.0 flake8==3.9.1 flake8-quotes flake8-bugbear click==8.0.2 + pip install ruff echo "Executing format check" bash ci/format.sh cd "$ROOT/java" diff --git a/python/README.md b/python/README.md index a368e3c7..57d756ff 100644 --- a/python/README.md +++ b/python/README.md @@ -31,12 +31,8 @@ pytest -v -s . ```bash cd python -# install dependencies fro styling -pip install black==22.1.0 flake8==3.9.1 flake8-quotes flake8-bugbear click==8.0.2 -# flake8 pyfory: prompts for code to be formatted, but not formatted -flake8 pyfory -# black pyfory: format python code -black pyfory +pip install ruff +ruff format python ``` ## Debug diff --git a/python/pyfory/format/infer.py b/python/pyfory/format/infer.py index 551db3cc..18410345 100644 --- a/python/pyfory/format/infer.py +++ b/python/pyfory/format/infer.py @@ -42,9 +42,7 @@ def get_cls_by_schema(schema): else: from pyfory.type import record_class_factory - cls_ = record_class_factory( - "Record" + str(id(schema)), [f.name for f in schema] - ) + cls_ = record_class_factory("Record" + str(id(schema)), [f.name for f in schema]) __type_map__[id_] = cls_ __schemas__[id_] = schema return __type_map__[id_] @@ -72,9 +70,7 @@ _supported_types = { typing.List, typing.Dict, } -_supported_types_str = [ - f"{t.__module__}.{getattr(t, '__name__', t)}" for t in _supported_types -] +_supported_types_str = [f"{t.__module__}.{getattr(t, '__name__', t)}" for t in _supported_types] _supported_types_mapping = {t: t for t in _supported_types} _supported_types_mapping.update( { @@ -134,9 +130,7 @@ class ArrowTypeVisitor(TypeVisitor): # typing.List/typing.Dict's origin will be list/dict if type_ not in _supported_types_mapping: raise TypeError( - f"Type {type_} not supported, currently only " - f"compositions of {_supported_types_str} are supported. " - f"types_path is {types_path}" + f"Type {type_} not supported, currently only compositions of {_supported_types_str} are supported. types_path is {types_path}" ) arrow_type_func = _supported_types_mapping.get(type_) return pa.field(field_name, arrow_type_func()) @@ -181,9 +175,7 @@ def _compute_hash(hash_: int, type_: pa.DataType): elif isinstance(type_, pa.StructType): types.extend([f.type for f in type_]) else: - assert ( - type_.num_fields == 0 - ), f"field type should not be nested, but got type {type_}." + assert type_.num_fields == 0, f"field type should not be nested, but got type {type_}." for t in types: hash_ = _compute_hash(hash_, t) diff --git a/python/pyfory/format/tests/test_infer.py b/python/pyfory/format/tests/test_infer.py index efcb1d89..d8951807 100644 --- a/python/pyfory/format/tests/test_infer.py +++ b/python/pyfory/format/tests/test_infer.py @@ -59,12 +59,8 @@ def test_infer_field(): assert _infer_field("", pa.float64).type == pa.float64() assert _infer_field("", str).type == pa.utf8() assert _infer_field("", bytes).type == pa.binary() - assert _infer_field("", List[Dict[str, str]]).type == pa.list_( - pa.map_(pa.utf8(), pa.utf8()) - ) - assert _infer_field( - "", List[Dict[str, Dict[str, List[pa.int32]]]] - ).type == pa.list_(pa.map_(pa.utf8(), pa.map_(pa.utf8(), pa.list_(pa.int32())))) + assert _infer_field("", List[Dict[str, str]]).type == pa.list_(pa.map_(pa.utf8(), pa.utf8())) + assert _infer_field("", List[Dict[str, Dict[str, List[pa.int32]]]]).type == pa.list_(pa.map_(pa.utf8(), pa.map_(pa.utf8(), pa.list_(pa.int32())))) with pytest.raises(TypeError): _infer_field("", pa.int8()) _infer_field("", pa.utf8()) @@ -77,9 +73,7 @@ def test_infer_field(): def test_infer_class_schema(): schema = infer_schema(Foo) - assert schema == foo_schema(), ( - f"schema {schema}\n====\n," f"foo_schema {foo_schema()}" - ) + assert schema == foo_schema(), f"schema {schema}\n====\n,foo_schema {foo_schema()}" def test_type_id(): diff --git a/python/pyfory/format/tests/test_vectorized.py b/python/pyfory/format/tests/test_vectorized.py index 239e65bb..560fe5ce 100644 --- a/python/pyfory/format/tests/test_vectorized.py +++ b/python/pyfory/format/tests/test_vectorized.py @@ -42,9 +42,7 @@ def test_vectorized(): num_rows = 10 data = [[] for _ in range(len(field_names))] for i in range(num_rows): - obj = cls( - f1=2**63 - 1, f2=2**31 - 1, f3=2**15 - 1, f4=2**7 - 1, f5=f"str{i}" - ) + obj = cls(f1=2**63 - 1, f2=2**31 - 1, f3=2**15 - 1, f4=2**7 - 1, f5=f"str{i}") fields_data = list(obj) for j in range(len(fields_data)): data[j].append(fields_data[j]) diff --git a/python/pyfory/meta/metastring.py b/python/pyfory/meta/metastring.py index c1f73585..ff739a49 100644 --- a/python/pyfory/meta/metastring.py +++ b/python/pyfory/meta/metastring.py @@ -138,16 +138,11 @@ class MetaStringDecoder: strip_last_char = (data[0] & 0x80) != 0 # Check the first bit of the first byte bit_index = 1 bit_mask = 0b11111 - while bit_index + 5 <= num_bits and not ( - strip_last_char and (bit_index + 2 * 5 > num_bits) - ): + while bit_index + 5 <= num_bits and not (strip_last_char and (bit_index + 2 * 5 > num_bits)): byte_index = bit_index // 8 intra_byte_index = bit_index % 8 # Extract the 5-bit character value across byte boundaries if needed - char_value = ( - (data[byte_index] << 8) - | (data[byte_index + 1] if byte_index + 1 < len(data) else 0) - ) >> (11 - intra_byte_index) & bit_mask + char_value = ((data[byte_index] << 8) | (data[byte_index + 1] if byte_index + 1 < len(data) else 0)) >> (11 - intra_byte_index) & bit_mask bit_index += 5 decoded.append(self._decode_lower_special_char(char_value)) @@ -168,16 +163,11 @@ class MetaStringDecoder: strip_last_char = (data[0] & 0x80) != 0 bit_mask = 0b111111 num_bits = len(data) * 8 - while bit_index + 6 <= num_bits and not ( - strip_last_char and (bit_index + 2 * 6 > num_bits) - ): + while bit_index + 6 <= num_bits and not (strip_last_char and (bit_index + 2 * 6 > num_bits)): byte_index = bit_index // 8 intra_byte_index = bit_index % 8 # Extract the 6-bit character value across byte boundaries if needed - char_value = ( - (data[byte_index] << 8) - | (data[byte_index + 1] if byte_index + 1 < len(data) else 0) - ) >> (10 - intra_byte_index) & bit_mask + char_value = ((data[byte_index] << 8) | (data[byte_index + 1] if byte_index + 1 < len(data) else 0)) >> (10 - intra_byte_index) & bit_mask bit_index += 6 decoded.append(self._decode_lower_upper_digit_special_char(char_value)) return "".join(decoded) @@ -226,9 +216,7 @@ class MetaStringDecoder: elif char_value == 63: return self.special_char2 # Use special_char2 for the encoding else: - raise ValueError( - f"Invalid character value for LOWER_UPPER_DIGIT_SPECIAL: {char_value}" - ) + raise ValueError(f"Invalid character value for LOWER_UPPER_DIGIT_SPECIAL: {char_value}") def _decode_rep_first_lower_special(self, data: bytes) -> str: """ @@ -291,9 +279,7 @@ class MetaStringEncoder: MetaString: The encoded MetaString object. """ # Long meta string than _METASTRING_NUM_CHARS_LIMIT is not allowed. - assert ( - len(input_string) < _METASTRING_NUM_CHARS_LIMIT - ), "Long meta string than _METASTRING_NUM_CHARS_LIMIT is not allowed." + assert len(input_string) < _METASTRING_NUM_CHARS_LIMIT, "Long meta string than _METASTRING_NUM_CHARS_LIMIT is not allowed." if not input_string: return MetaString( @@ -320,9 +306,7 @@ class MetaStringEncoder: MetaString: An encoded MetaString object. """ # Long meta string than _METASTRING_NUM_CHARS_LIMIT is not allowed. - assert ( - len(input_string) < _METASTRING_NUM_CHARS_LIMIT - ), "Long meta string than _METASTRING_NUM_CHARS_LIMIT is not allowed." + assert len(input_string) < _METASTRING_NUM_CHARS_LIMIT, "Long meta string than _METASTRING_NUM_CHARS_LIMIT is not allowed." if not input_string: return MetaString( @@ -434,12 +418,7 @@ class MetaStringEncoder: upper_count = 0 for c in chars: if can_lower_upper_digit_special_encoded: - if not ( - c.islower() - or c.isupper() - or c.isdigit() - or c in {self.special_char1, self.special_char2} - ): + if not (c.islower() or c.isupper() or c.isdigit() or c in {self.special_char1, self.special_char2}): can_lower_upper_digit_special_encoded = False if can_lower_special_encoded: if not (c.islower() or c in {".", "_", "$", "|"}): @@ -566,9 +545,7 @@ class MetaStringEncoder: elif c == "|": return 29 else: - raise ValueError( - f"Unsupported character for LOWER_SPECIAL encoding: {c}" - ) + raise ValueError(f"Unsupported character for LOWER_SPECIAL encoding: {c}") elif bits_per_char == 6: if "a" <= c <= "z": return ord(c) - ord("a") @@ -581,6 +558,4 @@ class MetaStringEncoder: elif c == self.special_char2: return 63 else: - raise ValueError( - f"Unsupported character for LOWER_UPPER_DIGIT_SPECIAL encoding: {c}" - ) + raise ValueError(f"Unsupported character for LOWER_UPPER_DIGIT_SPECIAL encoding: {c}") diff --git a/python/pyproject.toml b/python/pyproject.toml index c7aa3f07..e659925c 100644 --- a/python/pyproject.toml +++ b/python/pyproject.toml @@ -56,6 +56,7 @@ dependencies = [ [project.optional-dependencies] format = ["pyarrow"] all = ["pyarrow"] +dev = ["ruff"] [tool.setuptools] packages = ["pyfory", "pyfory.format", "pyfory.lib", "pyfory.meta"] @@ -69,3 +70,30 @@ zip-safe = false [tool.setuptools.dynamic] version = {attr = "pyfory.__version__"} + +[tool.ruff] +# Replaces settings from .flake8 and format.sh +line-length = 150 +exclude = [ + "python/build/", + "python/.eggs/", + "examples/*", +] + +[tool.ruff.lint] +# Replicates the `ignore` list from the .flake8 file. +ignore = [ + "C408", "E203", "E225", "E226", "E227", "E402", "E24", + "W605", "I", "N", "B002", "B003", "B004", "B005", + "B007", "B008", "B009", "B010", "B011", "B012", "B013", "B014", "B015", "B016", "B017", +] + +[tool.ruff.lint.per-file-ignores] +# Replicates the FLAKE8_PYX_IGNORES for Cython files from format.sh. +"*.pyx" = ["C408", "E211", "E225", "E226", "E227", "E24", "E402", "W605"] +"*.pxd" = ["C408", "E211", "E225", "E226", "E227", "E24", "E402", "W605"] +"*.pxi" = ["C408", "E211", "E225", "E226", "E227", "E24", "E402", "W605"] + +[tool.ruff.format] +# Replicates the `inline-quotes` setting from .flake8 +quote-style = "double" --------------------------------------------------------------------- To unsubscribe, e-mail: commits-unsubscr...@fory.apache.org For additional commands, e-mail: commits-h...@fory.apache.org