martin-g commented on code in PR #20474: URL: https://github.com/apache/datafusion/pull/20474#discussion_r2840661103
########## ci/scripts/check_slt_configs.sh: ########## @@ -0,0 +1,125 @@ +#!/usr/bin/env bash +# +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. + +# Checks sqllogictest (.slt) files for dangling DataFusion configuration +# settings. +# +# Any configuration set with: +# set datafusion.<config> = true +# must be reset in the same file with: +# set datafusion.<config> = false Review Comment: What if the setting's default is `true` and an .slt test needed to set it to `false` but forgot to unset it ? It would be better to use the `reset datafusion.***;` command instead ########## ci/scripts/check_slt_configs.sh: ########## @@ -0,0 +1,125 @@ +#!/usr/bin/env bash +# +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. + +# Checks sqllogictest (.slt) files for dangling DataFusion configuration +# settings. +# +# Any configuration set with: +# set datafusion.<config> = true +# must be reset in the same file with: +# set datafusion.<config> = false +# +# This prevents configuration state from leaking into subsequent tests, +# which can cause nondeterministic or hard-to-debug failures. +# +# Usage: +# Check all .slt files under datafusion/sqllogictest/test_files: +# ./check_slt_configs.sh +# +# Check a specific file (repo-relative or absolute path): +# ./check_slt_configs.sh datafusion/sqllogictest/test_files/foo.slt +# ./check_slt_configs.sh /absolute/path/to/foo.slt +# +# Exit codes: +# 0 All checks passed +# 1 One or more files contain configs set to true but never reset + +set -euo pipefail + +# Resolve repository root +ROOT_DIR="$(git rev-parse --show-toplevel)" +TARGET_DIR="${ROOT_DIR}/datafusion/sqllogictest/test_files" + +FAILED=0 + +resolve_file() { + local input="$1" + + # If absolute and exists + if [[ -f "$input" ]]; then + realpath "$input" + return + fi + + # If relative to repo root + if [[ -f "${ROOT_DIR}/${input}" ]]; then + realpath "${ROOT_DIR}/${input}" + return + fi + + echo "" +} + +# If file argument is passed, check only that file +if [[ $# -eq 1 ]]; then + RESOLVED_FILE="$(resolve_file "$1")" + + if [[ -z "$RESOLVED_FILE" ]]; then + echo "❌ File does not exist: $1" + exit 1 + fi + + FILES="$RESOLVED_FILE" +else + FILES=$(find "$TARGET_DIR" -type f -name "*.slt") +fi + +# Iterate safely even if filenames contain spaces +while IFS= read -r file; do + echo "Checking file: $file" + + if [[ ! -f "$file" ]]; then + echo " ❌ File not found: $file" + FAILED=1 + continue + fi + + matches=$(grep -En \ + 'set[[:space:]]+datafusion\.[a-zA-Z0-9_.]+[[:space:]]*=[[:space:]]*true' \ + "$file" || true) + + [[ -z "$matches" ]] && continue + + # Process each match line-by-line + while IFS= read -r match; do + line_number=$(echo "$match" | cut -d: -f1) + line_content=$(echo "$match" | cut -d: -f2-) Review Comment: nit: You could avoid starting a new process by using Bash-isms like: ``` line_number=${match%%:*} line_content=${match#*:} ``` ########## ci/scripts/check_slt_configs.sh: ########## @@ -0,0 +1,125 @@ +#!/usr/bin/env bash +# +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. + +# Checks sqllogictest (.slt) files for dangling DataFusion configuration +# settings. +# +# Any configuration set with: +# set datafusion.<config> = true +# must be reset in the same file with: +# set datafusion.<config> = false +# +# This prevents configuration state from leaking into subsequent tests, +# which can cause nondeterministic or hard-to-debug failures. +# +# Usage: +# Check all .slt files under datafusion/sqllogictest/test_files: +# ./check_slt_configs.sh +# +# Check a specific file (repo-relative or absolute path): +# ./check_slt_configs.sh datafusion/sqllogictest/test_files/foo.slt +# ./check_slt_configs.sh /absolute/path/to/foo.slt +# +# Exit codes: +# 0 All checks passed +# 1 One or more files contain configs set to true but never reset + +set -euo pipefail + +# Resolve repository root +ROOT_DIR="$(git rev-parse --show-toplevel)" +TARGET_DIR="${ROOT_DIR}/datafusion/sqllogictest/test_files" + +FAILED=0 + +resolve_file() { + local input="$1" + + # If absolute and exists + if [[ -f "$input" ]]; then + realpath "$input" + return + fi + + # If relative to repo root + if [[ -f "${ROOT_DIR}/${input}" ]]; then + realpath "${ROOT_DIR}/${input}" + return + fi + + echo "" Review Comment: This looks strange. Why not log the error from line 74 here and exit ? ########## .github/workflows/rust.yml: ########## @@ -742,7 +742,33 @@ jobs: - name: Run examples docs check script run: | bash ci/scripts/check_examples_docs.sh + + # This job ensures that sqllogictest (.slt) files do not leave + # DataFusion configuration options enabled (e.g. `set ... = true`) + # without resetting them back to their default state. + # + # It runs ci/scripts/check_slt_configs.sh and fails if any + # dangling configuration settings are detected. + sqllogictest-config-check: + name: check sqllogictest configs are properly reset + needs: linux-build-lib + runs-on: ubuntu-latest + container: + image: amd64/rust Review Comment: Any reason to use a container ? The CI job just runs a Bash shell script ########## ci/scripts/check_slt_configs.sh: ########## @@ -0,0 +1,125 @@ +#!/usr/bin/env bash +# +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. + +# Checks sqllogictest (.slt) files for dangling DataFusion configuration +# settings. +# +# Any configuration set with: +# set datafusion.<config> = true +# must be reset in the same file with: +# set datafusion.<config> = false +# +# This prevents configuration state from leaking into subsequent tests, +# which can cause nondeterministic or hard-to-debug failures. +# +# Usage: +# Check all .slt files under datafusion/sqllogictest/test_files: +# ./check_slt_configs.sh +# +# Check a specific file (repo-relative or absolute path): +# ./check_slt_configs.sh datafusion/sqllogictest/test_files/foo.slt +# ./check_slt_configs.sh /absolute/path/to/foo.slt +# +# Exit codes: +# 0 All checks passed +# 1 One or more files contain configs set to true but never reset + +set -euo pipefail + +# Resolve repository root +ROOT_DIR="$(git rev-parse --show-toplevel)" +TARGET_DIR="${ROOT_DIR}/datafusion/sqllogictest/test_files" + +FAILED=0 + +resolve_file() { + local input="$1" + + # If absolute and exists + if [[ -f "$input" ]]; then + realpath "$input" + return + fi + + # If relative to repo root + if [[ -f "${ROOT_DIR}/${input}" ]]; then + realpath "${ROOT_DIR}/${input}" + return + fi + + echo "" +} + +# If file argument is passed, check only that file +if [[ $# -eq 1 ]]; then + RESOLVED_FILE="$(resolve_file "$1")" + + if [[ -z "$RESOLVED_FILE" ]]; then + echo "❌ File does not exist: $1" + exit 1 + fi + + FILES="$RESOLVED_FILE" +else + FILES=$(find "$TARGET_DIR" -type f -name "*.slt") +fi + +# Iterate safely even if filenames contain spaces +while IFS= read -r file; do + echo "Checking file: $file" + + if [[ ! -f "$file" ]]; then + echo " ❌ File not found: $file" + FAILED=1 + continue + fi + + matches=$(grep -En \ + 'set[[:space:]]+datafusion\.[a-zA-Z0-9_.]+[[:space:]]*=[[:space:]]*true' \ + "$file" || true) + + [[ -z "$matches" ]] && continue + + # Process each match line-by-line + while IFS= read -r match; do + line_number=$(echo "$match" | cut -d: -f1) + line_content=$(echo "$match" | cut -d: -f2-) + + # Extract config name + config=$(echo "$line_content" \ + | sed -E 's/set[[:space:]]+(datafusion\.[a-zA-Z0-9_.]+).*/\1/') + + # Check if reset exists anywhere in file + if ! grep -Eq \ + "set[[:space:]]+$config[[:space:]]*=[[:space:]]*false" \ Review Comment: ```suggestion "set[[:space:]]+${config}[[:space:]]*=[[:space:]]*false" \ ``` ########## datafusion/sqllogictest/test_files/push_down_filter.slt: ########## @@ -743,3 +744,18 @@ drop table t1; statement ok drop table t2; + +statement ok +set datafusion.execution.parquet.pushdown_filters = false; + +statement ok +set datafusion.execution.collect_statistics = false; + +statement ok +set datafusion.execution.parquet.pushdown_filters = false; Review Comment: duplicate of line 749 ########## datafusion/sqllogictest/test_files/information_schema_multiple_catalogs.slt: ########## Review Comment: The catalog and the schema below are never reset. This is an example of non-boolean settings. ########## ci/scripts/check_slt_configs.sh: ########## @@ -0,0 +1,125 @@ +#!/usr/bin/env bash +# +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. + +# Checks sqllogictest (.slt) files for dangling DataFusion configuration +# settings. +# +# Any configuration set with: +# set datafusion.<config> = true +# must be reset in the same file with: +# set datafusion.<config> = false +# +# This prevents configuration state from leaking into subsequent tests, +# which can cause nondeterministic or hard-to-debug failures. +# +# Usage: +# Check all .slt files under datafusion/sqllogictest/test_files: +# ./check_slt_configs.sh +# +# Check a specific file (repo-relative or absolute path): +# ./check_slt_configs.sh datafusion/sqllogictest/test_files/foo.slt +# ./check_slt_configs.sh /absolute/path/to/foo.slt +# +# Exit codes: +# 0 All checks passed +# 1 One or more files contain configs set to true but never reset + +set -euo pipefail + +# Resolve repository root +ROOT_DIR="$(git rev-parse --show-toplevel)" +TARGET_DIR="${ROOT_DIR}/datafusion/sqllogictest/test_files" + +FAILED=0 + +resolve_file() { + local input="$1" + + # If absolute and exists + if [[ -f "$input" ]]; then + realpath "$input" + return + fi + + # If relative to repo root + if [[ -f "${ROOT_DIR}/${input}" ]]; then + realpath "${ROOT_DIR}/${input}" + return + fi + + echo "" +} + +# If file argument is passed, check only that file +if [[ $# -eq 1 ]]; then + RESOLVED_FILE="$(resolve_file "$1")" + + if [[ -z "$RESOLVED_FILE" ]]; then + echo "❌ File does not exist: $1" + exit 1 + fi + + FILES="$RESOLVED_FILE" +else + FILES=$(find "$TARGET_DIR" -type f -name "*.slt") +fi + +# Iterate safely even if filenames contain spaces +while IFS= read -r file; do + echo "Checking file: $file" + + if [[ ! -f "$file" ]]; then + echo " ❌ File not found: $file" + FAILED=1 + continue + fi + + matches=$(grep -En \ + 'set[[:space:]]+datafusion\.[a-zA-Z0-9_.]+[[:space:]]*=[[:space:]]*true' \ + "$file" || true) + + [[ -z "$matches" ]] && continue + + # Process each match line-by-line + while IFS= read -r match; do + line_number=$(echo "$match" | cut -d: -f1) + line_content=$(echo "$match" | cut -d: -f2-) + + # Extract config name + config=$(echo "$line_content" \ + | sed -E 's/set[[:space:]]+(datafusion\.[a-zA-Z0-9_.]+).*/\1/') + + # Check if reset exists anywhere in file + if ! grep -Eq \ + "set[[:space:]]+$config[[:space:]]*=[[:space:]]*false" \ Review Comment: This should also support the `reset datafusion.***` command ########## .github/workflows/rust.yml: ########## @@ -742,7 +742,33 @@ jobs: - name: Run examples docs check script run: | bash ci/scripts/check_examples_docs.sh + + # This job ensures that sqllogictest (.slt) files do not leave + # DataFusion configuration options enabled (e.g. `set ... = true`) + # without resetting them back to their default state. + # + # It runs ci/scripts/check_slt_configs.sh and fails if any + # dangling configuration settings are detected. + sqllogictest-config-check: + name: check sqllogictest configs are properly reset + needs: linux-build-lib Review Comment: Is there any reason to wait for the Rust build ? IMO the checks could be done earlier. ########## datafusion/sqllogictest/test_files/set_variable.slt: ########## @@ -447,3 +447,6 @@ datafusion.runtime.max_temp_directory_size datafusion.runtime.memory_limit datafusion.runtime.metadata_cache_limit datafusion.runtime.temp_directory + +statement ok +set datafusion.catalog.information_schema = false Review Comment: ```suggestion set datafusion.catalog.information_schema = false; ``` ########## ci/scripts/check_slt_configs.sh: ########## @@ -0,0 +1,125 @@ +#!/usr/bin/env bash +# +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. + +# Checks sqllogictest (.slt) files for dangling DataFusion configuration +# settings. +# +# Any configuration set with: +# set datafusion.<config> = true +# must be reset in the same file with: +# set datafusion.<config> = false +# +# This prevents configuration state from leaking into subsequent tests, +# which can cause nondeterministic or hard-to-debug failures. +# +# Usage: +# Check all .slt files under datafusion/sqllogictest/test_files: +# ./check_slt_configs.sh +# +# Check a specific file (repo-relative or absolute path): +# ./check_slt_configs.sh datafusion/sqllogictest/test_files/foo.slt +# ./check_slt_configs.sh /absolute/path/to/foo.slt +# +# Exit codes: +# 0 All checks passed +# 1 One or more files contain configs set to true but never reset + +set -euo pipefail + +# Resolve repository root +ROOT_DIR="$(git rev-parse --show-toplevel)" +TARGET_DIR="${ROOT_DIR}/datafusion/sqllogictest/test_files" + +FAILED=0 + +resolve_file() { + local input="$1" + + # If absolute and exists + if [[ -f "$input" ]]; then + realpath "$input" + return + fi + + # If relative to repo root + if [[ -f "${ROOT_DIR}/${input}" ]]; then + realpath "${ROOT_DIR}/${input}" + return + fi + + echo "" +} + +# If file argument is passed, check only that file +if [[ $# -eq 1 ]]; then + RESOLVED_FILE="$(resolve_file "$1")" + + if [[ -z "$RESOLVED_FILE" ]]; then + echo "❌ File does not exist: $1" + exit 1 + fi + + FILES="$RESOLVED_FILE" +else + FILES=$(find "$TARGET_DIR" -type f -name "*.slt") +fi + +# Iterate safely even if filenames contain spaces +while IFS= read -r file; do + echo "Checking file: $file" + + if [[ ! -f "$file" ]]; then + echo " ❌ File not found: $file" + FAILED=1 + continue + fi + + matches=$(grep -En \ + 'set[[:space:]]+datafusion\.[a-zA-Z0-9_.]+[[:space:]]*=[[:space:]]*true' \ Review Comment: `set` could also be `SET` See https://github.com/cj-zhukov/datafusion/blob/fe921dca542e84a48bc02e3c409ce7ac45686283/datafusion/sqllogictest/test_files/set_variable.slt#L23 for such example ########## datafusion/sqllogictest/test_files/type_coercion.slt: ########## @@ -281,3 +281,6 @@ SELECT true FROM t0 WHERE (REGEXP_MATCH(t0.v1, t0.v1)) NOT ILIKE []; statement ok DROP TABLE t0; + +statement ok +set datafusion.explain.logical_plan_only = false; Review Comment: This is already done at https://github.com/apache/datafusion/pull/20474/changes#diff-7f1b4e520d77b90cae272fc9d03f3c397d1ab10e47de434ff9f8058ffd7cc083R249 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
