Re: [PR] feat(build): migrate build support scripts [paimon-cpp]

via GitHub Sun, 24 May 2026 23:22:52 -0700


Copilot commented on code in PR #12:
URL: https://github.com/apache/paimon-cpp/pull/12#discussion_r3296410062



##########
build_support/run-test.sh:
##########
@@ -0,0 +1,244 @@
+#!/bin/bash
+# Copyright 2014 Cloudera, Inc.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+# Script which wraps running a test and redirects its output to a
+# test log directory.
+#
+# Arguments:
+#    $1 - Base path for logs/artifacts.
+#    $2 - type of test (e.g. test or benchmark)
+#    $3 - path to executable
+#    $ARGN - arguments for executable
+#
+
+OUTPUT_ROOT=$1
+shift
+ROOT=$(cd $(dirname $BASH_SOURCE)/..; pwd)
+
+TEST_LOGDIR=$OUTPUT_ROOT/build/$1-logs
+mkdir -p $TEST_LOGDIR
+
+RUN_TYPE=$1
+shift
+TEST_DEBUGDIR=$OUTPUT_ROOT/build/$RUN_TYPE-debug
+mkdir -p $TEST_DEBUGDIR
+
+TEST_DIRNAME=$(cd $(dirname $1); pwd)
+TEST_FILENAME=$(basename $1)
+shift
+TEST_EXECUTABLE="$TEST_DIRNAME/$TEST_FILENAME"
+TEST_NAME=$(echo $TEST_FILENAME | perl -pe 's/\..+?$//') # Remove path and 
extension (if any).
+
+# We run each test in its own subdir to avoid core file related races.
+TEST_WORKDIR=$OUTPUT_ROOT/build/test-work/$TEST_NAME
+mkdir -p $TEST_WORKDIR
+pushd $TEST_WORKDIR >/dev/null || exit 1
+# copy test data before run test
+TEST_DATA_SRC_DIR=$OUTPUT_ROOT/../test/test_data
+TEST_DATA_DST_DIR=$TEST_WORKDIR/test
+mkdir -p $TEST_DATA_DST_DIR
+cp -r $TEST_DATA_SRC_DIR $TEST_DATA_DST_DIR
+rm -f *
+
+set -o pipefail
+
+LOGFILE=$TEST_LOGDIR/$TEST_NAME.txt
+XMLFILE=$TEST_LOGDIR/$TEST_NAME.xml
+
+TEST_EXECUTION_ATTEMPTS=1
+
+# Remove both the uncompressed output, so the developer doesn't accidentally 
get confused
+# and read output from a prior test run.
+rm -f $LOGFILE $LOGFILE.gz
+
+pipe_cmd=cat
+
+function setup_sanitizers() {
+  # Sets environment variables for different sanitizers (it configures how) 
the run_tests. Function works.

Review Comment:
   This comment appears garbled/auto-generated and is hard to parse. Consider 
replacing with something clear, e.g. "Configures environment variables for the 
various sanitizers used by the test executables."
   



##########
build_support/run-test.sh:
##########
@@ -0,0 +1,244 @@
+#!/bin/bash
+# Copyright 2014 Cloudera, Inc.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+# Script which wraps running a test and redirects its output to a
+# test log directory.
+#
+# Arguments:
+#    $1 - Base path for logs/artifacts.
+#    $2 - type of test (e.g. test or benchmark)
+#    $3 - path to executable
+#    $ARGN - arguments for executable
+#
+
+OUTPUT_ROOT=$1
+shift
+ROOT=$(cd $(dirname $BASH_SOURCE)/..; pwd)
+
+TEST_LOGDIR=$OUTPUT_ROOT/build/$1-logs
+mkdir -p $TEST_LOGDIR
+
+RUN_TYPE=$1
+shift
+TEST_DEBUGDIR=$OUTPUT_ROOT/build/$RUN_TYPE-debug
+mkdir -p $TEST_DEBUGDIR
+
+TEST_DIRNAME=$(cd $(dirname $1); pwd)
+TEST_FILENAME=$(basename $1)
+shift
+TEST_EXECUTABLE="$TEST_DIRNAME/$TEST_FILENAME"
+TEST_NAME=$(echo $TEST_FILENAME | perl -pe 's/\..+?$//') # Remove path and 
extension (if any).
+
+# We run each test in its own subdir to avoid core file related races.
+TEST_WORKDIR=$OUTPUT_ROOT/build/test-work/$TEST_NAME
+mkdir -p $TEST_WORKDIR
+pushd $TEST_WORKDIR >/dev/null || exit 1
+# copy test data before run test
+TEST_DATA_SRC_DIR=$OUTPUT_ROOT/../test/test_data
+TEST_DATA_DST_DIR=$TEST_WORKDIR/test
+mkdir -p $TEST_DATA_DST_DIR
+cp -r $TEST_DATA_SRC_DIR $TEST_DATA_DST_DIR
+rm -f *
+
+set -o pipefail
+
+LOGFILE=$TEST_LOGDIR/$TEST_NAME.txt
+XMLFILE=$TEST_LOGDIR/$TEST_NAME.xml
+
+TEST_EXECUTION_ATTEMPTS=1

Review Comment:
   With `TEST_EXECUTION_ATTEMPTS=1`, the condition `[ $ATTEMPT_NUMBER -lt 
$TEST_EXECUTION_ATTEMPTS ]` at lines 192 and 202 is never true, so the 
leftover-test-directory cleanup logic is dead code. Either remove the cleanup 
block or make this value configurable (e.g. via env var) so retry/cleanup 
actually runs.
   



##########
build_support/run-test.sh:
##########
@@ -0,0 +1,244 @@
+#!/bin/bash
+# Copyright 2014 Cloudera, Inc.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+# Script which wraps running a test and redirects its output to a
+# test log directory.
+#
+# Arguments:
+#    $1 - Base path for logs/artifacts.
+#    $2 - type of test (e.g. test or benchmark)
+#    $3 - path to executable
+#    $ARGN - arguments for executable
+#
+
+OUTPUT_ROOT=$1
+shift
+ROOT=$(cd $(dirname $BASH_SOURCE)/..; pwd)
+
+TEST_LOGDIR=$OUTPUT_ROOT/build/$1-logs
+mkdir -p $TEST_LOGDIR
+
+RUN_TYPE=$1
+shift
+TEST_DEBUGDIR=$OUTPUT_ROOT/build/$RUN_TYPE-debug
+mkdir -p $TEST_DEBUGDIR
+
+TEST_DIRNAME=$(cd $(dirname $1); pwd)
+TEST_FILENAME=$(basename $1)
+shift
+TEST_EXECUTABLE="$TEST_DIRNAME/$TEST_FILENAME"
+TEST_NAME=$(echo $TEST_FILENAME | perl -pe 's/\..+?$//') # Remove path and 
extension (if any).
+
+# We run each test in its own subdir to avoid core file related races.
+TEST_WORKDIR=$OUTPUT_ROOT/build/test-work/$TEST_NAME
+mkdir -p $TEST_WORKDIR
+pushd $TEST_WORKDIR >/dev/null || exit 1
+# copy test data before run test
+TEST_DATA_SRC_DIR=$OUTPUT_ROOT/../test/test_data
+TEST_DATA_DST_DIR=$TEST_WORKDIR/test
+mkdir -p $TEST_DATA_DST_DIR
+cp -r $TEST_DATA_SRC_DIR $TEST_DATA_DST_DIR
+rm -f *
+
+set -o pipefail
+
+LOGFILE=$TEST_LOGDIR/$TEST_NAME.txt
+XMLFILE=$TEST_LOGDIR/$TEST_NAME.xml
+
+TEST_EXECUTION_ATTEMPTS=1
+
+# Remove both the uncompressed output, so the developer doesn't accidentally 
get confused
+# and read output from a prior test run.
+rm -f $LOGFILE $LOGFILE.gz
+
+pipe_cmd=cat
+
+function setup_sanitizers() {
+  # Sets environment variables for different sanitizers (it configures how) 
the run_tests. Function works.
+
+  # Configure TSAN (ignored if this isn't a TSAN build).
+  #
+  # Deadlock detection (new in clang 3.5) is disabled because:
+  # 1. The clang 3.5 deadlock detector crashes in some unit tests. It
+  #    needs compiler-rt commits c4c3dfd, 9a8efe3, and possibly others.
+  # 2. Many unit tests report lock-order-inversion warnings; they should be
+  #    fixed before reenabling the detector.
+  TSAN_OPTIONS="$TSAN_OPTIONS detect_deadlocks=0"
+  TSAN_OPTIONS="$TSAN_OPTIONS 
suppressions=$ROOT/build_support/tsan-suppressions.txt"
+  TSAN_OPTIONS="$TSAN_OPTIONS history_size=7"
+  export TSAN_OPTIONS
+
+  UBSAN_OPTIONS="$UBSAN_OPTIONS print_stacktrace=1"
+  UBSAN_OPTIONS="$UBSAN_OPTIONS 
suppressions=$ROOT/build_support/ubsan-suppressions.txt"
+  export UBSAN_OPTIONS
+
+  # Enable leak detection even under LLVM 3.4, where it was disabled by 
default.
+  # This flag only takes effect when running an ASAN build.
+  # ASAN_OPTIONS="$ASAN_OPTIONS detect_leaks=1"
+  # export ASAN_OPTIONS
+
+  # Set up suppressions for LeakSanitizer
+  LSAN_OPTIONS="$LSAN_OPTIONS 
suppressions=$ROOT/build_support/lsan-suppressions.txt"
+  export LSAN_OPTIONS
+}
+
+function run_test() {
+  # Run gtest style tests with sanitizers if they are setup appropriately.
+
+  # gtest won't overwrite old junit test files, resulting in a build failure
+  # even when retries are successful.
+  rm -f $XMLFILE
+
+  $TEST_EXECUTABLE "$@" 2>&1 \
+    | ${PYTHON:-python3} $ROOT/build_support/asan_symbolize.py \
+    | ${CXXFILT:-c++filt} \
+    | $ROOT/build_support/stacktrace_addr2line.pl $TEST_EXECUTABLE \
+    | $pipe_cmd 2>&1 | tee $LOGFILE
+  STATUS=$?
+
+  # TSAN doesn't always exit with a non-zero exit code due to a bug:
+  # mutex errors don't get reported through the normal error reporting 
infrastructure.
+  # So we make sure to detect this and exit 1.
+  #
+  # Additionally, certain types of failures won't show up in the standard JUnit
+  # XML output from gtest. We assume that gtest knows better than us and our
+  # regexes in most cases, but for certain errors we delete the resulting xml
+  # file and let our own post-processing step regenerate it.
+  export GREP=$(which egrep)
+  if zgrep --silent "ThreadSanitizer|Leak check.*detected leaks" $LOGFILE ; 
then
+    echo ThreadSanitizer or leak check failures in $LOGFILE
+    STATUS=1
+    rm -f $XMLFILE
+  fi
+}
+
+function print_coredumps() {
+  # The script expects core files relative to the build directory with unique
+  # names per test executable because of the parallel running. So the corefile
+  # patterns must be set with prefix `core.{test-executable}*`:
+  #
+  # In case of macOS:
+  #   sudo sysctl -w kern.corefile=core.%N.%P
+  # On Linux:
+  #   sudo sysctl -w kernel.core_pattern=core.%e.%p
+  #
+  # and the ulimit must be increased:
+  #   ulimit -c unlimited
+
+  # filename is truncated to the first 15 characters in case of linux, so limit
+  # the pattern for the first 15 characters
+  FILENAME=$(basename "${TEST_EXECUTABLE}")
+  FILENAME=$(echo ${FILENAME} | cut -c-15)
+  PATTERN="^core\.${FILENAME}"
+
+  COREFILES=$(ls | grep $PATTERN)
+  if [ -n "$COREFILES" ]; then
+    echo "Found core dump, printing backtrace:"
+
+    for COREFILE in $COREFILES; do
+      # Print backtrace
+      if [ "$(uname)" == "Darwin" ]; then
+        lldb -c "${COREFILE}" --batch --one-line "thread backtrace all -e true"
+      else
+        gdb -c "${COREFILE}" $TEST_EXECUTABLE -ex "thread apply all bt" -ex 
"set pagination 0" -batch
+      fi
+      # Remove the coredump, regenerate it via running the test case directly
+      rm "${COREFILE}"
+    done
+  fi
+}
+
+function post_process_tests() {
+  # If we have a LeakSanitizer report, and XML reporting is configured, add a 
new test
+  # case result to the XML file for the leak report. Otherwise Jenkins won't 
show
+  # us which tests had LSAN errors.
+  if zgrep --silent "ERROR: LeakSanitizer: detected memory leaks" $LOGFILE ; 
then
+      echo Test had memory leaks. Editing XML
+      perl -p -i -e '
+      if (m#</testsuite>#) {
+        print "<testcase name=\"LeakSanitizer\" status=\"run\" 
classname=\"LSAN\">\n";
+        print "  <failure message=\"LeakSanitizer failed\" type=\"\">\n";
+        print "    See txt log file for details\n";
+        print "  </failure>\n";
+        print "</testcase>\n";
+      }' $XMLFILE
+  fi
+}
+
+function run_other() {
+  # Generic run function for test like executables that aren't actually gtest
+  $TEST_EXECUTABLE "$@" 2>&1 | $pipe_cmd > $LOGFILE
+  STATUS=$?
+}
+
+if [ $RUN_TYPE = "test" ]; then
+  setup_sanitizers
+fi
+
+# Run the actual test.
+for ATTEMPT_NUMBER in $(seq 1 $TEST_EXECUTION_ATTEMPTS) ; do
+  if [ $ATTEMPT_NUMBER -lt $TEST_EXECUTION_ATTEMPTS ]; then
+    # If the test fails, the test output may or may not be left behind,

Review Comment:
   `TEST_TMPDIR_BEFORE` and `TEST_TMPDIR_AFTER` are both captured before 
`run_test`/`run_other` is invoked (the test runs at lines 222–226). Thus `DIFF` 
will always be empty and the cleanup never deletes anything. The "AFTER" 
snapshot and cleanup loop must execute after the test invocation (e.g. at the 
top of the next iteration, after the retry message, or as a post-step).



##########
build_support/run-test.sh:
##########
@@ -0,0 +1,244 @@
+#!/bin/bash
+# Copyright 2014 Cloudera, Inc.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+# Script which wraps running a test and redirects its output to a
+# test log directory.
+#
+# Arguments:
+#    $1 - Base path for logs/artifacts.
+#    $2 - type of test (e.g. test or benchmark)
+#    $3 - path to executable
+#    $ARGN - arguments for executable
+#
+
+OUTPUT_ROOT=$1
+shift
+ROOT=$(cd $(dirname $BASH_SOURCE)/..; pwd)
+
+TEST_LOGDIR=$OUTPUT_ROOT/build/$1-logs
+mkdir -p $TEST_LOGDIR
+
+RUN_TYPE=$1
+shift
+TEST_DEBUGDIR=$OUTPUT_ROOT/build/$RUN_TYPE-debug
+mkdir -p $TEST_DEBUGDIR
+
+TEST_DIRNAME=$(cd $(dirname $1); pwd)
+TEST_FILENAME=$(basename $1)
+shift
+TEST_EXECUTABLE="$TEST_DIRNAME/$TEST_FILENAME"
+TEST_NAME=$(echo $TEST_FILENAME | perl -pe 's/\..+?$//') # Remove path and 
extension (if any).
+
+# We run each test in its own subdir to avoid core file related races.
+TEST_WORKDIR=$OUTPUT_ROOT/build/test-work/$TEST_NAME
+mkdir -p $TEST_WORKDIR
+pushd $TEST_WORKDIR >/dev/null || exit 1
+# copy test data before run test
+TEST_DATA_SRC_DIR=$OUTPUT_ROOT/../test/test_data
+TEST_DATA_DST_DIR=$TEST_WORKDIR/test
+mkdir -p $TEST_DATA_DST_DIR
+cp -r $TEST_DATA_SRC_DIR $TEST_DATA_DST_DIR
+rm -f *
+
+set -o pipefail
+
+LOGFILE=$TEST_LOGDIR/$TEST_NAME.txt
+XMLFILE=$TEST_LOGDIR/$TEST_NAME.xml
+
+TEST_EXECUTION_ATTEMPTS=1
+
+# Remove both the uncompressed output, so the developer doesn't accidentally 
get confused
+# and read output from a prior test run.
+rm -f $LOGFILE $LOGFILE.gz
+
+pipe_cmd=cat
+
+function setup_sanitizers() {
+  # Sets environment variables for different sanitizers (it configures how) 
the run_tests. Function works.
+
+  # Configure TSAN (ignored if this isn't a TSAN build).
+  #
+  # Deadlock detection (new in clang 3.5) is disabled because:
+  # 1. The clang 3.5 deadlock detector crashes in some unit tests. It
+  #    needs compiler-rt commits c4c3dfd, 9a8efe3, and possibly others.
+  # 2. Many unit tests report lock-order-inversion warnings; they should be
+  #    fixed before reenabling the detector.
+  TSAN_OPTIONS="$TSAN_OPTIONS detect_deadlocks=0"
+  TSAN_OPTIONS="$TSAN_OPTIONS 
suppressions=$ROOT/build_support/tsan-suppressions.txt"
+  TSAN_OPTIONS="$TSAN_OPTIONS history_size=7"
+  export TSAN_OPTIONS
+
+  UBSAN_OPTIONS="$UBSAN_OPTIONS print_stacktrace=1"
+  UBSAN_OPTIONS="$UBSAN_OPTIONS 
suppressions=$ROOT/build_support/ubsan-suppressions.txt"
+  export UBSAN_OPTIONS
+
+  # Enable leak detection even under LLVM 3.4, where it was disabled by 
default.
+  # This flag only takes effect when running an ASAN build.
+  # ASAN_OPTIONS="$ASAN_OPTIONS detect_leaks=1"
+  # export ASAN_OPTIONS
+
+  # Set up suppressions for LeakSanitizer
+  LSAN_OPTIONS="$LSAN_OPTIONS 
suppressions=$ROOT/build_support/lsan-suppressions.txt"
+  export LSAN_OPTIONS
+}
+
+function run_test() {
+  # Run gtest style tests with sanitizers if they are setup appropriately.
+
+  # gtest won't overwrite old junit test files, resulting in a build failure
+  # even when retries are successful.
+  rm -f $XMLFILE
+
+  $TEST_EXECUTABLE "$@" 2>&1 \
+    | ${PYTHON:-python3} $ROOT/build_support/asan_symbolize.py \
+    | ${CXXFILT:-c++filt} \
+    | $ROOT/build_support/stacktrace_addr2line.pl $TEST_EXECUTABLE \
+    | $pipe_cmd 2>&1 | tee $LOGFILE
+  STATUS=$?
+
+  # TSAN doesn't always exit with a non-zero exit code due to a bug:
+  # mutex errors don't get reported through the normal error reporting 
infrastructure.
+  # So we make sure to detect this and exit 1.
+  #
+  # Additionally, certain types of failures won't show up in the standard JUnit
+  # XML output from gtest. We assume that gtest knows better than us and our
+  # regexes in most cases, but for certain errors we delete the resulting xml
+  # file and let our own post-processing step regenerate it.
+  export GREP=$(which egrep)
+  if zgrep --silent "ThreadSanitizer|Leak check.*detected leaks" $LOGFILE ; 
then
+    echo ThreadSanitizer or leak check failures in $LOGFILE
+    STATUS=1
+    rm -f $XMLFILE
+  fi
+}
+
+function print_coredumps() {
+  # The script expects core files relative to the build directory with unique
+  # names per test executable because of the parallel running. So the corefile
+  # patterns must be set with prefix `core.{test-executable}*`:
+  #
+  # In case of macOS:
+  #   sudo sysctl -w kern.corefile=core.%N.%P
+  # On Linux:
+  #   sudo sysctl -w kernel.core_pattern=core.%e.%p
+  #
+  # and the ulimit must be increased:
+  #   ulimit -c unlimited
+
+  # filename is truncated to the first 15 characters in case of linux, so limit
+  # the pattern for the first 15 characters
+  FILENAME=$(basename "${TEST_EXECUTABLE}")
+  FILENAME=$(echo ${FILENAME} | cut -c-15)
+  PATTERN="^core\.${FILENAME}"
+
+  COREFILES=$(ls | grep $PATTERN)
+  if [ -n "$COREFILES" ]; then
+    echo "Found core dump, printing backtrace:"
+
+    for COREFILE in $COREFILES; do
+      # Print backtrace
+      if [ "$(uname)" == "Darwin" ]; then
+        lldb -c "${COREFILE}" --batch --one-line "thread backtrace all -e true"
+      else
+        gdb -c "${COREFILE}" $TEST_EXECUTABLE -ex "thread apply all bt" -ex 
"set pagination 0" -batch
+      fi
+      # Remove the coredump, regenerate it via running the test case directly
+      rm "${COREFILE}"
+    done
+  fi
+}
+
+function post_process_tests() {
+  # If we have a LeakSanitizer report, and XML reporting is configured, add a 
new test
+  # case result to the XML file for the leak report. Otherwise Jenkins won't 
show
+  # us which tests had LSAN errors.
+  if zgrep --silent "ERROR: LeakSanitizer: detected memory leaks" $LOGFILE ; 
then
+      echo Test had memory leaks. Editing XML
+      perl -p -i -e '
+      if (m#</testsuite>#) {
+        print "<testcase name=\"LeakSanitizer\" status=\"run\" 
classname=\"LSAN\">\n";
+        print "  <failure message=\"LeakSanitizer failed\" type=\"\">\n";
+        print "    See txt log file for details\n";
+        print "  </failure>\n";
+        print "</testcase>\n";
+      }' $XMLFILE
+  fi
+}
+
+function run_other() {
+  # Generic run function for test like executables that aren't actually gtest
+  $TEST_EXECUTABLE "$@" 2>&1 | $pipe_cmd > $LOGFILE
+  STATUS=$?
+}
+
+if [ $RUN_TYPE = "test" ]; then
+  setup_sanitizers
+fi
+
+# Run the actual test.
+for ATTEMPT_NUMBER in $(seq 1 $TEST_EXECUTION_ATTEMPTS) ; do
+  if [ $ATTEMPT_NUMBER -lt $TEST_EXECUTION_ATTEMPTS ]; then
+    # If the test fails, the test output may or may not be left behind,
+    # depending on whether the test cleaned up or exited immediately. Either
+    # way we need to clean it up. We do this by comparing the data directory
+    # contents before and after the test runs, and deleting anything new.
+    #
+    # The comm program requires that its two inputs be sorted.
+    TEST_TMPDIR_BEFORE=$(find $TEST_TMPDIR -maxdepth 1 -type d | sort)
+  fi
+
+  if [ $ATTEMPT_NUMBER -lt $TEST_EXECUTION_ATTEMPTS ]; then
+    # Now delete any new test output.
+    TEST_TMPDIR_AFTER=$(find $TEST_TMPDIR -maxdepth 1 -type d | sort)

Review Comment:
   `TEST_TMPDIR_BEFORE` and `TEST_TMPDIR_AFTER` are both captured before 
`run_test`/`run_other` is invoked (the test runs at lines 222–226). Thus `DIFF` 
will always be empty and the cleanup never deletes anything. The "AFTER" 
snapshot and cleanup loop must execute after the test invocation (e.g. at the 
top of the next iteration, after the retry message, or as a post-step).



##########
build_support/run-test.sh:
##########
@@ -0,0 +1,244 @@
+#!/bin/bash
+# Copyright 2014 Cloudera, Inc.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+# Script which wraps running a test and redirects its output to a
+# test log directory.
+#
+# Arguments:
+#    $1 - Base path for logs/artifacts.
+#    $2 - type of test (e.g. test or benchmark)
+#    $3 - path to executable
+#    $ARGN - arguments for executable
+#
+
+OUTPUT_ROOT=$1
+shift
+ROOT=$(cd $(dirname $BASH_SOURCE)/..; pwd)
+
+TEST_LOGDIR=$OUTPUT_ROOT/build/$1-logs
+mkdir -p $TEST_LOGDIR
+
+RUN_TYPE=$1
+shift
+TEST_DEBUGDIR=$OUTPUT_ROOT/build/$RUN_TYPE-debug
+mkdir -p $TEST_DEBUGDIR
+
+TEST_DIRNAME=$(cd $(dirname $1); pwd)
+TEST_FILENAME=$(basename $1)
+shift
+TEST_EXECUTABLE="$TEST_DIRNAME/$TEST_FILENAME"
+TEST_NAME=$(echo $TEST_FILENAME | perl -pe 's/\..+?$//') # Remove path and 
extension (if any).
+
+# We run each test in its own subdir to avoid core file related races.
+TEST_WORKDIR=$OUTPUT_ROOT/build/test-work/$TEST_NAME
+mkdir -p $TEST_WORKDIR
+pushd $TEST_WORKDIR >/dev/null || exit 1
+# copy test data before run test
+TEST_DATA_SRC_DIR=$OUTPUT_ROOT/../test/test_data
+TEST_DATA_DST_DIR=$TEST_WORKDIR/test
+mkdir -p $TEST_DATA_DST_DIR
+cp -r $TEST_DATA_SRC_DIR $TEST_DATA_DST_DIR
+rm -f *
+
+set -o pipefail
+
+LOGFILE=$TEST_LOGDIR/$TEST_NAME.txt
+XMLFILE=$TEST_LOGDIR/$TEST_NAME.xml
+
+TEST_EXECUTION_ATTEMPTS=1
+
+# Remove both the uncompressed output, so the developer doesn't accidentally 
get confused
+# and read output from a prior test run.

Review Comment:
   The comment says "Remove both the uncompressed output" but the command 
removes both the uncompressed and compressed (`.gz`) logs. Update the comment 
to "Remove both the uncompressed and compressed output ..." to match the actual 
behavior.
   



##########
build_support/run_clang_tidy.py:
##########
@@ -0,0 +1,149 @@
+#!/usr/bin/env python
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+from __future__ import print_function
+import argparse
+from fnmatch import fnmatch
+import multiprocessing as mp
+import lintutils
+from subprocess import PIPE
+import sys
+from functools import partial
+
+
+def _get_chunk_key(filenames):
+    # lists are not hashable so key on the first filename in a chunk
+    return filenames[0]
+
+
+# clang-tidy outputs complaints in '/path:line_number: complaint' format,
+# so we can scan its output to get a list of files to fix
+def _check_some_files(completed_processes, filenames):
+    result = completed_processes[_get_chunk_key(filenames)]
+    return lintutils.stdout_pathcolonline(result, filenames)
+
+
+def _check_all(cmd, filenames):
+    # each clang-tidy instance will process 16 files
+    chunks = lintutils.chunk(filenames, 16)
+    cmds = [cmd + some for some in chunks]
+    results = lintutils.run_parallel(cmds, stderr=PIPE, stdout=PIPE)
+    error = False
+    # record completed processes (keyed by the first filename in the input
+    # chunk) for lookup in _check_some_files
+    completed_processes = {
+        _get_chunk_key(some): result
+        for some, result in zip(chunks, results)
+    }
+    checker = partial(_check_some_files, completed_processes)
+    pool = mp.Pool()
+    try:
+        # check output of completed clang-tidy invocations in parallel
+        for problem_files, stdout in pool.imap(checker, chunks):
+            if problem_files:
+                msg = "clang-tidy suggested fixes for {}"
+                print("\n".join(map(msg.format, problem_files)))
+                print(stdout)
+                error = True
+    except Exception:
+        error = True
+        raise
+    finally:
+        pool.terminate()
+        pool.join()
+
+    if error:
+        sys.exit(1)
+
+
+if __name__ == "__main__":
+    parser = argparse.ArgumentParser(
+        description="Runs clang-tidy on all ",
+        fromfile_prefix_chars="@"
+    )
+    parser.add_argument("--clang_tidy_binary",
+                        required=True,
+                        help="Path to the clang-tidy binary")
+    parser.add_argument("--exclude_globs",
+                        help="Filename containing globs for files "
+                        "that should be excluded from the checks")
+    parser.add_argument("--compile_commands",
+                        required=True,
+                        help="compile_commands.json to pass clang-tidy")
+    path_group = parser.add_mutually_exclusive_group(required=True)
+    path_group.add_argument("--source_dir",
+                            nargs="+",
+                            help="Root directory(s) of the source code")
+    path_group.add_argument("--source_file",
+                            nargs="*",
+                            help="file path(s) of every source code")

Review Comment:
   The help text "file path(s) of every source code" is grammatically awkward. 
Suggest "Path(s) to individual source files to check."
   



##########
build_support/lintutils.py:
##########
@@ -0,0 +1,119 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+import multiprocessing as mp
+import os
+import re
+from fnmatch import fnmatch
+from subprocess import Popen
+
+
+def chunk(seq, n):
+    """
+    divide a sequence into equal sized chunks
+    (the last chunk may be smaller, but won't be empty)
+    """
+    chunks = []
+    some = []
+    for element in seq:
+        if len(some) == n:
+            chunks.append(some)
+            some = []
+        some.append(element)
+    if len(some) > 0:
+        chunks.append(some)
+    return chunks
+
+
+def dechunk(chunks):
+    "flatten chunks into a single list"
+    seq = []
+    for chunk in chunks:
+        seq.extend(chunk)
+    return seq
+
+
+def run_parallel(cmds, **kwargs):
+    """
+    Run each of cmds (with shared **kwargs) using subprocess.Popen
+    then wait for all of them to complete.
+    Runs batches of multiprocessing.cpu_count() * 2 from cmds
+    returns a list of tuples containing each process'
+    returncode, stdout, stderr
+    """
+    complete = []
+    for cmds_batch in chunk(cmds, mp.cpu_count() * 2):
+        procs_batch = [Popen(cmd, **kwargs) for cmd in cmds_batch]
+        for proc in procs_batch:
+            stdout, stderr = proc.communicate()
+            complete.append((proc.returncode, stdout, stderr))
+    return complete
+
+
+_source_extensions = '''
+.h
+.cc
+.cpp
+.c
+'''.split()
+
+
+def need_do_lint(path, exclude_globs=[]):
+    # filter out non-source files
+    if os.path.splitext(path)[1] not in _source_extensions:
+        return False
+    # filter out files that match the globs in the globs file
+    if any([fnmatch(path, glob) for glob in exclude_globs]):
+        return False
+    return True
+
+
+def get_sources(source_dir, exclude_globs=[]):
+    sources = []
+    for directory, subdirs, basenames in os.walk(source_dir):
+        for path in [os.path.join(directory, basename)
+                     for basename in basenames]:
+            path = os.path.abspath(path)
+            if (need_do_lint(path, exclude_globs)):
+                sources.append(path)
+    return sources
+
+
+def _remove_color(colored_text):
+    ansi_re = re.compile(r'\x1B\[[0-?]*[ -/]*[@-~]')
+    clean_text = ansi_re.sub('', colored_text.decode('utf-8'))
+    return clean_text
+
+
+def stdout_pathcolonline(completed_process, filenames):
+    """
+    given a completed process which may have reported some files as problematic
+    by printing the path name followed by ':' then a line number, examine
+    stdout and return the set of actually reported file names
+    """
+    returncode, stdout, stderr = completed_process
+    bfilenames = set()
+    for filename in filenames:
+        bfilenames.add(filename + ":")
+    problem_files = set()
+    for line in _remove_color(stdout).splitlines():
+        for filename in bfilenames:
+            if line.startswith(filename):
+                problem_files.add(filename)
+                bfilenames.remove(filename)
+                break

Review Comment:
   Modifying `bfilenames` while iterating over it via `for filename in 
bfilenames` raises `RuntimeError: Set changed size during iteration` in 
CPython. Iterate over a snapshot (e.g. `for filename in list(bfilenames):`) or 
collect removals and apply after the inner loop.



##########
build_support/run-test.sh:
##########
@@ -0,0 +1,244 @@
+#!/bin/bash
+# Copyright 2014 Cloudera, Inc.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#

Review Comment:
   This file (and `stacktrace_addr2line.pl`) carries a "Copyright 2014 
Cloudera, Inc." header while all other newly added files use the standard 
Apache Software Foundation license header. For consistency across this PR, 
consider updating these headers to match the ASF header used in the rest of 
`build_support/`.
   



##########
build_support/iwyu/iwyu_tool.py:
##########
@@ -0,0 +1,280 @@
+#!/usr/bin/env python
+
+# This file has been imported into the apache source tree from
+# the IWYU source tree as of version 0.8
+#   
https://github.com/include-what-you-use/include-what-you-use/blob/master/iwyu_tool.py
+# and corresponding license has been added:
+#   
https://github.com/include-what-you-use/include-what-you-use/blob/master/LICENSE.TXT
+#
+# 
==============================================================================
+# LLVM Release License
+# 
==============================================================================
+# University of Illinois/NCSA
+# Open Source License
+#
+# Copyright (c) 2003-2010 University of Illinois at Urbana-Champaign.
+# All rights reserved.
+#
+# Developed by:
+#
+#     LLVM Team
+#
+#     University of Illinois at Urbana-Champaign
+#
+#     http://llvm.org
+#
+# Permission is hereby granted, free of charge, to any person obtaining a copy 
of
+# this software and associated documentation files (the "Software"), to deal 
with
+# the Software without restriction, including without limitation the rights to
+# use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies
+# of the Software, and to permit persons to whom the Software is furnished to 
do
+# so, subject to the following conditions:
+#
+#     * Redistributions of source code must retain the above copyright notice,
+#       this list of conditions and the following disclaimers.
+#
+#     * Redistributions in binary form must reproduce the above copyright 
notice,
+#       this list of conditions and the following disclaimers in the
+#       documentation and/or other materials provided with the distribution.
+#
+#     * Neither the names of the LLVM Team, University of Illinois at
+#       Urbana-Champaign, nor the names of its contributors may be used to
+#       endorse or promote products derived from this Software without specific
+#       prior written permission.
+#
+# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 
FITNESS
+# FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL THE
+# CONTRIBUTORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS WITH 
THE
+# SOFTWARE.
+
+""" Driver to consume a Clang compilation database and invoke IWYU.
+
+Example usage with CMake:
+
+  # Unix systems
+  $ mkdir build && cd build
+  $ CC="clang" CXX="clang++" cmake -DCMAKE_EXPORT_COMPILE_COMMANDS=ON ...
+  $ iwyu_tool.py -p .
+
+  # Windows systems
+  $ mkdir build && cd build
+  $ cmake -DCMAKE_CXX_COMPILER="%VCINSTALLDIR%/bin/cl.exe" \
+    -DCMAKE_C_COMPILER="%VCINSTALLDIR%/VC/bin/cl.exe" \
+    -DCMAKE_EXPORT_COMPILE_COMMANDS=ON \
+    -G Ninja ...
+  $ python iwyu_tool.py -p .
+
+See iwyu_tool.py -h for more details on command-line arguments.
+"""
+
+import os
+import sys
+import json
+import argparse
+import subprocess
+import re
+
+import logging
+
+logging.basicConfig(filename='iwyu.log')
+LOGGER = logging.getLogger("iwyu")
+
+
+def iwyu_formatter(output):
+    """ Process iwyu's output, basically a no-op. """
+    print('\n'.join(output))
+
+
+CORRECT_RE = re.compile(r'^\((.*?) has correct #includes/fwd-decls\)$')
+SHOULD_ADD_RE = re.compile(r'^(.*?) should add these lines:$')
+SHOULD_REMOVE_RE = re.compile(r'^(.*?) should remove these lines:$')
+FULL_LIST_RE = re.compile(r'The full include-list for (.*?):$')
+END_RE = re.compile(r'^---$')
+LINES_RE = re.compile(r'^- (.*?)  // lines ([0-9]+)-[0-9]+$')
+
+
+GENERAL, ADD, REMOVE, LIST = range(4)
+
+
+def clang_formatter(output):
+    """ Process iwyu's output into something clang-like. """
+    state = (GENERAL, None)
+    for line in output:
+        match = CORRECT_RE.match(line)
+        if match:
+            print('%s:1:1: note: #includes/fwd-decls are correct', 
match.groups(1))

Review Comment:
   This (and the similar calls at lines 128, 133, 168, 182, 218) uses a comma 
instead of the `%` operator, so the format specifier is not substituted and the 
tuple/string is printed as a separate argument. Use `print('%s:1:1: note: 
#includes/fwd-decls are correct' % match.group(1))` (and likewise 
`match.group(1)` rather than `match.groups(1)`, which returns a tuple).



##########
build_support/run_clang_tidy.py:
##########
@@ -0,0 +1,149 @@
+#!/usr/bin/env python
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+from __future__ import print_function
+import argparse
+from fnmatch import fnmatch
+import multiprocessing as mp
+import lintutils
+from subprocess import PIPE
+import sys
+from functools import partial
+
+
+def _get_chunk_key(filenames):
+    # lists are not hashable so key on the first filename in a chunk
+    return filenames[0]
+
+
+# clang-tidy outputs complaints in '/path:line_number: complaint' format,
+# so we can scan its output to get a list of files to fix
+def _check_some_files(completed_processes, filenames):
+    result = completed_processes[_get_chunk_key(filenames)]
+    return lintutils.stdout_pathcolonline(result, filenames)
+
+
+def _check_all(cmd, filenames):
+    # each clang-tidy instance will process 16 files
+    chunks = lintutils.chunk(filenames, 16)
+    cmds = [cmd + some for some in chunks]
+    results = lintutils.run_parallel(cmds, stderr=PIPE, stdout=PIPE)
+    error = False
+    # record completed processes (keyed by the first filename in the input
+    # chunk) for lookup in _check_some_files
+    completed_processes = {
+        _get_chunk_key(some): result
+        for some, result in zip(chunks, results)
+    }
+    checker = partial(_check_some_files, completed_processes)
+    pool = mp.Pool()
+    try:
+        # check output of completed clang-tidy invocations in parallel
+        for problem_files, stdout in pool.imap(checker, chunks):
+            if problem_files:
+                msg = "clang-tidy suggested fixes for {}"
+                print("\n".join(map(msg.format, problem_files)))
+                print(stdout)
+                error = True
+    except Exception:
+        error = True
+        raise
+    finally:
+        pool.terminate()
+        pool.join()
+
+    if error:
+        sys.exit(1)
+
+
+if __name__ == "__main__":
+    parser = argparse.ArgumentParser(
+        description="Runs clang-tidy on all ",

Review Comment:
   The description string is incomplete ("Runs clang-tidy on all "). Complete 
the sentence, e.g. "Runs clang-tidy on all specified source files."
   



##########
build_support/asan_symbolize.py:
##########
@@ -0,0 +1,368 @@
+#!/usr/bin/env python
+#===- lib/asan/scripts/asan_symbolize.py 
-----------------------------------===#
+#
+#                     The LLVM Compiler Infrastructure
+#
+# This file is distributed under the University of Illinois Open Source
+# License. See LICENSE.TXT for details.
+#
+#===------------------------------------------------------------------------===#
+import bisect
+import os
+import re
+import subprocess
+import sys
+
+llvm_symbolizer = None
+symbolizers = {}
+filetypes = {}
+vmaddrs = {}
+DEBUG = False
+
+
+# FIXME: merge the code that calls fix_filename().
+def fix_filename(file_name):
+  for path_to_cut in sys.argv[1:]:
+    file_name = re.sub('.*' + path_to_cut, '', file_name)
+  file_name = re.sub('.*asan_[a-z_]*.cc:[0-9]*', '_asan_rtl_', file_name)
+  file_name = re.sub('.*crtstuff.c:0', '???:0', file_name)
+  return file_name
+
+
+class Symbolizer(object):
+  def __init__(self):
+    pass
+
+  def symbolize(self, addr, binary, offset):
+    """Symbolize the given address (pair of binary and offset).
+
+    Overridden in subclasses.
+    Args:
+        addr: virtual address of an instruction.
+        binary: path to executable/shared object containing this instruction.
+        offset: instruction offset in the @binary.
+    Returns:
+        list of strings (one string for each inlined frame) describing
+        the code locations for this instruction (that is, function name, file
+        name, line and column numbers).
+    """
+    return None
+
+
+class LLVMSymbolizer(Symbolizer):
+  def __init__(self, symbolizer_path):
+    super(LLVMSymbolizer, self).__init__()
+    self.symbolizer_path = symbolizer_path
+    self.pipe = self.open_llvm_symbolizer()
+
+  def open_llvm_symbolizer(self):
+    if not os.path.exists(self.symbolizer_path):
+      return None
+    cmd = [self.symbolizer_path,
+           '--use-symbol-table=true',
+           '--demangle=false',
+           '--functions=true',
+           '--inlining=true']
+    if DEBUG:
+      print(' '.join(cmd))
+    return subprocess.Popen(cmd, stdin=subprocess.PIPE,
+                            stdout=subprocess.PIPE)
+
+  def symbolize(self, addr, binary, offset):
+    """Overrides Symbolizer.symbolize."""
+    if not self.pipe:
+      return None
+    result = []
+    try:
+      symbolizer_input = '%s %s' % (binary, offset)
+      if DEBUG:
+        print(symbolizer_input)
+      self.pipe.stdin.write(symbolizer_input)
+      self.pipe.stdin.write('\n')
+      while True:
+        function_name = self.pipe.stdout.readline().rstrip()
+        if not function_name:
+          break
+        file_name = self.pipe.stdout.readline().rstrip()
+        file_name = fix_filename(file_name)
+        if (not function_name.startswith('??') and
+            not file_name.startswith('??')):
+          # Append only valid frames.
+          result.append('%s in %s %s' % (addr, function_name,
+                                         file_name))
+    except Exception:
+      result = []
+    if not result:
+      result = None
+    return result
+
+
+def LLVMSymbolizerFactory(system):
+  symbolizer_path = os.getenv('LLVM_SYMBOLIZER_PATH')
+  if not symbolizer_path:
+    # Assume llvm-symbolizer is in PATH.
+    symbolizer_path = 'llvm-symbolizer'
+  return LLVMSymbolizer(symbolizer_path)
+
+
+class Addr2LineSymbolizer(Symbolizer):
+  def __init__(self, binary):
+    super(Addr2LineSymbolizer, self).__init__()
+    self.binary = binary
+    self.pipe = self.open_addr2line()
+
+  def open_addr2line(self):
+    cmd = ['addr2line', '-f', '-e', self.binary]
+    if DEBUG:
+      print(' '.join(cmd))
+    return subprocess.Popen(cmd,
+                            stdin=subprocess.PIPE, stdout=subprocess.PIPE)
+
+  def symbolize(self, addr, binary, offset):
+    """Overrides Symbolizer.symbolize."""
+    if self.binary != binary:
+      return None
+    try:
+      self.pipe.stdin.write(offset)
+      self.pipe.stdin.write('\n')
+      function_name = self.pipe.stdout.readline().rstrip()
+      file_name = self.pipe.stdout.readline().rstrip()
+    except Exception:
+      function_name = ''
+      file_name = ''
+    file_name = fix_filename(file_name)
+    return ['%s in %s %s' % (addr, function_name, file_name)]
+
+
+class DarwinSymbolizer(Symbolizer):
+  def __init__(self, addr, binary):
+    super(DarwinSymbolizer, self).__init__()
+    self.binary = binary
+    # Guess which arch we're running. 10 = len('0x') + 8 hex digits.
+    if len(addr) > 10:
+      self.arch = 'x86_64'
+    else:
+      self.arch = 'i386'
+    self.vmaddr = None
+    self.pipe = None
+
+  def write_addr_to_pipe(self, offset):
+    self.pipe.stdin.write('0x%x' % int(offset, 16))
+    self.pipe.stdin.write('\n')
+
+  def open_atos(self):
+    if DEBUG:
+      print('atos -o %s -arch %s' % (self.binary, self.arch))
+    cmdline = ['atos', '-o', self.binary, '-arch', self.arch]
+    self.pipe = subprocess.Popen(cmdline,
+                                 stdin=subprocess.PIPE,
+                                 stdout=subprocess.PIPE,
+                                 stderr=subprocess.PIPE)
+
+  def symbolize(self, addr, binary, offset):
+    """Overrides Symbolizer.symbolize."""
+    if self.binary != binary:
+      return None
+    self.open_atos()
+    self.write_addr_to_pipe(offset)
+    self.pipe.stdin.close()
+    atos_line = self.pipe.stdout.readline().rstrip()
+    # A well-formed atos response looks like this:
+    #   foo(type1, type2) (in object.name) (filename.cc:80)
+    match = re.match('^(.*) \(in (.*)\) \((.*:\d*)\)$', atos_line)
+    if DEBUG:
+      print('atos_line: {0}'.format(atos_line))
+    if match:
+      function_name = match.group(1)
+      function_name = re.sub('\(.*?\)', '', function_name)
+      file_name = fix_filename(match.group(3))
+      return ['%s in %s %s' % (addr, function_name, file_name)]
+    else:
+      return ['%s in %s' % (addr, atos_line)]
+
+
+# Chain several symbolizers so that if one symbolizer fails, we fall back
+# to the next symbolizer in chain.
+class ChainSymbolizer(Symbolizer):
+  def __init__(self, symbolizer_list):
+    super(ChainSymbolizer, self).__init__()
+    self.symbolizer_list = symbolizer_list
+
+  def symbolize(self, addr, binary, offset):
+    """Overrides Symbolizer.symbolize."""
+    for symbolizer in self.symbolizer_list:
+      if symbolizer:
+        result = symbolizer.symbolize(addr, binary, offset)
+        if result:
+          return result
+    return None
+
+  def append_symbolizer(self, symbolizer):
+    self.symbolizer_list.append(symbolizer)
+
+
+def BreakpadSymbolizerFactory(binary):
+  suffix = os.getenv('BREAKPAD_SUFFIX')
+  if suffix:
+    filename = binary + suffix
+    if os.access(filename, os.F_OK):
+      return BreakpadSymbolizer(filename)
+  return None
+
+
+def SystemSymbolizerFactory(system, addr, binary):
+  if system == 'Darwin':
+    return DarwinSymbolizer(addr, binary)
+  elif system == 'Linux':
+    return Addr2LineSymbolizer(binary)
+
+
+class BreakpadSymbolizer(Symbolizer):
+  def __init__(self, filename):
+    super(BreakpadSymbolizer, self).__init__()
+    self.filename = filename
+    lines = file(filename).readlines()

Review Comment:
   The built-in `file` was removed in Python 3 and will raise `NameError` if 
`BreakpadSymbolizer` is ever instantiated. Replace with `with open(filename) as 
f: lines = f.readlines()`.
   



##########
build_support/run_clang_format.py:
##########
@@ -0,0 +1,137 @@
+#!/usr/bin/env python
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+from __future__ import print_function
+import lintutils
+from subprocess import PIPE
+import argparse
+import difflib
+import multiprocessing as mp
+import sys
+from functools import partial
+
+
+# examine the output of clang-format and if changes are
+# present assemble a (unified)patch of the difference
+def _check_one_file(filename, formatted):
+    with open(filename, "rb") as reader:
+        original = reader.read()
+
+    if formatted != original:
+        # Run the equivalent of diff -u
+        diff = list(difflib.unified_diff(
+            original.decode('utf8').splitlines(True),
+            formatted.decode('utf8').splitlines(True),
+            fromfile=filename,
+            tofile="{} (after clang format)".format(
+                filename)))
+    else:
+        diff = None
+
+    return filename, diff
+
+
+if __name__ == "__main__":
+    parser = argparse.ArgumentParser(
+        description="Runs clang-format on all of the source "
+        "files. If --fix is specified enforce format by "
+        "modifying in place, otherwise compare the output "
+        "with the existing file and output any necessary "
+        "changes as a patch in unified diff format")
+    parser.add_argument("--clang_format_binary",
+                        required=True,
+                        help="Path to the clang-format binary")
+    parser.add_argument("--exclude_globs",
+                        help="Filename containing globs for files "
+                        "that should be excluded from the checks")
+    parser.add_argument("--source_dir",
+                        required=True,
+                        help="Root directory of the source code")
+    parser.add_argument("--fix", default=False,
+                        action="store_true",
+                        help="If specified, will re-format the source "
+                        "code instead of comparing the re-formatted "
+                        "output, defaults to %(default)s")
+    parser.add_argument("--quiet", default=False,
+                        action="store_true",
+                        help="If specified, only print errors")
+    arguments = parser.parse_args()
+
+    exclude_globs = []
+    if arguments.exclude_globs:
+        with open(arguments.exclude_globs) as f:
+            exclude_globs.extend(line.strip() for line in f)
+
+    formatted_filenames = []
+    for path in lintutils.get_sources(arguments.source_dir, exclude_globs):
+        formatted_filenames.append(str(path))
+
+    if arguments.fix:
+        if not arguments.quiet:
+            print("\n".join(map(lambda x: "Formatting {}".format(x),
+                                formatted_filenames)))
+
+        # Break clang-format invocations into chunks: each invocation formats
+        # 16 files. Wait for all processes to complete
+        results = lintutils.run_parallel([
+            [arguments.clang_format_binary, "-i"] + some
+            for some in lintutils.chunk(formatted_filenames, 16)
+        ])
+        for returncode, stdout, stderr in results:
+            # if any clang-format reported a parse error, bubble it
+            if returncode != 0:
+                sys.exit(returncode)
+
+    else:
+        # run an instance of clang-format for each source file in parallel,
+        # then wait for all processes to complete
+        results = lintutils.run_parallel([
+            [arguments.clang_format_binary, filename]
+            for filename in formatted_filenames
+        ], stdout=PIPE, stderr=PIPE)
+
+        checker_args = []
+        for filename, res in zip(formatted_filenames, results):
+            # if any clang-format reported a parse error, bubble it
+            returncode, stdout, stderr = res
+            if returncode != 0:
+                print(stderr)
+                sys.exit(returncode)
+            checker_args.append((filename, stdout))
+
+        error = False
+        pool = mp.Pool()
+        try:
+            # check the output from each invocation of clang-format in parallel
+            for filename, diff in pool.starmap(_check_one_file, checker_args):
+                if not arguments.quiet:
+                    print("Checking {}".format(filename))
+                if diff:
+                    print("{} had clang-format style issues".format(filename))
+                    # Print out the diff to stderr
+                    error = True
+                    # pad with a newline
+                    print(file=sys.stderr)
+                    sys.stderr.writelines(diff)
+        except Exception:
+            error = True
+            raise
+        finally:
+            pool.terminate()
+            pool.join()
+        sys.exit(1 if error else 0)

Review Comment:
   `sys.exit(1 if error else 0)` is inside the `else:` branch from line 100 but 
at the top indentation level of the `else` block — that's correct — however the 
`--fix` branch (lines 84–98) does not call `sys.exit(0)` on success, so it 
falls through and then implicitly returns 0 only because no further code runs. 
That's fine today but worth noting: any code added below the `else:` block 
would silently run in fix mode. Consider an explicit `sys.exit(0)` after the 
`--fix` success path for clarity.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Re: [PR] feat(build): migrate build support scripts [paimon-cpp]

Reply via email to