alamb opened a new pull request #9264: URL: https://github.com/apache/arrow/pull/9264
Note this PR needs the code from https://github.com/apache/arrow/pull/9263 to pass, so marking as a draft until that is complete The `test::format_batch` function does not have wide range of type support (e.g. it doesn't support dictionaries) and its output makes tests hard to read / update, in my opinion. This PR consolidates the datafusion tests to use `arrow::util::pretty::pretty_format_batches` both to reduce code duplication as well as increase type support This PR removes the `test::format_batch(&batch);` function and replaces it with `arrow::util::pretty::pretty_format_batches` and some macros. It has no code changes. This change the following benefits: 1. Better type support (I immediately can compare RecordBatches with `Dictionary` types in tests without having to update `format_batch` and https://github.com/apache/arrow/pull/9233 gets simpler) 2. Better readability and error reporting (at least I find the code and diffs easier to understand) 3. Easier test update / review: it is easier to update the diffs (you can copy/paste the test output into the source code) and to review them This is a variant of a strategy that I been using with success in IOx [source link](https://github.com/influxdata/influxdb_iox/blob/main/arrow_deps/src/test_util.rs#L15) and I wanted to contribute it back. An example failure with this PR: ``` ---- physical_plan::hash_join::tests::join_left_one stdout ---- thread 'physical_plan::hash_join::tests::join_left_one' panicked at 'assertion failed: `(left == right)` left: `["+----+----+----+----+", "| a1 | b2 | c1 | c2 |", "+----+----+----+----+", "| 1 | 1 | 7 | 70 |", "| 2 | 2 | 8 | 80 |", "| 2 | 2 | 9 | 80 |", "+----+----+----+----+"]`, right: `["+----+----+----+----+----+", "| a1 | b1 | c1 | a2 | c2 |", "+----+----+----+----+----+", "| 1 | 4 | 7 | 10 | 70 |", "| 2 | 5 | 8 | 20 | 80 |", "| 3 | 7 | 9 | | |", "+----+----+----+----+----+"]`: expected: [ "+----+----+----+----+", "| a1 | b2 | c1 | c2 |", "+----+----+----+----+", "| 1 | 1 | 7 | 70 |", "| 2 | 2 | 8 | 80 |", "| 2 | 2 | 9 | 80 |", "+----+----+----+----+", ] actual: [ "+----+----+----+----+----+", "| a1 | b1 | c1 | a2 | c2 |", "+----+----+----+----+----+", "| 1 | 4 | 7 | 10 | 70 |", "| 2 | 5 | 8 | 20 | 80 |", "| 3 | 7 | 9 | | |", "+----+----+----+----+----+", ] ``` You can copy/paste the output of `actual` directly into the test code for an update. ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org