alamb opened a new pull request #9264:
URL: https://github.com/apache/arrow/pull/9264


   Note this PR needs the code from  https://github.com/apache/arrow/pull/9263 
to pass, so marking as a draft until that is complete
   
   The `test::format_batch` function does not have wide range of type support 
(e.g. it doesn't support dictionaries) and its output makes tests hard to read 
/ update, in my opinion. This PR consolidates the datafusion tests to use 
`arrow::util::pretty::pretty_format_batches` both to reduce code duplication as 
well as increase type support
   
   This PR removes the  `test::format_batch(&batch);` function and replaces it 
with `arrow::util::pretty::pretty_format_batches` and some macros. It has no 
code changes.
   
   This change the following benefits:
   
   1. Better type support (I immediately can compare RecordBatches with 
`Dictionary` types in tests without having to update `format_batch` and 
https://github.com/apache/arrow/pull/9233 gets simpler)
   2. Better readability and error reporting (at least I find the code and 
diffs easier to understand)
   3. Easier test update / review: it is easier to update the diffs (you can 
copy/paste the test output into the source code) and to review them
   
   This is a variant of a strategy that I been using with success in IOx 
[source 
link](https://github.com/influxdata/influxdb_iox/blob/main/arrow_deps/src/test_util.rs#L15)
 and I wanted to contribute it back.
   
   An example failure with this PR:
   
   ```
   ---- physical_plan::hash_join::tests::join_left_one stdout ----
   thread 'physical_plan::hash_join::tests::join_left_one' panicked at 
'assertion failed: `(left == right)`
     left: `["+----+----+----+----+", "| a1 | b2 | c1 | c2 |", 
"+----+----+----+----+", "| 1  | 1  | 7  | 70 |", "| 2  | 2  | 8  | 80 |", "| 2 
 | 2  | 9  | 80 |", "+----+----+----+----+"]`,
    right: `["+----+----+----+----+----+", "| a1 | b1 | c1 | a2 | c2 |", 
"+----+----+----+----+----+", "| 1  | 4  | 7  | 10 | 70 |", "| 2  | 5  | 8  | 
20 | 80 |", "| 3  | 7  | 9  |    |    |", "+----+----+----+----+----+"]`:
   
   expected:
   
   [
       "+----+----+----+----+",
       "| a1 | b2 | c1 | c2 |",
       "+----+----+----+----+",
       "| 1  | 1  | 7  | 70 |",
       "| 2  | 2  | 8  | 80 |",
       "| 2  | 2  | 9  | 80 |",
       "+----+----+----+----+",
   ]
   actual:
   
   [
       "+----+----+----+----+----+",
       "| a1 | b1 | c1 | a2 | c2 |",
       "+----+----+----+----+----+",
       "| 1  | 4  | 7  | 10 | 70 |",
       "| 2  | 5  | 8  | 20 | 80 |",
       "| 3  | 7  | 9  |    |    |",
       "+----+----+----+----+----+",
   ]
   ```
   
   You can copy/paste the output of `actual` directly into the test code for an 
update. 
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Reply via email to