alamb opened a new issue #467:
URL: https://github.com/apache/arrow-datafusion/issues/467


   **Is your feature request related to a problem or challenge? Please describe 
what you are trying to do.**
   
   To someone new to datafusion, it may not be clear that running the tests 
successfully requires setting the `PARQUET_TEST_DATA` and `ARROW_TEST_DATA` 
environment variables.
   
   So today, here is what happens:
   ```
   git clone https://github.com/apache/arrow-datafusion
   cd arrow-datafusion
   cargo test -p datafusion
   ```
   
   Which results in many errors like:
   ```
   ---- physical_plan::windows::tests::window_function_input_partition stdout 
----
   thread 'physical_plan::windows::tests::window_function_input_partition' 
panicked at 'failed to get arrow data dir: env `ARROW_TEST_DATA` is undefined 
or has empty value, and the pre-defined data dir 
`/Users/alamb/.cargo/registry/src/github.com-1ecc6299db9ec823/arrow-4.2.0/../testing/data`
 not found
   HINT: try running `git submodule update --init`', 
/Users/alamb/.cargo/registry/src/github.com-1ecc6299db9ec823/arrow-4.2.0/src/util/test_util.rs:81:21
   ```
   
   Even after running `git submodule update --init` as the hint suggests, the 
tests still fail. Instead, you need to set:
   ```
   export ARROW_TEST_DATA=testing/data
   export PARQUET_TEST_DATA=parquet-testing/data
   cargo test -p datafusion
   ```
   
   **Describe the solution you'd like**
   I would like the tests to automatically try the default locations, as above, 
if `ARROW_TEST_DATA` and `PARQUET_TEST_DATA` are not set.
   
   The tests should pass successfully with only these commands:
   ```
   git clone https://github.com/apache/arrow-datafusion
   cd arrow-datafusion
   git submodule update --init
   cargo test -p datafusion
   ```
   
   The arrow-rs crate already does this 
([here](https://github.com/apache/arrow-rs/blob/master/arrow/src/util/test_util.rs#L100)
 and 
[here](https://github.com/apache/arrow-rs/blob/master/arrow/src/util/test_util.rs#L78)),
but it stopped working now that arrow-rs and datafusion are no longer in the 
same workspace.
   
   Perhaps we can simply port the code from arrow-rs into datafusion rather 
than calling `arrow::util::test_util`.
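
A rough sketch of what such a ported helper might look like is below. It is not the arrow-rs implementation: the names `get_data_dir`, `arrow_test_data`, and `parquet_test_data` are hypothetical, and the fallback paths `../testing/data` and `../parquet-testing/data` assume the submodules sit one level above the datafusion crate's manifest directory.

```rust
use std::env;
use std::path::{Path, PathBuf};

/// Resolve a test data directory (hypothetical helper, not the arrow-rs API).
///
/// If `env_name` is set to a non-empty value, that directory is used.
/// Otherwise `default_dir` is tried, relative to the crate's manifest
/// directory (`CARGO_MANIFEST_DIR`, which cargo sets when running tests).
fn get_data_dir(env_name: &str, default_dir: &str) -> Result<PathBuf, String> {
    // An explicitly set, non-empty environment variable always wins.
    if let Ok(value) = env::var(env_name) {
        let trimmed = value.trim();
        if !trimmed.is_empty() {
            let dir = PathBuf::from(trimmed);
            return if dir.is_dir() {
                Ok(dir)
            } else {
                Err(format!(
                    "the data dir `{}` defined by env `{}` not found",
                    dir.display(),
                    env_name
                ))
            };
        }
    }

    // Fall back to the default location; use "." when not run under cargo.
    let base = env::var("CARGO_MANIFEST_DIR").unwrap_or_else(|_| ".".to_string());
    let dir = Path::new(&base).join(default_dir);
    if dir.is_dir() {
        Ok(dir)
    } else {
        Err(format!(
            "env `{}` is undefined or has empty value, and the default data dir `{}` not found\n\
             HINT: try running `git submodule update --init`",
            env_name,
            dir.display()
        ))
    }
}

/// Assumed default: the `testing` submodule one level above the crate.
fn arrow_test_data() -> Result<PathBuf, String> {
    get_data_dir("ARROW_TEST_DATA", "../testing/data")
}

/// Assumed default: the `parquet-testing` submodule one level above the crate.
fn parquet_test_data() -> Result<PathBuf, String> {
    get_data_dir("PARQUET_TEST_DATA", "../parquet-testing/data")
}
```

With something along these lines, tests would call `arrow_test_data()` or `parquet_test_data()` and only need the environment variables when the data lives somewhere non-default.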
   
   **Describe alternatives you've considered**
   None
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

