alamb commented on a change in pull request #8996:
URL: https://github.com/apache/arrow/pull/8996#discussion_r548523886
##########
File path: rust/parquet/src/util/test_common/file_util.rs
##########
@@ -19,17 +19,8 @@ use std::{env, fs, io::Write, path::PathBuf, str::FromStr};
/// Returns path to the test parquet file in 'data' directory
pub fn get_test_path(file_name: &str) -> PathBuf {
- let mut pathbuf = match env::var("PARQUET_TEST_DATA") {
- Ok(path) => PathBuf::from_str(path.as_str()).unwrap(),
- Err(_) => {
- let mut pathbuf = env::current_dir().unwrap();
- pathbuf.pop();
- pathbuf.pop();
- pathbuf
- .push(PathBuf::from_str("cpp/submodules/parquet-testing/data").unwrap());
- pathbuf
- }
- };
+ let mut pathbuf =
+ PathBuf::from_str(&arrow::util::test_util::parquet_test_data()).unwrap();
Review comment:
@nevi-me and @mqy -- I tried to move `parquet_test_data` into
https://github.com/apache/arrow/blob/master/rust/parquet/src/util/test_common/file_util.rs
-- however, the code quickly got messy because `parquet::util` is not
publicly exported, so I can't use functions defined there in places
(like datafusion) outside the parquet crate. Furthermore, the `test_utils` are
only compiled in `test` config, while several datafusion examples use the parquet
test data and are not compiled in `test` config.
I can think of several possibilities:
1. Leave the `parquet_test_data` function in the arrow crate as it is in
this PR
2. Make a copy of `parquet_test_data` in the parquet crate
3. Make the parquet util module public and export test_util in all
configurations
Given that this function is used in tests and the other options seem messy
to me, I suggest option 1 (though perhaps I am being lazy).
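
To make option 1 concrete, here is a minimal sketch of what such a shared helper could look like. It is an illustration under stated assumptions, not the code in this PR: `resolve_data_dir` is a hypothetical name, and the fallback path is taken from the lines removed in the diff above. The point is that a plain `pub` function in a publicly exported module, not gated behind `#[cfg(test)]`, can be called from tests and from examples alike.

```rust
use std::env;
use std::path::{Path, PathBuf};

/// Hypothetical helper (not part of the arrow or parquet crates): choose the
/// directory named by an environment variable's value if one was supplied,
/// otherwise fall back to a default location.
pub fn resolve_data_dir(env_value: Option<String>, fallback: &Path) -> PathBuf {
    match env_value {
        // Treat an empty or whitespace-only value the same as an unset variable.
        Some(dir) if !dir.trim().is_empty() => PathBuf::from(dir),
        _ => fallback.to_path_buf(),
    }
}

/// Sketch of a publicly callable `parquet_test_data`. Because it is `pub` and
/// has no `#[cfg(test)]` gate, examples built outside the `test` config can
/// use it too. Returning `PathBuf` here is an assumption of this sketch.
pub fn parquet_test_data() -> PathBuf {
    // Fallback mirrors the removed code: two levels up from the current
    // directory, then into the parquet-testing submodule.
    let mut fallback = env::current_dir().expect("current directory should be readable");
    fallback.push("../../cpp/submodules/parquet-testing/data");
    resolve_data_dir(env::var("PARQUET_TEST_DATA").ok(), &fallback)
}

/// Returns the path to a named file inside the parquet test data directory.
pub fn get_test_path(file_name: &str) -> PathBuf {
    parquet_test_data().join(file_name)
}
```

Keeping the environment lookup separate from `resolve_data_dir` means the selection logic can be checked without mutating process-wide environment variables.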