[ https://issues.apache.org/jira/browse/ARROW-10967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
meng qingyou updated ARROW-10967:
---------------------------------
Description:
# Two env vars, *ARROW_TEST_DATA* and *PARQUET_TEST_DATA*, must be set to run tests and benchmarks.
# The typical usage looks like this:
{code}
let testdata = std::env::var("PARQUET_TEST_DATA").expect("PARQUET_TEST_DATA not defined");
{code}
# Some existing code tries to assemble the test data directories by appending a relative dir to the *current dir* of the running process.
So it would be better to add a few public utility functions for getting the test data dir. The basic design is:
If the env var is defined and its value points to an existing dir, use it;
otherwise, try to locate the data dir based on the current dir, a default relative dir, etc.
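A minimal sketch of the basic design above, in Rust. The function name `get_data_dir`, its signature, and the fallback of joining a relative dir onto the current dir are assumptions for illustration, not an existing Arrow API:

```rust
use std::env;
use std::path::PathBuf;

/// Hypothetical helper (not an existing Arrow API): resolve a test data dir.
///
/// 1. If `env_name` is set and points to an existing dir, return that dir.
/// 2. Otherwise, join `default_relative` onto the current dir and return the
///    result if it is an existing dir.
/// 3. Otherwise, return an error describing what was tried.
pub fn get_data_dir(env_name: &str, default_relative: &str) -> Result<PathBuf, String> {
    // Step 1: an explicitly configured env var takes priority.
    if let Some(value) = env::var_os(env_name) {
        let dir = PathBuf::from(value);
        if dir.is_dir() {
            return Ok(dir);
        }
        // Set, but not an existing dir: fall through to the default location.
    }

    // Step 2: fall back to a default dir relative to the current dir.
    let cwd = env::current_dir().map_err(|e| format!("cannot read current dir: {}", e))?;
    let candidate = cwd.join(default_relative);
    if candidate.is_dir() {
        return Ok(candidate);
    }

    // Step 3: neither location worked.
    Err(format!(
        "{} is not set to an existing dir, and {} does not exist",
        env_name,
        candidate.display()
    ))
}
```

A test could then call `get_data_dir("PARQUET_TEST_DATA", "<relative dir>")` instead of `std::env::var(...).expect(...)`, which makes the env var optional. Returning a `Result` rather than panicking leaves the choice of error handling to the caller.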
was:
Facts/problems:
# Two env vars, *ARROW_TEST_DATA* and *PARQUET_TEST_DATA*, must be set to run tests, benchmarks, and examples.
# In total, eighteen .rs files use these environment variables.
# The typical usage looks like this:
```
let testdata = std::env::var("PARQUET_TEST_DATA").expect("PARQUET_TEST_DATA not defined");
```
# Some code tries to assemble the test data directories by appending a relative dir to the *current dir* of the running process, but that MAY depend heavily on the actual current dir (for example, rust/, rust/datafusion, etc.).
Here is my solution.
Suppose:
# *current_dir* is *ALWAYS* inside the *git workspace dir*.
# We know a *data dir X relative to the git workspace dir*.
Then getting the absolute dir of *X* reduces to getting the absolute dir of the *git workspace dir*.
Starting from the *current dir* (inside the *git workspace dir*), we visit that dir and its parents, checking whether ".git" (file or dir) exists. The first dir that contains ".git" SHOULD be the *git workspace dir*.
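The parent-walking search described above could be sketched in Rust as follows. `find_git_workspace_dir` and `data_dir_in_workspace` are hypothetical names, not existing Arrow functions. The standard library's `Path::ancestors` yields the dir itself and then each parent, and `.exists()` accepts ".git" as either a dir or a file, matching the "(file or dir)" condition:

```rust
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical helper: walk from `start` up through its ancestors and return
/// the first dir that contains a `.git` entry (dir or file).
pub fn find_git_workspace_dir(start: &Path) -> Option<PathBuf> {
    // `ancestors()` yields `start` itself first, then each parent in turn.
    start
        .ancestors()
        .find(|dir| dir.join(".git").exists())
        .map(Path::to_path_buf)
}

/// Hypothetical helper: absolute path of data dir `relative` (the *X* above),
/// resolved against the git workspace dir that contains the current dir.
pub fn data_dir_in_workspace(relative: &str) -> io::Result<PathBuf> {
    let cwd = std::env::current_dir()?;
    let root = find_git_workspace_dir(&cwd).ok_or_else(|| {
        io::Error::new(
            io::ErrorKind::NotFound,
            format!("no .git found in {} or its parents", cwd.display()),
        )
    })?;
    Ok(root.join(relative))
}
```

Because the search starts from whatever the current dir is, the same call gives the same workspace root whether it runs from rust/ or rust/datafusion, provided the assumption that the current dir is inside the git workspace holds.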
> [Rust] Make env vars ARROW_TEST_DATA and PARQUET_TEST_DATA optional
> -------------------------------------------------------------------
>
> Key: ARROW-10967
> URL: https://issues.apache.org/jira/browse/ARROW-10967
> Project: Apache Arrow
> Issue Type: Test
> Reporter: meng qingyou
> Assignee: meng qingyou
> Priority: Minor
--
This message was sent by Atlassian Jira
(v8.3.4#803005)