This is an automated email from the ASF dual-hosted git repository.

weibin pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/incubator-graphar.git


The following commit(s) were added to refs/heads/main by this push:
     new 8b315a7  Feat(CI): Use an environment variable to specify the location of testing data (#512)
8b315a7 is described below

commit 8b315a757473a6cdd21b62f1c667efa1bbdf4cf1
Author: Weibin Zeng <[email protected]>
AuthorDate: Wed Jun 5 10:24:06 2024 +0800

    Feat(CI): Use an environment variable to specify the location of testing data (#512)
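
    With this change, the auxiliary testing data lives in the separate
    apache/incubator-graphar-testing repository, and the C++ tests,
    benchmarks, examples, and Spark test suites locate it through the
    GAR_TEST_DATA environment variable, failing fast when it is unset.
    For example, to run the C++ unit tests locally (a sketch mirroring
    the cpp/README.md change below):

        git clone --depth 1 https://github.com/apache/incubator-graphar-testing.git testing
        GAR_TEST_DATA=${PWD}/testing ctest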
    
    
    Signed-off-by: acezen <[email protected]>
---
 .github/workflows/ci.yml                           |  7 ++-
 .github/workflows/spark.yaml                       |  6 +++
 cpp/README.md                                      | 12 ++++--
 cpp/benchmarks/benchmark_util.h                    | 17 +++-----
 cpp/examples/bfs_father_example.cc                 |  2 +-
 cpp/examples/bfs_pull_example.cc                   |  2 +-
 cpp/examples/bfs_push_example.cc                   |  2 +-
 cpp/examples/bfs_stream_example.cc                 |  2 +-
 cpp/examples/bgl_example.cc                        |  2 +-
 cpp/examples/cc_push_example.cc                    |  2 +-
 cpp/examples/cc_stream_example.cc                  |  2 +-
 cpp/examples/config.h                              | 17 ++++----
 cpp/examples/high_level_reader_example.cc          |  2 +-
 cpp/examples/high_level_writer_example.cc          |  7 +--
 cpp/examples/low_level_reader_example.cc           |  2 +-
 cpp/examples/mid_level_reader_example.cc           |  2 +-
 cpp/examples/mid_level_writer_example.cc           |  8 ++--
 cpp/examples/pagerank_example.cc                   |  2 +-
 cpp/test/test_arrow_chunk_reader.cc                |  6 +--
 cpp/test/test_arrow_chunk_writer.cc                | 14 +++---
 cpp/test/test_builder.cc                           | 16 +++----
 cpp/test/test_chunk_info_reader.cc                 | 44 +++++++++----------
 cpp/test/test_graph.cc                             |  8 ++--
 cpp/test/test_info.cc                              | 18 ++++----
 cpp/test/util.h                                    | 32 +++++++++-----
 maven-projects/spark/README.md                     | 18 ++++++--
 .../spark/graphar/src/test/resources/gar-test      |  1 -
 .../scala/org/apache/graphar/BaseTestSuite.scala   | 40 ++++++++++-------
 .../scala/org/apache/graphar/ComputeExample.scala  | 20 ++-------
 .../scala/org/apache/graphar/TestGraphInfo.scala   | 27 +++---------
 .../scala/org/apache/graphar/TestGraphReader.scala | 18 ++------
 .../org/apache/graphar/TestGraphTransformer.scala  | 25 +++--------
 .../scala/org/apache/graphar/TestGraphWriter.scala | 18 ++------
 .../org/apache/graphar/TestIndexGenerator.scala    | 26 +++--------
 .../test/scala/org/apache/graphar/TestReader.scala | 41 ++++--------------
 .../test/scala/org/apache/graphar/TestWriter.scala | 50 ++++++----------------
 .../org/apache/graphar/TransformExample.scala      | 32 +++-----------
 .../spark/scripts/run-ldbc-sample2graphar.sh       |  4 +-
 38 files changed, 226 insertions(+), 328 deletions(-)

diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml
index dbf3e0e..7f323b2 100644
--- a/.github/workflows/ci.yml
+++ b/.github/workflows/ci.yml
@@ -42,7 +42,7 @@ jobs:
     name: Ubuntu 22.04 C++
     runs-on: ubuntu-latest
     env:
-      GAR_TEST_DATA: ${{ github.workspace }}/testing/
+      GAR_TEST_DATA: ${{ github.workspace }}/graphar-testing/
     steps:
     - uses: actions/checkout@v3
       with:
@@ -76,6 +76,8 @@ jobs:
         sudo cmake --build build/ --target install
         popd
 
+        git clone https://github.com/apache/incubator-graphar-testing.git $GAR_TEST_DATA --depth 1
+
     - name: CMake
       working-directory: "cpp"
       run: |
@@ -197,6 +199,8 @@ jobs:
   macos:
     name: ${{ matrix.architecture }} macOS ${{ matrix.macos-version }} C++
     runs-on: macos-${{ matrix.macos-version }}
+    env:
+      GAR_TEST_DATA: ${{ github.workspace }}/graphar-testing/
     strategy:
       fail-fast: false
       matrix:
@@ -213,6 +217,7 @@ jobs:
     - name: Install dependencies
       run: |
         brew bundle --file=cpp/Brewfile
+        git clone https://github.com/apache/incubator-graphar-testing.git $GAR_TEST_DATA --depth 1
 
     - name: Build GraphAr
       working-directory: "cpp"
diff --git a/.github/workflows/spark.yaml b/.github/workflows/spark.yaml
index 4b8a842..3d7dc96 100644
--- a/.github/workflows/spark.yaml
+++ b/.github/workflows/spark.yaml
@@ -40,6 +40,8 @@ concurrency:
 jobs:
   test:
     runs-on: ubuntu-20.04
+    env:
+      GAR_TEST_DATA: ${{ github.workspace }}/graphar-testing/
     strategy:
       fail-fast: false
       matrix:
@@ -62,6 +64,10 @@ jobs:
         export JAVA_HOME=${JAVA_HOME_11_X64}
         mvn --no-transfer-progress spotless:check
 
+    - name: Download test data
+      run: |
+        git clone https://github.com/apache/incubator-graphar-testing.git $GAR_TEST_DATA --depth 1
+
     - name: Build GraphAr Spark
       working-directory: maven-projects/spark
       run: |
diff --git a/cpp/README.md b/cpp/README.md
index 38a9eab..a289102 100644
--- a/cpp/README.md
+++ b/cpp/README.md
@@ -88,21 +88,27 @@ Debug build with unit tests:
     $ cd build-debug
     $ cmake -DCMAKE_BUILD_TYPE=Debug -DBUILD_TESTS=ON ..
     $ make -j8       # if you have 8 CPU cores, otherwise adjust, use -j`nproc` for all cores
-    $ make test      # to run the tests
+```
+
+After building, you can run the unit tests with:
+
+```bash
+    $ git clone https://github.com/apache/incubator-graphar-testing.git testing  # download the testing data
+    $ GAR_TEST_DATA=${PWD}/testing ctest
 ```
 
 To build with examples, build the project with the `BUILD_EXAMPLES` option, then run:
 
 ```bash
     $ make -j8       # if you have 8 CPU cores, otherwise adjust, use -j`nproc` for all cores
-    $ ./bgl_example  # run the BGL example
+    $ GAR_TEST_DATA=${PWD}/testing ./bgl_example  # run the BGL example
 ```
 
 To build with benchmarks, build the project with the `BUILD_BENCHMARKS` option, then run:
 
 ```bash
     $ make -j8       # if you have 8 CPU cores, otherwise adjust, use -j`nproc` for all cores
-    $ ./graph_info_benchmark  # run the graph info benchmark
+    $ GAR_TEST_DATA=${PWD}/testing ./graph_info_benchmark  # run the graph info benchmark
 ```
 
 ### Install
diff --git a/cpp/benchmarks/benchmark_util.h b/cpp/benchmarks/benchmark_util.h
index 8a6d9f8..b68cd0f 100644
--- a/cpp/benchmarks/benchmark_util.h
+++ b/cpp/benchmarks/benchmark_util.h
@@ -30,19 +30,16 @@
 
 namespace graphar {
 
-static const std::string TEST_DATA_DIR =  // NOLINT
-    std::filesystem::path(__FILE__)
-        .parent_path()
-        .parent_path()
-        .parent_path()
-        .parent_path()
-        .string() +
-    "/testing";
-
 class BenchmarkFixture : public ::benchmark::Fixture {
  public:
   void SetUp(const ::benchmark::State& state) override {
-    path_ = TEST_DATA_DIR + "/ldbc_sample/parquet/ldbc_sample.graph.yml";
+    const char* c_root = std::getenv("GAR_TEST_DATA");
+    if (!c_root) {
+      throw std::runtime_error(
+          "Test resources not found, set GAR_TEST_DATA to auxiliary testing "
+          "data");
+    }
+    path_ = std::string(c_root) + "/ldbc_sample/parquet/ldbc_sample.graph.yml";
     auto maybe_graph_info = GraphInfo::Load(path_);
     graph_info_ = maybe_graph_info.value();
   }
diff --git a/cpp/examples/bfs_father_example.cc b/cpp/examples/bfs_father_example.cc
index 10d5c92..730bf76 100644
--- a/cpp/examples/bfs_father_example.cc
+++ b/cpp/examples/bfs_father_example.cc
@@ -31,7 +31,7 @@
 int main(int argc, char* argv[]) {
   // read file and construct graph info
   std::string path =
-      TEST_DATA_DIR + "/ldbc_sample/parquet/ldbc_sample.graph.yml";
+      GetTestingResourceRoot() + "/ldbc_sample/parquet/ldbc_sample.graph.yml";
   auto graph_info = graphar::GraphInfo::Load(path).value();
 
   // get the person vertices of graph
diff --git a/cpp/examples/bfs_pull_example.cc b/cpp/examples/bfs_pull_example.cc
index f247d6c..1078aee 100644
--- a/cpp/examples/bfs_pull_example.cc
+++ b/cpp/examples/bfs_pull_example.cc
@@ -30,7 +30,7 @@
 int main(int argc, char* argv[]) {
   // read file and construct graph info
   std::string path =
-      TEST_DATA_DIR + "/ldbc_sample/parquet/ldbc_sample.graph.yml";
+      GetTestingResourceRoot() + "/ldbc_sample/parquet/ldbc_sample.graph.yml";
   auto graph_info = graphar::GraphInfo::Load(path).value();
 
   // construct vertices collection
diff --git a/cpp/examples/bfs_push_example.cc b/cpp/examples/bfs_push_example.cc
index cdfb861..7006514 100644
--- a/cpp/examples/bfs_push_example.cc
+++ b/cpp/examples/bfs_push_example.cc
@@ -30,7 +30,7 @@
 int main(int argc, char* argv[]) {
   // read file and construct graph info
   std::string path =
-      TEST_DATA_DIR + "/ldbc_sample/parquet/ldbc_sample.graph.yml";
+      GetTestingResourceRoot() + "/ldbc_sample/parquet/ldbc_sample.graph.yml";
   auto graph_info = graphar::GraphInfo::Load(path).value();
 
   // construct vertices collection
diff --git a/cpp/examples/bfs_stream_example.cc b/cpp/examples/bfs_stream_example.cc
index 9abe402..b77fb30 100644
--- a/cpp/examples/bfs_stream_example.cc
+++ b/cpp/examples/bfs_stream_example.cc
@@ -30,7 +30,7 @@
 int main(int argc, char* argv[]) {
   // read file and construct graph info
   std::string path =
-      TEST_DATA_DIR + "/ldbc_sample/parquet/ldbc_sample.graph.yml";
+      GetTestingResourceRoot() + "/ldbc_sample/parquet/ldbc_sample.graph.yml";
   auto graph_info = graphar::GraphInfo::Load(path).value();
 
   // construct vertices collection
diff --git a/cpp/examples/bgl_example.cc b/cpp/examples/bgl_example.cc
index dd2e6af..c1d60bd 100644
--- a/cpp/examples/bgl_example.cc
+++ b/cpp/examples/bgl_example.cc
@@ -35,7 +35,7 @@
 int main(int argc, char* argv[]) {
   // read file and construct graph info
   std::string path =
-      TEST_DATA_DIR + "/ldbc_sample/parquet/ldbc_sample.graph.yml";
+      GetTestingResourceRoot() + "/ldbc_sample/parquet/ldbc_sample.graph.yml";
   auto graph_info = graphar::GraphInfo::Load(path).value();
   ASSERT(graph_info->GetVertexInfos().size() == 1);
   ASSERT(graph_info->GetEdgeInfos().size() == 1);
diff --git a/cpp/examples/cc_push_example.cc b/cpp/examples/cc_push_example.cc
index 1a8bdac..aee5668 100644
--- a/cpp/examples/cc_push_example.cc
+++ b/cpp/examples/cc_push_example.cc
@@ -31,7 +31,7 @@
 int main(int argc, char* argv[]) {
   // read file and construct graph info
   std::string path =
-      TEST_DATA_DIR + "/ldbc_sample/parquet/ldbc_sample.graph.yml";
+      GetTestingResourceRoot() + "/ldbc_sample/parquet/ldbc_sample.graph.yml";
   auto graph_info = graphar::GraphInfo::Load(path).value();
 
   // construct vertices collection
diff --git a/cpp/examples/cc_stream_example.cc b/cpp/examples/cc_stream_example.cc
index 4e8f0cd..23b9722 100644
--- a/cpp/examples/cc_stream_example.cc
+++ b/cpp/examples/cc_stream_example.cc
@@ -31,7 +31,7 @@
 int main(int argc, char* argv[]) {
   // read file and construct graph info
   std::string path =
-      TEST_DATA_DIR + "/ldbc_sample/parquet/ldbc_sample.graph.yml";
+      GetTestingResourceRoot() + "/ldbc_sample/parquet/ldbc_sample.graph.yml";
   auto graph_info = graphar::GraphInfo::Load(path).value();
 
   // construct vertices collection
diff --git a/cpp/examples/config.h b/cpp/examples/config.h
index c9984e1..ef779c0 100644
--- a/cpp/examples/config.h
+++ b/cpp/examples/config.h
@@ -56,11 +56,12 @@
 #define DASSERT(x)
 #endif
 
-static const std::string TEST_DATA_DIR =  // NOLINT
-    std::filesystem::path(__FILE__)
-        .parent_path()
-        .parent_path()
-        .parent_path()
-        .parent_path()
-        .string() +
-    "/testing";
+std::string GetTestingResourceRoot() {
+  const char* c_root = std::getenv("GAR_TEST_DATA");
+  if (!c_root) {
+    throw std::runtime_error(
+        "Test resources not found, set GAR_TEST_DATA to auxiliary testing "
+        "data");
+  }
+  return std::string(c_root);
+}
diff --git a/cpp/examples/high_level_reader_example.cc b/cpp/examples/high_level_reader_example.cc
index 3725618..a25e229 100644
--- a/cpp/examples/high_level_reader_example.cc
+++ b/cpp/examples/high_level_reader_example.cc
@@ -120,7 +120,7 @@ void edges_collection(const std::shared_ptr<graphar::GraphInfo>& graph_info) {
 int main(int argc, char* argv[]) {
   // read file and construct graph info
   std::string path =
-      TEST_DATA_DIR + "/ldbc_sample/parquet/ldbc_sample.graph.yml";
+      GetTestingResourceRoot() + "/ldbc_sample/parquet/ldbc_sample.graph.yml";
   auto graph_info = graphar::GraphInfo::Load(path).value();
 
   // vertices collection
diff --git a/cpp/examples/high_level_writer_example.cc b/cpp/examples/high_level_writer_example.cc
index 66aa83e..a6f25af 100644
--- a/cpp/examples/high_level_writer_example.cc
+++ b/cpp/examples/high_level_writer_example.cc
@@ -30,7 +30,7 @@
 void vertices_builder() {
   // construct vertices builder
   std::string vertex_meta_file =
-      TEST_DATA_DIR + "/ldbc_sample/parquet/" + "person.vertex.yml";
+      GetTestingResourceRoot() + "/ldbc_sample/parquet/" + "person.vertex.yml";
   auto vertex_meta = graphar::Yaml::LoadFile(vertex_meta_file).value();
   auto vertex_info = graphar::VertexInfo::Load(vertex_meta).value();
   graphar::IdType start_index = 0;
@@ -71,8 +71,9 @@ void vertices_builder() {
 
 void edges_builder() {
   // construct edges builder
-  std::string edge_meta_file =
-      TEST_DATA_DIR + "/ldbc_sample/parquet/" + "person_knows_person.edge.yml";
+  std::string edge_meta_file = GetTestingResourceRoot() +
+                               "/ldbc_sample/parquet/" +
+                               "person_knows_person.edge.yml";
   auto edge_meta = graphar::Yaml::LoadFile(edge_meta_file).value();
   auto edge_info = graphar::EdgeInfo::Load(edge_meta).value();
   auto vertex_count = 3;
diff --git a/cpp/examples/low_level_reader_example.cc b/cpp/examples/low_level_reader_example.cc
index 5e68a26..c195639 100644
--- a/cpp/examples/low_level_reader_example.cc
+++ b/cpp/examples/low_level_reader_example.cc
@@ -128,7 +128,7 @@ void adj_list_property_chunk_info_reader(
 int main(int argc, char* argv[]) {
   // read file and construct graph info
   std::string path =
-      TEST_DATA_DIR + "/ldbc_sample/parquet/ldbc_sample.graph.yml";
+      GetTestingResourceRoot() + "/ldbc_sample/parquet/ldbc_sample.graph.yml";
   auto graph_info = graphar::GraphInfo::Load(path).value();
 
   // vertex property chunk info reader
diff --git a/cpp/examples/mid_level_reader_example.cc b/cpp/examples/mid_level_reader_example.cc
index f273fb4..98456a3 100644
--- a/cpp/examples/mid_level_reader_example.cc
+++ b/cpp/examples/mid_level_reader_example.cc
@@ -215,7 +215,7 @@ void adj_list_offset_chunk_reader(
 int main(int argc, char* argv[]) {
   // read file and construct graph info
   std::string path =
-      TEST_DATA_DIR + "/ldbc_sample/parquet/ldbc_sample.graph.yml";
+      GetTestingResourceRoot() + "/ldbc_sample/parquet/ldbc_sample.graph.yml";
   auto graph_info = graphar::GraphInfo::Load(path).value();
 
   // vertex property chunk reader
diff --git a/cpp/examples/mid_level_writer_example.cc b/cpp/examples/mid_level_writer_example.cc
index 4f678e6..7fe81a3 100644
--- a/cpp/examples/mid_level_writer_example.cc
+++ b/cpp/examples/mid_level_writer_example.cc
@@ -105,7 +105,7 @@ void vertex_property_writer(
     const std::shared_ptr<graphar::GraphInfo>& graph_info) {
   // create writer
   std::string vertex_meta_file =
-      TEST_DATA_DIR + "/ldbc_sample/parquet/" + "person.vertex.yml";
+      GetTestingResourceRoot() + "/ldbc_sample/parquet/" + "person.vertex.yml";
   auto vertex_meta = graphar::Yaml::LoadFile(vertex_meta_file).value();
   auto vertex_info = graphar::VertexInfo::Load(vertex_meta).value();
   ASSERT(vertex_info->GetLabel() == "person");
@@ -141,8 +141,8 @@ void vertex_property_writer(
 
 void edge_chunk_writer(const std::shared_ptr<graphar::GraphInfo>& graph_info) {
   // construct writer
-  std::string edge_meta_file =
-      TEST_DATA_DIR + "/ldbc_sample/csv/" + "person_knows_person.edge.yml";
+  std::string edge_meta_file = GetTestingResourceRoot() + "/ldbc_sample/csv/" +
+                               "person_knows_person.edge.yml";
   auto edge_meta = graphar::Yaml::LoadFile(edge_meta_file).value();
   auto edge_info = graphar::EdgeInfo::Load(edge_meta).value();
   auto adj_list_type = graphar::AdjListType::ordered_by_source;
@@ -194,7 +194,7 @@ void edge_chunk_writer(const std::shared_ptr<graphar::GraphInfo>& graph_info) {
 int main(int argc, char* argv[]) {
   // read file and construct graph info
   std::string path =
-      TEST_DATA_DIR + "/ldbc_sample/parquet/ldbc_sample.graph.yml";
+      GetTestingResourceRoot() + "/ldbc_sample/parquet/ldbc_sample.graph.yml";
   auto graph_info = graphar::GraphInfo::Load(path).value();
 
   // vertex property writer
diff --git a/cpp/examples/pagerank_example.cc b/cpp/examples/pagerank_example.cc
index fc3c729..1fceb1f 100644
--- a/cpp/examples/pagerank_example.cc
+++ b/cpp/examples/pagerank_example.cc
@@ -31,7 +31,7 @@
 int main(int argc, char* argv[]) {
   // read file and construct graph info
   std::string path =
-      TEST_DATA_DIR + "/ldbc_sample/parquet/ldbc_sample.graph.yml";
+      GetTestingResourceRoot() + "/ldbc_sample/parquet/ldbc_sample.graph.yml";
   auto graph_info = graphar::GraphInfo::Load(path).value();
 
   // construct vertices collection
diff --git a/cpp/test/test_arrow_chunk_reader.cc b/cpp/test/test_arrow_chunk_reader.cc
index 513216d..10e718b 100644
--- a/cpp/test/test_arrow_chunk_reader.cc
+++ b/cpp/test/test_arrow_chunk_reader.cc
@@ -32,10 +32,10 @@
 #include <catch2/catch_test_macros.hpp>
 namespace graphar {
 
-TEST_CASE("ArrowChunkReader") {
+TEST_CASE_METHOD(GlobalFixture, "ArrowChunkReader") {
   // read file and construct graph info
   std::string path =
-      TEST_DATA_DIR + "/ldbc_sample/parquet/ldbc_sample.graph.yml";
+      test_data_dir + "/ldbc_sample/parquet/ldbc_sample.graph.yml";
   std::string src_label = "person", edge_label = "knows", dst_label = "person";
   std::string vertex_property_name = "id";
   std::string edge_property_name = "creationDate";
@@ -95,7 +95,7 @@ TEST_CASE("ArrowChunkReader") {
     }
 
     SECTION("CastDataType") {
-      std::string prefix = TEST_DATA_DIR + "/modern_graph/";
+      std::string prefix = test_data_dir + "/modern_graph/";
       std::string vertex_info_path = prefix + "person.vertex.yml";
       std::cout << "Vertex info path: " << vertex_info_path << std::endl;
       auto fs = FileSystemFromUriOrPath(prefix).value();
diff --git a/cpp/test/test_arrow_chunk_writer.cc b/cpp/test/test_arrow_chunk_writer.cc
index 3978ba9..57aa2ea 100644
--- a/cpp/test/test_arrow_chunk_writer.cc
+++ b/cpp/test/test_arrow_chunk_writer.cc
@@ -43,8 +43,8 @@
 
 namespace graphar {
 
-TEST_CASE("TestVertexPropertyWriter") {
-  std::string path = TEST_DATA_DIR + "/ldbc_sample/person_0_0.csv";
+TEST_CASE_METHOD(GlobalFixture, "TestVertexPropertyWriter") {
+  std::string path = test_data_dir + "/ldbc_sample/person_0_0.csv";
   arrow::io::IOContext io_context = arrow::io::default_io_context();
 
   auto fs = arrow::fs::FileSystemFromUriOrPath(path).ValueOrDie();
@@ -70,7 +70,7 @@ TEST_CASE("TestVertexPropertyWriter") {
 
   // Construct the writer
   std::string vertex_meta_file =
-      TEST_DATA_DIR + "/ldbc_sample/parquet/" + "person.vertex.yml";
+      test_data_dir + "/ldbc_sample/parquet/" + "person.vertex.yml";
   auto vertex_meta = Yaml::LoadFile(vertex_meta_file).value();
   auto vertex_info = VertexInfo::Load(vertex_meta).value();
   auto maybe_writer = VertexPropertyWriter::Make(vertex_info, "/tmp/");
@@ -119,9 +119,9 @@ TEST_CASE("TestVertexPropertyWriter") {
   SECTION("TestOrcParquetReader") {
     arrow::Status st;
     arrow::MemoryPool* pool = arrow::default_memory_pool();
-    std::string path1 = TEST_DATA_DIR + "/ldbc_sample/orc" +
+    std::string path1 = test_data_dir + "/ldbc_sample/orc" +
                         "/vertex/person/firstName_lastName_gender/chunk1";
-    std::string path2 = TEST_DATA_DIR + "/ldbc_sample/parquet" +
+    std::string path2 = test_data_dir + "/ldbc_sample/parquet" +
                         "/vertex/person/firstName_lastName_gender/chunk1";
     arrow::io::IOContext io_context = arrow::io::default_io_context();
 
@@ -158,7 +158,7 @@ TEST_CASE("TestVertexPropertyWriter") {
   SECTION("TestEdgeChunkWriter") {
     arrow::Status st;
     arrow::MemoryPool* pool = arrow::default_memory_pool();
-    std::string path = TEST_DATA_DIR +
+    std::string path = test_data_dir +
                        "/ldbc_sample/parquet/edge/person_knows_person/"
                        "unordered_by_source/adj_list/part0/chunk0";
     auto fs = arrow::fs::FileSystemFromUriOrPath(path).ValueOrDie();
@@ -181,7 +181,7 @@ TEST_CASE("TestVertexPropertyWriter") {
 
     // Construct the writer
     std::string edge_meta_file =
-        TEST_DATA_DIR + "/ldbc_sample/csv/" + "person_knows_person.edge.yml";
+        test_data_dir + "/ldbc_sample/csv/" + "person_knows_person.edge.yml";
     auto edge_meta = Yaml::LoadFile(edge_meta_file).value();
     auto edge_info = EdgeInfo::Load(edge_meta).value();
     auto adj_list_type = AdjListType::ordered_by_source;
diff --git a/cpp/test/test_builder.cc b/cpp/test/test_builder.cc
index fed3b83..39e40f9 100644
--- a/cpp/test/test_builder.cc
+++ b/cpp/test/test_builder.cc
@@ -41,12 +41,12 @@
 
 #include <catch2/catch_test_macros.hpp>
 namespace graphar {
-TEST_CASE("test_vertices_builder") {
+TEST_CASE_METHOD(GlobalFixture, "test_vertices_builder") {
   std::cout << "Test vertex builder" << std::endl;
 
   // construct vertex builder
   std::string vertex_meta_file =
-      TEST_DATA_DIR + "/ldbc_sample/parquet/" + "person.vertex.yml";
+      test_data_dir + "/ldbc_sample/parquet/" + "person.vertex.yml";
   auto vertex_meta = Yaml::LoadFile(vertex_meta_file).value();
   auto vertex_info = VertexInfo::Load(vertex_meta).value();
   IdType start_index = 0;
@@ -78,7 +78,7 @@ TEST_CASE("test_vertices_builder") {
   REQUIRE(builder->GetNum() == 0);
 
   // add vertices
-  std::ifstream fp(TEST_DATA_DIR + "/ldbc_sample/person_0_0.csv");
+  std::ifstream fp(test_data_dir + "/ldbc_sample/person_0_0.csv");
   std::string line;
   getline(fp, line);
   int m = 4;
@@ -120,7 +120,7 @@ TEST_CASE("test_vertices_builder") {
   REQUIRE(builder->AddVertex(v).IsInvalid());
 
   // check the number of vertices dumped
-  auto fs = arrow::fs::FileSystemFromUriOrPath(TEST_DATA_DIR).ValueOrDie();
+  auto fs = arrow::fs::FileSystemFromUriOrPath(test_data_dir).ValueOrDie();
   auto input =
       fs->OpenInputStream("/tmp/vertex/person/vertex_count").ValueOrDie();
   auto num = input->Read(sizeof(IdType)).ValueOrDie();
@@ -128,11 +128,11 @@ TEST_CASE("test_vertices_builder") {
   REQUIRE((*ptr) == start_index + builder->GetNum());
 }
 
-TEST_CASE("test_edges_builder") {
+TEST_CASE_METHOD(GlobalFixture, "test_edges_builder") {
   std::cout << "Test edge builder" << std::endl;
   // construct edge builder
   std::string edge_meta_file =
-      TEST_DATA_DIR + "/ldbc_sample/parquet/" + "person_knows_person.edge.yml";
+      test_data_dir + "/ldbc_sample/parquet/" + "person_knows_person.edge.yml";
   auto edge_meta = Yaml::LoadFile(edge_meta_file).value();
   auto edge_info = EdgeInfo::Load(edge_meta).value();
   auto vertices_num = 903;
@@ -160,7 +160,7 @@ TEST_CASE("test_edges_builder") {
   REQUIRE(builder->GetNum() == 0);
 
   // add edges
-  std::ifstream fp(TEST_DATA_DIR + "/ldbc_sample/person_knows_person_0_0.csv");
+  std::ifstream fp(test_data_dir + "/ldbc_sample/person_knows_person_0_0.csv");
   std::string line;
   getline(fp, line);
   std::vector<std::string> names;
@@ -201,7 +201,7 @@ TEST_CASE("test_edges_builder") {
   REQUIRE(builder->AddEdge(e).IsInvalid());
 
   // check the number of vertices dumped
-  auto fs = arrow::fs::FileSystemFromUriOrPath(TEST_DATA_DIR).ValueOrDie();
+  auto fs = arrow::fs::FileSystemFromUriOrPath(test_data_dir).ValueOrDie();
   auto input =
       fs->OpenInputStream(
             "/tmp/edge/person_knows_person/ordered_by_dest/vertex_count")
diff --git a/cpp/test/test_chunk_info_reader.cc b/cpp/test/test_chunk_info_reader.cc
index 90f1371..98bd09b 100644
--- a/cpp/test/test_chunk_info_reader.cc
+++ b/cpp/test/test_chunk_info_reader.cc
@@ -28,10 +28,10 @@
 
 namespace graphar {
 
-TEST_CASE("ChunkInfoReader") {
+TEST_CASE_METHOD(GlobalFixture, "ChunkInfoReader") {
   // read file and construct graph info
   std::string path =
-      TEST_DATA_DIR + "/ldbc_sample/parquet/ldbc_sample.graph.yml";
+      test_data_dir + "/ldbc_sample/parquet/ldbc_sample.graph.yml";
   std::string src_label = "person", edge_label = "knows", dst_label = "person";
   std::string vertex_property_name = "id";
   std::string edge_property_name = "creationDate";
@@ -61,24 +61,24 @@ TEST_CASE("ChunkInfoReader") {
       REQUIRE(maybe_chunk_path.status().ok());
       std::string chunk_path = maybe_chunk_path.value();
       REQUIRE(chunk_path ==
-              TEST_DATA_DIR + "/ldbc_sample/parquet/vertex/person/id/chunk0");
+              test_data_dir + "/ldbc_sample/parquet/vertex/person/id/chunk0");
       REQUIRE(reader->seek(520).ok());
       maybe_chunk_path = reader->GetChunk();
       REQUIRE(maybe_chunk_path.status().ok());
       chunk_path = maybe_chunk_path.value();
       REQUIRE(chunk_path ==
-              TEST_DATA_DIR + "/ldbc_sample/parquet/vertex/person/id/chunk5");
+              test_data_dir + "/ldbc_sample/parquet/vertex/person/id/chunk5");
       REQUIRE(reader->next_chunk().ok());
       maybe_chunk_path = reader->GetChunk();
       REQUIRE(maybe_chunk_path.status().ok());
       chunk_path = maybe_chunk_path.value();
       REQUIRE(chunk_path ==
-              TEST_DATA_DIR + "/ldbc_sample/parquet/vertex/person/id/chunk6");
+              test_data_dir + "/ldbc_sample/parquet/vertex/person/id/chunk6");
       REQUIRE(reader->seek(900).ok());
       maybe_chunk_path = reader->GetChunk();
       chunk_path = maybe_chunk_path.value();
       REQUIRE(chunk_path ==
-              TEST_DATA_DIR + "/ldbc_sample/parquet/vertex/person/id/chunk9");
+              test_data_dir + "/ldbc_sample/parquet/vertex/person/id/chunk9");
       // now is end of the chunks
       REQUIRE(reader->next_chunk().IsIndexError());
       // test seek the id not in the chunks
@@ -114,21 +114,21 @@ TEST_CASE("ChunkInfoReader") {
       auto maybe_chunk_path = reader->GetChunk();
       REQUIRE(maybe_chunk_path.status().ok());
       auto chunk_path = maybe_chunk_path.value();
-      REQUIRE(chunk_path == TEST_DATA_DIR +
+      REQUIRE(chunk_path == test_data_dir +
                                 "/ldbc_sample/parquet/edge/person_knows_person/"
                                 "ordered_by_source/adj_list/part0/chunk0");
       REQUIRE(reader->seek(100).ok());
       maybe_chunk_path = reader->GetChunk();
       REQUIRE(maybe_chunk_path.status().ok());
       chunk_path = maybe_chunk_path.value();
-      REQUIRE(chunk_path == TEST_DATA_DIR +
+      REQUIRE(chunk_path == test_data_dir +
                                 "/ldbc_sample/parquet/edge/person_knows_person/"
                                 "ordered_by_source/adj_list/part0/chunk0");
       REQUIRE(reader->next_chunk().ok());
       maybe_chunk_path = reader->GetChunk();
       REQUIRE(maybe_chunk_path.status().ok());
       chunk_path = maybe_chunk_path.value();
-      REQUIRE(chunk_path == TEST_DATA_DIR +
+      REQUIRE(chunk_path == test_data_dir +
                                 "/ldbc_sample/parquet/edge/person_knows_person/"
                                 "ordered_by_source/adj_list/part1/chunk0");
 
@@ -137,14 +137,14 @@ TEST_CASE("ChunkInfoReader") {
       maybe_chunk_path = reader->GetChunk();
       REQUIRE(maybe_chunk_path.status().ok());
       chunk_path = maybe_chunk_path.value();
-      REQUIRE(chunk_path == TEST_DATA_DIR +
+      REQUIRE(chunk_path == test_data_dir +
                                 "/ldbc_sample/parquet/edge/person_knows_person/"
                                 "ordered_by_source/adj_list/part1/chunk0");
       REQUIRE(reader->seek_src(900).ok());
       maybe_chunk_path = reader->GetChunk();
       REQUIRE(maybe_chunk_path.status().ok());
       chunk_path = maybe_chunk_path.value();
-      REQUIRE(chunk_path == TEST_DATA_DIR +
+      REQUIRE(chunk_path == test_data_dir +
                                 "/ldbc_sample/parquet/edge/person_knows_person/"
                                 "ordered_by_source/adj_list/part9/chunk0");
       REQUIRE(reader->next_chunk().IsIndexError());
@@ -164,7 +164,7 @@ TEST_CASE("ChunkInfoReader") {
       auto maybe_chunk_path = dst_reader->GetChunk();
       REQUIRE(maybe_chunk_path.status().ok());
       auto chunk_path = maybe_chunk_path.value();
-      REQUIRE(chunk_path == TEST_DATA_DIR +
+      REQUIRE(chunk_path == test_data_dir +
                                 "/ldbc_sample/parquet/edge/person_knows_person/"
                                 "ordered_by_dest/adj_list/part1/chunk0");
       // seek an invalid dst id
@@ -185,27 +185,27 @@ TEST_CASE("ChunkInfoReader") {
       auto maybe_chunk_path = reader->GetChunk();
       REQUIRE(maybe_chunk_path.status().ok());
       std::string chunk_path = maybe_chunk_path.value();
-      REQUIRE(chunk_path == TEST_DATA_DIR +
+      REQUIRE(chunk_path == test_data_dir +
                                 "/ldbc_sample/parquet/edge/person_knows_person/"
                                 "ordered_by_source/offset/chunk0");
       REQUIRE(reader->seek(520).ok());
       maybe_chunk_path = reader->GetChunk();
       REQUIRE(maybe_chunk_path.status().ok());
       chunk_path = maybe_chunk_path.value();
-      REQUIRE(chunk_path == TEST_DATA_DIR +
+      REQUIRE(chunk_path == test_data_dir +
                                 "/ldbc_sample/parquet/edge/person_knows_person/"
                                 "ordered_by_source/offset/chunk5");
       REQUIRE(reader->next_chunk().ok());
       maybe_chunk_path = reader->GetChunk();
       REQUIRE(maybe_chunk_path.status().ok());
       chunk_path = maybe_chunk_path.value();
-      REQUIRE(chunk_path == TEST_DATA_DIR +
+      REQUIRE(chunk_path == test_data_dir +
                                 "/ldbc_sample/parquet/edge/person_knows_person/"
                                 "ordered_by_source/offset/chunk6");
       REQUIRE(reader->seek(900).ok());
       maybe_chunk_path = reader->GetChunk();
       chunk_path = maybe_chunk_path.value();
-      REQUIRE(chunk_path == TEST_DATA_DIR +
+      REQUIRE(chunk_path == test_data_dir +
                                 "/ldbc_sample/parquet/edge/person_knows_person/"
                                 "ordered_by_source/offset/chunk9");
       // now is end of the chunks
@@ -234,21 +234,21 @@ TEST_CASE("ChunkInfoReader") {
       auto maybe_chunk_path = reader->GetChunk();
       REQUIRE(maybe_chunk_path.status().ok());
       auto chunk_path = maybe_chunk_path.value();
-      REQUIRE(chunk_path == TEST_DATA_DIR +
+      REQUIRE(chunk_path == test_data_dir +
                                 "/ldbc_sample/parquet/edge/person_knows_person/"
                                 "ordered_by_source/creationDate/part0/chunk0");
       REQUIRE(reader->seek(100).ok());
       maybe_chunk_path = reader->GetChunk();
       REQUIRE(maybe_chunk_path.status().ok());
       chunk_path = maybe_chunk_path.value();
-      REQUIRE(chunk_path == TEST_DATA_DIR +
+      REQUIRE(chunk_path == test_data_dir +
                                 "/ldbc_sample/parquet/edge/person_knows_person/"
                                 "ordered_by_source/creationDate/part0/chunk0");
       REQUIRE(reader->next_chunk().ok());
       maybe_chunk_path = reader->GetChunk();
       REQUIRE(maybe_chunk_path.status().ok());
       chunk_path = maybe_chunk_path.value();
-      REQUIRE(chunk_path == TEST_DATA_DIR +
+      REQUIRE(chunk_path == test_data_dir +
                                 "/ldbc_sample/parquet/edge/person_knows_person/"
                                 "ordered_by_source/creationDate/part1/chunk0");
 
@@ -257,14 +257,14 @@ TEST_CASE("ChunkInfoReader") {
       maybe_chunk_path = reader->GetChunk();
       REQUIRE(maybe_chunk_path.status().ok());
       chunk_path = maybe_chunk_path.value();
-      REQUIRE(chunk_path == TEST_DATA_DIR +
+      REQUIRE(chunk_path == test_data_dir +
                                 "/ldbc_sample/parquet/edge/person_knows_person/"
                                 "ordered_by_source/creationDate/part1/chunk0");
       REQUIRE(reader->seek_src(900).ok());
       maybe_chunk_path = reader->GetChunk();
       REQUIRE(maybe_chunk_path.status().ok());
       chunk_path = maybe_chunk_path.value();
-      REQUIRE(chunk_path == TEST_DATA_DIR +
+      REQUIRE(chunk_path == test_data_dir +
                                 "/ldbc_sample/parquet/edge/person_knows_person/"
                                 "ordered_by_source/creationDate/part9/chunk0");
       REQUIRE(reader->next_chunk().IsIndexError());
@@ -285,7 +285,7 @@ TEST_CASE("ChunkInfoReader") {
       auto maybe_chunk_path = dst_reader->GetChunk();
       REQUIRE(maybe_chunk_path.status().ok());
       auto chunk_path = maybe_chunk_path.value();
-      REQUIRE(chunk_path == TEST_DATA_DIR +
+      REQUIRE(chunk_path == test_data_dir +
                                 "/ldbc_sample/parquet/edge/person_knows_person/"
                                 "ordered_by_dest/creationDate/part1/chunk0");
 
diff --git a/cpp/test/test_graph.cc b/cpp/test/test_graph.cc
index b939539..03a7470 100644
--- a/cpp/test/test_graph.cc
+++ b/cpp/test/test_graph.cc
@@ -26,10 +26,10 @@
 #include <catch2/catch_test_macros.hpp>
 
 namespace graphar {
-TEST_CASE("Graph") {
+TEST_CASE_METHOD(GlobalFixture, "Graph") {
   // read file and construct graph info
   std::string path =
-      TEST_DATA_DIR + "/ldbc_sample/parquet/ldbc_sample.graph.yml";
+      test_data_dir + "/ldbc_sample/parquet/ldbc_sample.graph.yml";
   auto maybe_graph_info = GraphInfo::Load(path);
   REQUIRE(maybe_graph_info.status().ok());
   auto graph_info = maybe_graph_info.value();
@@ -84,7 +84,7 @@ TEST_CASE("Graph") {
   SECTION("ListProperty") {
     // read file and construct graph info
     std::string path =
-        TEST_DATA_DIR +
+        test_data_dir +
         "/ldbc_sample/parquet/ldbc_sample_with_feature.graph.yml";
     auto maybe_graph_info = GraphInfo::Load(path);
     REQUIRE(maybe_graph_info.status().ok());
@@ -198,7 +198,7 @@ TEST_CASE("Graph") {
 
   SECTION("ValidateProperty") {
     // read file and construct graph info
-    std::string path = TEST_DATA_DIR + "/neo4j/MovieGraph.graph.yml";
+    std::string path = test_data_dir + "/neo4j/MovieGraph.graph.yml";
     auto maybe_graph_info = GraphInfo::Load(path);
     REQUIRE(maybe_graph_info.status().ok());
     auto graph_info = maybe_graph_info.value();
diff --git a/cpp/test/test_info.cc b/cpp/test/test_info.cc
index 9d4ccad..8a2a976 100644
--- a/cpp/test/test_info.cc
+++ b/cpp/test/test_info.cc
@@ -35,7 +35,7 @@
 
 namespace graphar {
 
-TEST_CASE("InfoVersion") {
+TEST_CASE_METHOD(GlobalFixture, "InfoVersion") {
   InfoVersion info_version(1);
   REQUIRE(info_version.version() == 1);
   REQUIRE(info_version.user_define_types() == std::vector<std::string>({}));
@@ -66,7 +66,7 @@ TEST_CASE("InfoVersion") {
   }
 }
 
-TEST_CASE("Property") {
+TEST_CASE_METHOD(GlobalFixture, "Property") {
   Property p0("p0", int32(), true);
   Property p1("p1", int32(), false);
 
@@ -78,7 +78,7 @@ TEST_CASE("Property") {
   REQUIRE(p1.is_nullable == true);
 }
 
-TEST_CASE("PropertyGroup") {
+TEST_CASE_METHOD(GlobalFixture, "PropertyGroup") {
   Property p0("p0", int32(), true);
   Property p1("p1", int32(), false);
   Property p2("p2", string(), false);
@@ -146,7 +146,7 @@ TEST_CASE("PropertyGroup") {
   }
 }
 
-TEST_CASE("AdjacentList") {
+TEST_CASE_METHOD(GlobalFixture, "AdjacentList") {
   AdjacentList adj_list0(AdjListType::unordered_by_source, FileType::CSV,
                          "adj_list0/");
   AdjacentList adj_list1(AdjListType::ordered_by_source, FileType::PARQUET);
@@ -180,7 +180,7 @@ TEST_CASE("AdjacentList") {
   }
 }
 
-TEST_CASE("VertexInfo") {
+TEST_CASE_METHOD(GlobalFixture, "VertexInfo") {
   std::string label = "test_vertex";
   int chunk_size = 100;
   auto version = std::make_shared<InfoVersion>(1);
@@ -303,7 +303,7 @@ version: gar/v1
   }
 }
 
-TEST_CASE("EdgeInfo") {
+TEST_CASE_METHOD(GlobalFixture, "EdgeInfo") {
   std::string src_label = "person", edge_label = "knows", dst_label = "person";
   int chunk_size = 1024;
   int src_chunk_size = 100;
@@ -519,7 +519,7 @@ version: gar/v1
   }
 }
 
-TEST_CASE("GraphInfo") {
+TEST_CASE_METHOD(GlobalFixture, "GraphInfo") {
   std::string name = "test_graph";
   auto version = std::make_shared<InfoVersion>(1);
   auto pg = CreatePropertyGroup(
@@ -683,7 +683,7 @@ vertices:
   }
 }
 
-TEST_CASE("LoadFromYaml") {
+TEST_CASE_METHOD(GlobalFixture, "LoadFromYaml") {
   std::string vertex_info_yaml = R"(label: person
 chunk_size: 100
 prefix: vertex/person/
@@ -779,7 +779,7 @@ extra_info:
   }
 }
 
-TEST_CASE("LoadFromS3", "[.hidden]") {
+TEST_CASE_METHOD(GlobalFixture, "LoadFromS3", "[.hidden]") {
   std::string path =
       "s3://graphar/ldbc/ldbc.graph.yml"
       "?endpoint_override=graphscope.oss-cn-beijing.aliyuncs.com";
diff --git a/cpp/test/util.h b/cpp/test/util.h
index 8813b82..32630de 100644
--- a/cpp/test/util.h
+++ b/cpp/test/util.h
@@ -23,17 +23,29 @@
 #include <iostream>
 #include <string>
 
-#include "graphar/util/status.h"
-
 namespace graphar {
 
-static const std::string TEST_DATA_DIR =  // NOLINT
-    std::filesystem::path(__FILE__)
-        .parent_path()
-        .parent_path()
-        .parent_path()
-        .parent_path()
-        .string() +
-    "/testing";
+// Define the fixture
+struct GlobalFixture {
+  GlobalFixture() {
+    // Setup code here, this runs before each test case
+    setup();
+  }
+
+  ~GlobalFixture() {}
+
+  void setup() {
+    const char* c_root = std::getenv("GAR_TEST_DATA");
+    if (!c_root) {
+      throw std::runtime_error(
+          "Test resources not found, set GAR_TEST_DATA to auxiliary testing "
+          "data");
+    }
+    test_data_dir = std::string(c_root);
+  }
+
+  // test data dir to be used in tests
+  std::string test_data_dir;
+};
 
 }  // namespace graphar
diff --git a/maven-projects/spark/README.md b/maven-projects/spark/README.md
index 485c995..a0967ca 100644
--- a/maven-projects/spark/README.md
+++ b/maven-projects/spark/README.md
@@ -38,21 +38,27 @@ After compilation, the package file graphar-x.x.x-SNAPSHOT-shaded.jar is generat
 
 Build the package and run the unit tests:
 
+First, you need to download the testing data:
+
+```bash
+    $ git clone https://github.com/apache/incubator-graphar-testing.git testing
+```
+
 ```bash
-    $ mvn clean install
+    $ GAR_TEST_DATA=./testing mvn clean install
 ```
 
 Build and run the unit tests:
 
 ```bash
-    $ mvn clean test
+    $ GAR_TEST_DATA=./testing mvn clean test
 ```
 
 Build and run certain unit test:
 
 ```bash
-    $ mvn clean test -Dsuites='org.apache.graphar.GraphInfoSuite'   # run the GraphInfo test suite
-    $ mvn clean test -Dsuites='org.apache.graphar.GraphInfoSuite load graph info'  # run the `load graph info` test of test suite
+    $ GAR_TEST_DATA=${PWD}/testing mvn clean test -Dsuites='org.apache.graphar.GraphInfoSuite'   # run the GraphInfo test suite
+    $ GAR_TEST_DATA=${PWD}/testing mvn clean test -Dsuites='org.apache.graphar.GraphInfoSuite load graph info'  # run the `load graph info` test of test suite
 ```
 
 ### Generate API document
@@ -242,7 +248,11 @@ scripts/build.sh
 
 Then run the example:
 
+
 ```bash
+# You first need to set the `GAR_TEST_DATA` environment variable to the testing data directory:
+export GAR_TEST_DATA=xxxx # the path to the testing data directory
+
 scripts/run-ldbc-sample2graphar.sh
 ```
 
diff --git a/maven-projects/spark/graphar/src/test/resources/gar-test b/maven-projects/spark/graphar/src/test/resources/gar-test
deleted file mode 120000
index 3bce4fa..0000000
--- a/maven-projects/spark/graphar/src/test/resources/gar-test
+++ /dev/null
@@ -1 +0,0 @@
-../../../../../../testing
\ No newline at end of file
diff --git a/cpp/test/util.h b/maven-projects/spark/graphar/src/test/scala/org/apache/graphar/BaseTestSuite.scala
similarity index 51%
copy from cpp/test/util.h
copy to maven-projects/spark/graphar/src/test/scala/org/apache/graphar/BaseTestSuite.scala
index 8813b82..2545d7e 100644
--- a/cpp/test/util.h
+++ b/maven-projects/spark/graphar/src/test/scala/org/apache/graphar/BaseTestSuite.scala
@@ -17,23 +17,33 @@
  * under the License.
  */
 
-#pragma once
+package org.apache.graphar
 
-#include <filesystem>
-#include <iostream>
-#include <string>
+import org.apache.spark.sql.SparkSession
+import org.scalatest.BeforeAndAfterAll
+import org.scalatest.funsuite.AnyFunSuite
 
-#include "graphar/util/status.h"
+abstract class BaseTestSuite extends AnyFunSuite with BeforeAndAfterAll {
 
-namespace graphar {
+  var testData: String = _
+  var spark: SparkSession = _
 
-static const std::string TEST_DATA_DIR =  // NOLINT
-    std::filesystem::path(__FILE__)
-        .parent_path()
-        .parent_path()
-        .parent_path()
-        .parent_path()
-        .string() +
-    "/testing";
+  override def beforeAll(): Unit = {
+    if (System.getenv("GAR_TEST_DATA") == null) {
+      throw new IllegalArgumentException("GAR_TEST_DATA is not set")
+    }
+    testData = System.getenv("GAR_TEST_DATA")
+    spark = SparkSession
+      .builder()
+      .enableHiveSupport()
+      .master("local[*]")
+      .getOrCreate()
+    spark.sparkContext.setLogLevel("Error")
+    super.beforeAll()
+  }
 
-}  // namespace graphar
+  override def afterAll(): Unit = {
+    // spark.stop()
+    super.afterAll()
+  }
+}
diff --git a/maven-projects/spark/graphar/src/test/scala/org/apache/graphar/ComputeExample.scala b/maven-projects/spark/graphar/src/test/scala/org/apache/graphar/ComputeExample.scala
index 28dbb1e..7e84977 100644
--- a/maven-projects/spark/graphar/src/test/scala/org/apache/graphar/ComputeExample.scala
+++ b/maven-projects/spark/graphar/src/test/scala/org/apache/graphar/ComputeExample.scala
@@ -21,24 +21,14 @@ package org.apache.graphar
 
 import org.apache.graphar.reader.{VertexReader, EdgeReader}
 
-import org.apache.spark.sql.SparkSession
 import org.apache.spark.graphx._
-import org.scalatest.funsuite.AnyFunSuite
 
-class ComputeExampleSuite extends AnyFunSuite {
-  val spark = SparkSession
-    .builder()
-    .enableHiveSupport()
-    .master("local[*]")
-    .getOrCreate()
+class ComputeExampleSuite extends BaseTestSuite {
 
   test("run cc using graphx") {
     // read vertex DataFrame
-    val file_path = "gar-test/ldbc_sample/parquet/"
-    val prefix = getClass.getClassLoader.getResource(file_path).getPath
-    val vertex_yaml = getClass.getClassLoader
-      .getResource(file_path + "person.vertex.yml")
-      .getPath
+    val prefix = testData + "/ldbc_sample/parquet/"
+    val vertex_yaml = prefix + "person.vertex.yml"
     val vertex_info = VertexInfo.loadVertexInfo(vertex_yaml, spark)
 
     val vertex_reader = new VertexReader(prefix, vertex_info, spark)
@@ -49,9 +39,7 @@ class ComputeExampleSuite extends AnyFunSuite {
     assert(vertex_df.count() == vertices_num)
 
     // read edge DataFrame
-    val edge_yaml = getClass.getClassLoader
-      .getResource(file_path + "person_knows_person.edge.yml")
-      .getPath
+    val edge_yaml = prefix + "person_knows_person.edge.yml"
     val edge_info = EdgeInfo.loadEdgeInfo(edge_yaml, spark)
     val adj_list_type = AdjListType.ordered_by_source
 
diff --git a/maven-projects/spark/graphar/src/test/scala/org/apache/graphar/TestGraphInfo.scala b/maven-projects/spark/graphar/src/test/scala/org/apache/graphar/TestGraphInfo.scala
index 0547b81..41005f3 100644
--- a/maven-projects/spark/graphar/src/test/scala/org/apache/graphar/TestGraphInfo.scala
+++ b/maven-projects/spark/graphar/src/test/scala/org/apache/graphar/TestGraphInfo.scala
@@ -19,25 +19,12 @@
 
 package org.apache.graphar
 
-import org.scalatest.funsuite.AnyFunSuite
-import org.apache.spark.sql.SparkSession
-
-class GraphInfoSuite extends AnyFunSuite {
-  val spark = SparkSession
-    .builder()
-    .enableHiveSupport()
-    .master("local[*]")
-    .getOrCreate()
+class GraphInfoSuite extends BaseTestSuite {
 
   test("load graph info") {
     // read graph yaml
-    val yaml_path = getClass.getClassLoader
-      .getResource("gar-test/ldbc_sample/csv/ldbc_sample.graph.yml")
-      .getPath
-    val prefix =
-      getClass.getClassLoader
-        .getResource("gar-test/ldbc_sample/csv/")
-        .getPath
+    val prefix = testData + "/ldbc_sample/csv/"
+    val yaml_path = prefix + "ldbc_sample.graph.yml"
     val graph_info = GraphInfo.loadGraphInfo(yaml_path, spark)
 
     val vertex_info = graph_info.getVertexInfo("person")
@@ -57,9 +44,7 @@ class GraphInfoSuite extends AnyFunSuite {
   }
 
   test("load vertex info") {
-    val yaml_path = getClass.getClassLoader
-      .getResource("gar-test/ldbc_sample/csv/person.vertex.yml")
-      .getPath
+    val yaml_path = testData + "/ldbc_sample/csv/person.vertex.yml"
     val vertex_info = VertexInfo.loadVertexInfo(yaml_path, spark)
 
     assert(vertex_info.getLabel == "person")
@@ -136,9 +121,7 @@ class GraphInfoSuite extends AnyFunSuite {
   }
 
   test("load edge info") {
-    val yaml_path = getClass.getClassLoader
-      .getResource("gar-test/ldbc_sample/csv/person_knows_person.edge.yml")
-      .getPath
+    val yaml_path = testData + "/ldbc_sample/csv/person_knows_person.edge.yml"
     val edge_info = EdgeInfo.loadEdgeInfo(yaml_path, spark)
 
     assert(edge_info.getSrc_label == "person")
diff --git a/maven-projects/spark/graphar/src/test/scala/org/apache/graphar/TestGraphReader.scala b/maven-projects/spark/graphar/src/test/scala/org/apache/graphar/TestGraphReader.scala
index 775d231..4bb8354 100644
--- a/maven-projects/spark/graphar/src/test/scala/org/apache/graphar/TestGraphReader.scala
+++ b/maven-projects/spark/graphar/src/test/scala/org/apache/graphar/TestGraphReader.scala
@@ -21,21 +21,11 @@ package org.apache.graphar
 
 import org.apache.graphar.graph.GraphReader
 
-import org.apache.spark.sql.SparkSession
-import org.scalatest.funsuite.AnyFunSuite
-
-class TestGraphReaderSuite extends AnyFunSuite {
-  val spark = SparkSession
-    .builder()
-    .enableHiveSupport()
-    .master("local[*]")
-    .getOrCreate()
+class TestGraphReaderSuite extends BaseTestSuite {
 
   test("read graphs by yaml paths") {
     // conduct reading
-    val graph_path = getClass.getClassLoader
-      .getResource("gar-test/ldbc_sample/parquet/ldbc_sample.graph.yml")
-      .getPath
+    val graph_path = testData + "/ldbc_sample/parquet/ldbc_sample.graph.yml"
     val vertex_edge_df_pair = GraphReader.read(graph_path, spark)
     val vertex_dataframes = vertex_edge_df_pair._1
     val edge_dataframes = vertex_edge_df_pair._2
@@ -56,9 +46,7 @@ class TestGraphReaderSuite extends AnyFunSuite {
 
   test("read graphs by graph infos") {
     // load graph info
-    val path = getClass.getClassLoader
-      .getResource("gar-test/ldbc_sample/parquet/ldbc_sample.graph.yml")
-      .getPath
+    val path = testData + "/ldbc_sample/parquet/ldbc_sample.graph.yml"
     val graph_info = GraphInfo.loadGraphInfo(path, spark)
 
     // conduct reading
diff --git a/maven-projects/spark/graphar/src/test/scala/org/apache/graphar/TestGraphTransformer.scala b/maven-projects/spark/graphar/src/test/scala/org/apache/graphar/TestGraphTransformer.scala
index ab55276..fdd10b2 100644
--- a/maven-projects/spark/graphar/src/test/scala/org/apache/graphar/TestGraphTransformer.scala
+++ b/maven-projects/spark/graphar/src/test/scala/org/apache/graphar/TestGraphTransformer.scala
@@ -21,25 +21,14 @@ package org.apache.graphar
 
 import org.apache.graphar.graph.GraphTransformer
 
-import org.apache.spark.sql.SparkSession
 import org.apache.hadoop.fs.{Path, FileSystem}
-import org.scalatest.funsuite.AnyFunSuite
 
-class TestGraphTransformerSuite extends AnyFunSuite {
-  val spark = SparkSession
-    .builder()
-    .enableHiveSupport()
-    .master("local[*]")
-    .getOrCreate()
+class TestGraphTransformerSuite extends BaseTestSuite {
 
   test("transform graphs by yaml paths") {
     // conduct transformation
-    val source_path = getClass.getClassLoader
-      .getResource("gar-test/ldbc_sample/parquet/ldbc_sample.graph.yml")
-      .getPath
-    val dest_path = getClass.getClassLoader
-      .getResource("gar-test/transformer/ldbc_sample.graph.yml")
-      .getPath
+    val source_path = testData + "/ldbc_sample/parquet/ldbc_sample.graph.yml"
+    val dest_path = testData + "/transformer/ldbc_sample.graph.yml"
     GraphTransformer.transform(source_path, dest_path, spark)
 
     val dest_graph_info = GraphInfo.loadGraphInfo(dest_path, spark)
@@ -73,15 +62,11 @@ class TestGraphTransformerSuite extends AnyFunSuite {
 
   test("transform graphs by graph infos") {
     // load source graph info
-    val source_path = getClass.getClassLoader
-      .getResource("gar-test/ldbc_sample/parquet/ldbc_sample.graph.yml")
-      .getPath
+    val source_path = testData + "/ldbc_sample/parquet/ldbc_sample.graph.yml"
     val source_graph_info = GraphInfo.loadGraphInfo(source_path, spark)
 
     // load dest graph info
-    val dest_path = getClass.getClassLoader
-      .getResource("gar-test/transformer/ldbc_sample.graph.yml")
-      .getPath
+    val dest_path = testData + "/transformer/ldbc_sample.graph.yml"
     val dest_graph_info = GraphInfo.loadGraphInfo(dest_path, spark)
 
     // conduct transformation
diff --git a/maven-projects/spark/graphar/src/test/scala/org/apache/graphar/TestGraphWriter.scala b/maven-projects/spark/graphar/src/test/scala/org/apache/graphar/TestGraphWriter.scala
index ffa491b..488c9dd 100644
--- a/maven-projects/spark/graphar/src/test/scala/org/apache/graphar/TestGraphWriter.scala
+++ b/maven-projects/spark/graphar/src/test/scala/org/apache/graphar/TestGraphWriter.scala
@@ -21,24 +21,14 @@ package org.apache.graphar
 
 import org.apache.graphar.graph.GraphWriter
 
-import org.apache.spark.sql.SparkSession
-import org.scalatest.funsuite.AnyFunSuite
-
-class TestGraphWriterSuite extends AnyFunSuite {
-  val spark = SparkSession
-    .builder()
-    .enableHiveSupport()
-    .master("local[*]")
-    .getOrCreate()
+class TestGraphWriterSuite extends BaseTestSuite {
 
   test("write graphs with data frames") {
     // initialize a graph writer
     val writer = new GraphWriter()
 
     // put the vertex data and edge data into writer
-    val vertex_file_path = getClass.getClassLoader
-      .getResource("gar-test/ldbc_sample/person_0_0.csv")
-      .getPath
+    val vertex_file_path = testData + "/ldbc_sample/person_0_0.csv"
     val vertex_df = spark.read
       .option("delimiter", "|")
       .option("header", "true")
@@ -46,9 +36,7 @@ class TestGraphWriterSuite extends AnyFunSuite {
     val label = "person"
     writer.PutVertexData(label, vertex_df, "id")
 
-    val file_path = getClass.getClassLoader
-      .getResource("gar-test/ldbc_sample/person_knows_person_0_0.csv")
-      .getPath
+    val file_path = testData + "/ldbc_sample/person_knows_person_0_0.csv"
     val edge_df = spark.read
       .option("delimiter", "|")
       .option("header", "true")
diff --git a/maven-projects/spark/graphar/src/test/scala/org/apache/graphar/TestIndexGenerator.scala b/maven-projects/spark/graphar/src/test/scala/org/apache/graphar/TestIndexGenerator.scala
index 9c2c046..84ff9b2 100644
--- a/maven-projects/spark/graphar/src/test/scala/org/apache/graphar/TestIndexGenerator.scala
+++ b/maven-projects/spark/graphar/src/test/scala/org/apache/graphar/TestIndexGenerator.scala
@@ -21,20 +21,10 @@ package org.apache.graphar
 
 import org.apache.graphar.util.IndexGenerator
 
-import org.apache.spark.sql.SparkSession
-import org.scalatest.funsuite.AnyFunSuite
-
-class IndexGeneratorSuite extends AnyFunSuite {
-  val spark = SparkSession
-    .builder()
-    .enableHiveSupport()
-    .master("local[*]")
-    .getOrCreate()
+class IndexGeneratorSuite extends BaseTestSuite {
 
   test("generate vertex index") {
-    val file_path = getClass.getClassLoader
-      .getResource("gar-test/ldbc_sample/person_0_0.csv")
-      .getPath
+    val file_path = testData + "/ldbc_sample/person_0_0.csv"
     val vertex_df = spark.read
       .option("delimiter", "|")
       .option("header", "true")
@@ -48,9 +38,7 @@ class IndexGeneratorSuite extends AnyFunSuite {
   }
 
   test("generate edge index") {
-    val file_path = getClass.getClassLoader
-      .getResource("gar-test/ldbc_sample/person_knows_person_0_0.csv")
-      .getPath
+    val file_path = testData + "/ldbc_sample/person_knows_person_0_0.csv"
     val edge_df = spark.read
       .option("delimiter", "|")
       .option("header", "true")
@@ -64,12 +52,8 @@ class IndexGeneratorSuite extends AnyFunSuite {
   }
 
   test("generate edge index with vertex") {
-    val vertex_path = getClass.getClassLoader
-      .getResource("gar-test/ldbc_sample/person_0_0.csv")
-      .getPath
-    val edge_path = getClass.getClassLoader
-      .getResource("gar-test/ldbc_sample/person_knows_person_0_0.csv")
-      .getPath
+    val vertex_path = testData + "/ldbc_sample/person_0_0.csv"
+    val edge_path = testData + "/ldbc_sample/person_knows_person_0_0.csv"
     val vertex_df = spark.read
       .option("delimiter", "|")
       .option("header", "true")
diff --git a/maven-projects/spark/graphar/src/test/scala/org/apache/graphar/TestReader.scala b/maven-projects/spark/graphar/src/test/scala/org/apache/graphar/TestReader.scala
index 396648a..2aafb29 100644
--- a/maven-projects/spark/graphar/src/test/scala/org/apache/graphar/TestReader.scala
+++ b/maven-projects/spark/graphar/src/test/scala/org/apache/graphar/TestReader.scala
@@ -21,25 +21,12 @@ package org.apache.graphar
 
 import org.apache.graphar.reader.{VertexReader, EdgeReader}
 
-import org.apache.spark.sql.SparkSession
-import org.scalatest.funsuite.AnyFunSuite
-
-class ReaderSuite extends AnyFunSuite {
-  val spark = SparkSession
-    .builder()
-    .enableHiveSupport()
-    .master("local[*]")
-    .getOrCreate()
-
-  spark.sparkContext.setLogLevel("Error")
+class ReaderSuite extends BaseTestSuite {
 
   test("read chunk files directly") {
     val cond = "id < 1000"
     // read vertex chunk files in Parquet
-    val parquet_file_path = "gar-test/ldbc_sample/parquet/"
-    val parquet_prefix =
-      getClass.getClassLoader.getResource(parquet_file_path).getPath
-    val parquet_read_path = parquet_prefix + "vertex/person/id"
+    val parquet_read_path = testData + "/ldbc_sample/parquet/vertex/person/id"
     val df1 = spark.read
       .option("fileFormat", "parquet")
       .format("org.apache.graphar.datasources.GarDataSource")
@@ -64,9 +51,7 @@ class ReaderSuite extends AnyFunSuite {
     df_pd.show()
 
     // read vertex chunk files in Orc
-    val orc_file_path = "gar-test/ldbc_sample/orc/"
-    val orc_prefix = getClass.getClassLoader.getResource(orc_file_path).getPath
-    val orc_read_path = orc_prefix + "vertex/person/id"
+    val orc_read_path = testData + "/ldbc_sample/orc/vertex/person/id"
     val df2 = spark.read
       .option("fileFormat", "orc")
       .format("org.apache.graphar.datasources.GarDataSource")
@@ -90,10 +75,8 @@ class ReaderSuite extends AnyFunSuite {
     df_pd.show()
 
     // read adjList chunk files recursively in CSV
-    val csv_file_path = "gar-test/ldbc_sample/csv/"
-    val csv_prefix = getClass.getClassLoader.getResource(csv_file_path).getPath
-    val csv_read_path =
-      csv_prefix + "edge/person_knows_person/ordered_by_source/adj_list"
+    val csv_read_path = testData +
+      "/ldbc_sample/csv/edge/person_knows_person/ordered_by_source/adj_list"
     val df3 = spark.read
       .option("fileFormat", "csv")
       .option("header", "true")
@@ -115,11 +98,8 @@ class ReaderSuite extends AnyFunSuite {
 
   test("read vertex chunks") {
     // construct the vertex information
-    val file_path = "gar-test/ldbc_sample/parquet/"
-    val prefix = getClass.getClassLoader.getResource(file_path).getPath
-    val vertex_yaml = getClass.getClassLoader
-      .getResource(file_path + "person.vertex.yml")
-      .getPath
+    val prefix = testData + "/ldbc_sample/parquet/"
+    val vertex_yaml = prefix + "person.vertex.yml"
     val vertex_info = VertexInfo.loadVertexInfo(vertex_yaml, spark)
 
     // construct the vertex reader
@@ -213,11 +193,8 @@ class ReaderSuite extends AnyFunSuite {
 
   test("read edge chunks") {
     // construct the edge information
-    val file_path = "gar-test/ldbc_sample/csv/"
-    val prefix = getClass.getClassLoader.getResource(file_path).getPath
-    val edge_yaml = getClass.getClassLoader
-      .getResource(file_path + "person_knows_person.edge.yml")
-      .getPath
+    val prefix = testData + "/ldbc_sample/csv/"
+    val edge_yaml = prefix + "person_knows_person.edge.yml"
     val edge_info = EdgeInfo.loadEdgeInfo(edge_yaml, spark)
 
     // construct the edge reader
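
(For context: each suite above now extends BaseTestSuite, which presumably
centralizes the shared SparkSession and resolves the data root from the
GAR_TEST_DATA environment variable; its real definition lives elsewhere in
this commit. The following is only a minimal sketch inferred from how
`spark` and `testData` are used in the hunks above, not the actual class.)

    package org.apache.graphar

    import org.apache.spark.sql.SparkSession
    import org.scalatest.funsuite.AnyFunSuite

    // Minimal sketch, assuming the usage shown in the suites above.
    abstract class BaseTestSuite extends AnyFunSuite {
      // Shared local SparkSession, replacing the per-suite setup removed here.
      val spark: SparkSession = SparkSession
        .builder()
        .enableHiveSupport()
        .master("local[*]")
        .getOrCreate()
      spark.sparkContext.setLogLevel("Error")

      // Data root; CI exports GAR_TEST_DATA before running the tests.
      val testData: String = sys.env.getOrElse(
        "GAR_TEST_DATA",
        throw new IllegalStateException(
          "GAR_TEST_DATA is not set; point it at the testing data directory"
        )
      )
    }
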
diff --git a/maven-projects/spark/graphar/src/test/scala/org/apache/graphar/TestWriter.scala b/maven-projects/spark/graphar/src/test/scala/org/apache/graphar/TestWriter.scala
index d0de0c9..b8ef91e 100644
--- a/maven-projects/spark/graphar/src/test/scala/org/apache/graphar/TestWriter.scala
+++ b/maven-projects/spark/graphar/src/test/scala/org/apache/graphar/TestWriter.scala
@@ -21,23 +21,14 @@ package org.apache.graphar
 
 import org.apache.graphar.writer.{VertexWriter, EdgeWriter}
 
-import org.apache.spark.sql.SparkSession
-import org.scalatest.funsuite.AnyFunSuite
 import org.apache.hadoop.fs.{Path, FileSystem}
 import scala.io.Source.fromFile
 
-class WriterSuite extends AnyFunSuite {
-  val spark = SparkSession
-    .builder()
-    .enableHiveSupport()
-    .master("local[*]")
-    .getOrCreate()
+class WriterSuite extends BaseTestSuite {
 
   test("test vertex writer with only vertex table") {
     // read vertex DataFrame
-    val file_path = getClass.getClassLoader
-      .getResource("gar-test/ldbc_sample/person_0_0.csv")
-      .getPath
+    val file_path = testData + "/ldbc_sample/person_0_0.csv"
     val vertex_df = spark.read
       .option("delimiter", "|")
       .option("header", "true")
@@ -48,9 +39,7 @@ class WriterSuite extends AnyFunSuite {
     )
 
     // read vertex yaml
-    val vertex_yaml_path = getClass.getClassLoader
-      .getResource("gar-test/ldbc_sample/parquet/person.vertex.yml")
-      .getPath
+    val vertex_yaml_path = testData + "/ldbc_sample/parquet/person.vertex.yml"
     val vertex_info = VertexInfo.loadVertexInfo(vertex_yaml_path, spark)
 
     // generate vertex index column for vertex DataFrame
@@ -94,9 +83,7 @@ class WriterSuite extends AnyFunSuite {
 
   test("test edge writer with only edge table") {
     // read edge DataFrame
-    val file_path = getClass.getClassLoader
-      .getResource("gar-test/ldbc_sample/person_knows_person_0_0.csv")
-      .getPath
+    val file_path = testData + "/ldbc_sample/person_knows_person_0_0.csv"
     val edge_df = spark.read
       .option("delimiter", "|")
       .option("header", "true")
@@ -108,9 +95,8 @@ class WriterSuite extends AnyFunSuite {
     )
 
     // read edge yaml
-    val edge_yaml_path = getClass.getClassLoader
-      .getResource("gar-test/ldbc_sample/csv/person_knows_person.edge.yml")
-      .getPath
+    val edge_yaml_path =
+      testData + "/ldbc_sample/csv/person_knows_person.edge.yml"
     val edge_info = EdgeInfo.loadEdgeInfo(edge_yaml_path, spark)
     val adj_list_type = AdjListType.ordered_by_source
 
@@ -221,9 +207,7 @@ class WriterSuite extends AnyFunSuite {
 
   test("test edge writer with vertex table and edge table") {
     // read vertex DataFrame
-    val vertex_file_path = getClass.getClassLoader
-      .getResource("gar-test/ldbc_sample/person_0_0.csv")
-      .getPath
+    val vertex_file_path = testData + "/ldbc_sample/person_0_0.csv"
     val vertex_df = spark.read
       .option("delimiter", "|")
       .option("header", "true")
@@ -231,9 +215,7 @@ class WriterSuite extends AnyFunSuite {
     val vertex_num = vertex_df.count()
 
     // read edge DataFrame
-    val file_path = getClass.getClassLoader
-      .getResource("gar-test/ldbc_sample/person_knows_person_0_0.csv")
-      .getPath
+    val file_path = testData + "/ldbc_sample/person_knows_person_0_0.csv"
     val edge_df = spark.read
       .option("delimiter", "|")
       .option("header", "true")
@@ -247,15 +229,12 @@ class WriterSuite extends AnyFunSuite {
     val adj_list_type = AdjListType.ordered_by_source
 
     // read vertex yaml
-    val vertex_yaml_path = getClass.getClassLoader
-      .getResource("gar-test/ldbc_sample/csv/person.vertex.yml")
-      .getPath
+    val vertex_yaml_path = testData + "/ldbc_sample/csv/person.vertex.yml"
     val vertex_info = VertexInfo.loadVertexInfo(vertex_yaml_path, spark)
 
     // read edge yaml
-    val edge_yaml_path = getClass.getClassLoader
-      .getResource("gar-test/ldbc_sample/csv/person_knows_person.edge.yml")
-      .getPath
+    val edge_yaml_path =
+      testData + "/ldbc_sample/csv/person_knows_person.edge.yml"
     val edge_info = EdgeInfo.loadEdgeInfo(edge_yaml_path, spark)
     val vertex_chunk_size = edge_info.getSrc_chunk_size()
     val vertex_chunk_num =
@@ -325,11 +304,8 @@ class WriterSuite extends AnyFunSuite {
     // compare with correct offset chunk value
     val offset_file_path =
       prefix + edge_info.getAdjListOffsetFilePath(0, adj_list_type)
-    val correct_offset_file_path = getClass.getClassLoader
-      .getResource(
-        "gar-test/ldbc_sample/csv/edge/person_knows_person/ordered_by_source/offset/chunk0"
-      )
-      .getPath
+    val correct_offset_file_path = testData +
+      "/ldbc_sample/csv/edge/person_knows_person/ordered_by_source/offset/chunk0"
     val generated_offset_array = fromFile(offset_file_path).getLines.toArray
     val expected_offset_array =
       fromFile(correct_offset_file_path).getLines.toArray
diff --git a/maven-projects/spark/graphar/src/test/scala/org/apache/graphar/TransformExample.scala b/maven-projects/spark/graphar/src/test/scala/org/apache/graphar/TransformExample.scala
index 224b836..bf0b521 100644
--- a/maven-projects/spark/graphar/src/test/scala/org/apache/graphar/TransformExample.scala
+++ b/maven-projects/spark/graphar/src/test/scala/org/apache/graphar/TransformExample.scala
@@ -22,24 +22,14 @@ package org.apache.graphar
 import org.apache.graphar.reader.{VertexReader, EdgeReader}
 import org.apache.graphar.writer.{VertexWriter, EdgeWriter}
 
-import org.apache.spark.sql.SparkSession
 import org.apache.hadoop.fs.{Path, FileSystem}
-import org.scalatest.funsuite.AnyFunSuite
 
-class TransformExampleSuite extends AnyFunSuite {
-  val spark = SparkSession
-    .builder()
-    .enableHiveSupport()
-    .master("local[*]")
-    .getOrCreate()
+class TransformExampleSuite extends BaseTestSuite {
 
   test("transform file type") {
     // read from orc files
-    val file_path = "gar-test/ldbc_sample/orc/"
-    val prefix = getClass.getClassLoader.getResource(file_path).getPath
-    val vertex_yaml = getClass.getClassLoader
-      .getResource(file_path + "person.vertex.yml")
-      .getPath
+    val prefix = testData + "/ldbc_sample/orc/"
+    val vertex_yaml = prefix + "person.vertex.yml"
     val vertex_info = VertexInfo.loadVertexInfo(vertex_yaml, spark)
 
     val reader = new VertexReader(prefix, vertex_info, spark)
@@ -48,11 +38,8 @@ class TransformExampleSuite extends AnyFunSuite {
     assert(vertex_df_with_index.count() == vertices_num)
 
     // write to parquet files
-    val output_file_path = "gar-test/ldbc_sample/parquet/"
     val output_prefix: String = "/tmp/example/"
-    val output_vertex_yaml = getClass.getClassLoader
-      .getResource(output_file_path + "person.vertex.yml")
-      .getPath
+    val output_vertex_yaml = testData + "/ldbc_sample/parquet/person.vertex.yml"
     val output_vertex_info =
       VertexInfo.loadVertexInfo(output_vertex_yaml, spark)
 
@@ -72,20 +59,15 @@ class TransformExampleSuite extends AnyFunSuite {
   }
 
   test("transform adjList type") {
-    val file_path = "gar-test/ldbc_sample/parquet/"
-    val prefix = getClass.getClassLoader.getResource(file_path).getPath
+    val prefix = testData + "/ldbc_sample/parquet/"
     // get vertex num
-    val vertex_yaml = getClass.getClassLoader
-      .getResource(file_path + "person.vertex.yml")
-      .getPath
+    val vertex_yaml = prefix + "person.vertex.yml"
     val vertex_info = VertexInfo.loadVertexInfo(vertex_yaml, spark)
     // construct the vertex reader
     val vreader = new VertexReader(prefix, vertex_info, spark)
     val vertexNum = vreader.readVerticesNumber()
     // read edges of unordered_by_source type
-    val edge_yaml = getClass.getClassLoader
-      .getResource(file_path + "person_knows_person.edge.yml")
-      .getPath
+    val edge_yaml = prefix + "person_knows_person.edge.yml"
     val edge_info = EdgeInfo.loadEdgeInfo(edge_yaml, spark)
 
     val adj_list_type = AdjListType.unordered_by_source
diff --git a/maven-projects/spark/scripts/run-ldbc-sample2graphar.sh b/maven-projects/spark/scripts/run-ldbc-sample2graphar.sh
index 79d47ab..40c07db 100755
--- a/maven-projects/spark/scripts/run-ldbc-sample2graphar.sh
+++ b/maven-projects/spark/scripts/run-ldbc-sample2graphar.sh
@@ -22,8 +22,8 @@ set -eu
 
 cur_dir="$( cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null 2>&1 && pwd )"
 
 jar_file="${cur_dir}/../graphar/target/graphar-commons-0.1.0-SNAPSHOT-shaded.jar"
-person_input_file="${cur_dir}/../../../testing/ldbc_sample/person_0_0.csv"
-person_knows_person_input_file="${cur_dir}/../../../testing/ldbc_sample/person_knows_person_0_0.csv"
+person_input_file="${GAR_TEST_DATA}/ldbc_sample/person_0_0.csv"
+person_knows_person_input_file="${GAR_TEST_DATA}/ldbc_sample/person_knows_person_0_0.csv"
 output_dir="/tmp/graphar/ldbc_sample"
 
 vertex_chunk_size=100
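
(The script now reads GAR_TEST_DATA instead of assuming a testing/
directory three levels above the scripts directory; since it runs under
`set -eu`, it aborts with an "unbound variable" error when the variable
is unset. A sketch of a local invocation, with an illustrative data path:)

    # Point GAR_TEST_DATA at a local copy of the testing data, then run
    # the script; output is written under /tmp/graphar/ldbc_sample.
    export GAR_TEST_DATA="$HOME/graphar-testing"
    ./maven-projects/spark/scripts/run-ldbc-sample2graphar.sh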


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
