This is an automated email from the ASF dual-hosted git repository.
ssinchenko pushed a commit to branch format-definition-dev
in repository https://gitbox.apache.org/repos/asf/incubator-graphar.git
The following commit(s) were added to refs/heads/format-definition-dev by this
push:
new 22aa41fc feat (format): Introduce buf (#519)
22aa41fc is described below
commit 22aa41fcc486b0d8bc702f19e89d849974f2a7fd
Author: Semyon <[email protected]>
AuthorDate: Thu Jun 13 12:02:21 2024 +0200
feat (format): Introduce buf (#519)
* feat(spark): Refactoring datasources (#514)
### Reason for this PR
By moving datasources under `org.apache.spark.sql` we are able to access the
private Spark API. Last time, when I was trying to fully migrate datasources to
V2, this was a blocker. Detailed motivation is in #493.
### What changes are included in this PR?
Mostly refactoring.
### Are these changes tested?
Unit tests pass.
I manually checked the generated JARs:

### Are there any user-facing changes?
Mostly not, because `GarDataSource` was left under the same package.
Close #493
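The package move above works because of Scala's package-qualified access modifiers: a member declared `private[sql]` in Spark is visible to any class placed under `org.apache.spark.sql` or its subpackages. A self-contained sketch of the mechanism (package and member names here are illustrative, not Spark's actual internals):

```scala
// Illustrative only: mirrors how Spark guards internals with
// package-qualified private access.
package org.example.sql {
  class Engine {
    // Visible only inside org.example.sql and its subpackages --
    // analogous to Spark's private[sql] members.
    private[sql] def internalPlan(): String = "plan"
  }
}

package org.example.sql.graphar {
  // Because this object lives under org.example.sql, it may call the
  // package-private member; code outside that package hierarchy cannot.
  object Caller {
    def run(): String = new org.example.sql.Engine().internalPlan()
  }
}
```

This is why placing the GraphAr datasources under `org.apache.spark.sql` unblocks the DataSource V2 migration without Spark exposing a public API.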
* feat(dev): Add release and verify scripts (#507)
Reason for this PR
Add scripts for the developer or release manager to easily release or verify a
version.
What changes are included in this PR?
Add release and verify scripts
The related document is updated on the website; see "Update the release and
verify document, and add development document" (incubator-graphar-website#18).
Are these changes tested?
yes
Are there any user-facing changes?
no
---------
Signed-off-by: acezen <[email protected]>
* chore: Bump to version v0.12.0 (Round 1) (#517)
Signed-off-by: acezen <[email protected]>
* chore: Add CHANGELOG.md (#513)
Signed-off-by: acezen <[email protected]>
* Introduce buf
- v2
- buf.gen
- buf
On branch format-definition-dev
Your branch is up to date with 'origin/format-definition-dev'.
Changes to be committed:
new file: buf.gen.yaml
new file: buf.yaml
modified: format/adjacent_list.proto
modified: format/edge_info.proto
modified: format/graph_info.proto
modified: format/property_group.proto
modified: format/types.proto
modified: format/vertex_info.proto
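For reference, buf configuration files typically follow the layout below. This is a generic sketch of the common v1 config format; the plugin names and output paths are assumptions for illustration, not the exact contents of the files committed here.

```yaml
# buf.yaml -- module configuration: lint and breaking-change rules
version: v1
lint:
  use:
    - DEFAULT
breaking:
  use:
    - FILE
---
# buf.gen.yaml -- code generation: which plugins to run against the
# .proto files and where their output goes (invoked via `buf generate`)
version: v1
plugins:
  - plugin: buf.build/protocolbuffers/cpp
    out: generated/cpp
  - plugin: buf.build/protocolbuffers/python
    out: generated/python
```

With these two files at the repository root, `buf lint` and `buf generate` can run against the `format/*.proto` definitions without hand-maintained `protoc` invocations.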
---------
Signed-off-by: acezen <[email protected]>
Co-authored-by: Weibin Zeng <[email protected]>
---
.devcontainer/graphar-dev.Dockerfile | 6 +
CHANGELOG.md | 360 +++++++++++++++++++++
CONTRIBUTING.md | 13 +-
LICENSE | 16 +-
NOTICE | 8 +
README.md | 19 +-
buf.gen.yaml | 18 ++
buf.yaml | 3 +
cpp/CMakeLists.txt | 4 +-
cpp/README.md | 4 +-
cpp/test/test_arrow_chunk_reader.cc | 3 +-
.../download_test_data.sh | 27 +-
.../neo4j.sh => dev/release/conda_env_cpp.txt | 15 +-
.../neo4j.sh => dev/release/conda_env_scala.txt | 12 +-
dev/release/release.py | 119 +++++++
dev/release/setup-ubuntu.sh | 52 +++
dev/release/verify.py | 174 ++++++++++
format/adjacent_list.proto | 3 +-
format/edge_info.proto | 3 +-
format/graph_info.proto | 3 +-
format/property_group.proto | 3 +-
format/types.proto | 3 +-
format/vertex_info.proto | 5 +-
licenserc.toml | 2 +
maven-projects/info/pom.xml | 1 +
maven-projects/java/README.md | 2 +-
maven-projects/java/pom.xml | 1 +
maven-projects/pom.xml | 2 +-
maven-projects/spark/README.md | 1 -
.../apache/graphar/datasources/GarDataSource.scala | 16 +-
.../sql/graphar}/GarCommitProtocol.scala | 10 +-
.../sql/graphar}/GarScan.scala | 25 +-
.../sql/graphar}/GarScanBuilder.scala | 8 +-
.../sql/graphar}/GarTable.scala | 14 +-
.../sql/graphar/GarWriteBuilder.scala} | 16 +-
.../sql/graphar/csv/CSVWriteBuilder.scala} | 7 +-
.../sql/graphar}/orc/OrcOutputWriter.scala | 7 +-
.../sql/graphar}/orc/OrcWriteBuilder.scala | 8 +-
.../sql/graphar/parquet/ParquetWriteBuilder.scala} | 11 +-
.../apache/graphar/datasources/GarDataSource.scala | 3 +-
.../sql/graphar}/GarCommitProtocol.scala | 10 +-
.../sql/graphar}/GarScan.scala | 19 +-
.../sql/graphar}/GarScanBuilder.scala | 7 +-
.../sql/graphar}/GarTable.scala | 14 +-
.../sql/graphar/GarWriteBuilder.scala} | 4 +-
.../sql/graphar/csv/CSVWriteBuilder.scala} | 5 +-
.../sql/graphar}/orc/OrcOutputWriter.scala | 2 +-
.../sql/graphar}/orc/OrcWriteBuilder.scala | 4 +-
.../sql/graphar/parquet/ParquetWriteBuilder.scala} | 4 +-
maven-projects/spark/graphar/pom.xml | 1 +
maven-projects/spark/import/neo4j.sh | 2 +-
maven-projects/spark/pom.xml | 1 +
maven-projects/spark/scripts/run-graphar2nebula.sh | 2 +-
maven-projects/spark/scripts/run-graphar2neo4j.sh | 2 +-
.../spark/scripts/run-ldbc-sample2graphar.sh | 2 +-
maven-projects/spark/scripts/run-nebula2graphar.sh | 2 +-
maven-projects/spark/scripts/run-neo4j2graphar.sh | 2 +-
pyspark/README.md | 2 +-
pyspark/graphar_pyspark/__init__.py | 1 +
59 files changed, 910 insertions(+), 183 deletions(-)
diff --git a/.devcontainer/graphar-dev.Dockerfile
b/.devcontainer/graphar-dev.Dockerfile
index 2bdd07a2..1c910d6f 100644
--- a/.devcontainer/graphar-dev.Dockerfile
+++ b/.devcontainer/graphar-dev.Dockerfile
@@ -40,6 +40,12 @@ RUN git clone --branch v1.8.3
https://github.com/google/benchmark.git /tmp/bench
&& make install \
&& rm -rf /tmp/benchmark
+RUN git clone --branch v3.6.0 https://github.com/catchorg/Catch2.git
/tmp/catch2 --depth 1 \
+ && cd /tmp/catch2 \
+ && cmake -Bbuild -H. -DBUILD_TESTING=OFF \
+ && cmake --build build/ --target install \
+ && rm -rf /tmp/catch2
+
ENV LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:/usr/local/lib:/usr/local/lib64
ENV JAVA_HOME=/usr/lib/jvm/default-java
diff --git a/CHANGELOG.md b/CHANGELOG.md
new file mode 100644
index 00000000..bca929a6
--- /dev/null
+++ b/CHANGELOG.md
@@ -0,0 +1,360 @@
+# Changelog
+All changes to this project will be documented in this file.
+
+The format is based on [Keep a
Changelog](https://keepachangelog.com/en/1.0.0/),
+and this project adheres to [Semantic
Versioning](https://semver.org/spec/v2.0.0.html).
+
+
+## [v0.11.4] - 2024-03-27
+### Added
+
+- [Minor][Spark] Add SPARK_TESTING variable to increase tests performance by
@SemyonSinchenko in https://github.com/apache/graphar/pull/405
+- Bump up GraphAr version to v0.11.4 by @acezen in
https://github.com/apache/graphar/pull/417
+
+### Changed
+
+- [FEAT][C++] Enhance the validation of writer with arrow::Table's Validate by
@acezen in https://github.com/apache/graphar/pull/410
+- [FEAT][C++] Change the default namespace to `graphar` by @acezen in
https://github.com/apache/graphar/pull/413
+- [FEAT][C++] Not allow setting custom namespace for code clarity by @acezen
in https://github.com/apache/graphar/pull/415
+
+### Docs
+
+- [Feat][Doc] Refactor and update the format specification document by @acezen
in https://github.com/apache/graphar/pull/387
+
+## [v0.11.3] - 2024-03-12
+### Added
+
+- [Feat][Spark] Split datasources and core, prepare for support of multiple
spark versions by @SemyonSinchenko in https://github.com/apache/graphar/pull/369
+- [Feat][Format][Spark] Add nullable key in meta-data by @Thespica in
https://github.com/apache/graphar/pull/365
+- [Feat][Spark] Spark 3.3.x support as a Maven Profile by @SemyonSinchenko in
https://github.com/apache/graphar/pull/376
+- [C++] Include an example of converting SNAP datasets to GraphAr format by
@lixueclaire in https://github.com/apache/graphar/pull/386
+- [Feat][C++] Support `Date` and `Timestamp` data type by @acezen in
https://github.com/apache/graphar/pull/398
+- Bump up GraphAr version to v0.11.3 by @acezen in
https://github.com/apache/graphar/pull/400
+
+### Changed
+
+- [Feat][Spark] Update PySpark bindings following GraphAr Spark by
@SemyonSinchenko in https://github.com/apache/graphar/pull/374
+- [Minor][C++] Revise the unsupported data type error msg to give more
information by @acezen in https://github.com/apache/graphar/pull/391
+
+### Fixed
+
+- [BugFix][C++] Fix bug: PropertyGroup with empty properties makes
VertexInfo/EdgeInfo dump fail by @acezen in
https://github.com/apache/graphar/pull/393
+- [BugFix][C++]: Fix `VertexInfo/EdgeInfo` can not be saved to a URI path by
@acezen in https://github.com/apache/graphar/pull/395
+- [Improvement][C++] Fixes compilation warnings in C++ SDK by @sighingnow in
https://github.com/apache/graphar/pull/388
+
+### Docs
+
+- [Feat][Doc] update Spark documentation by introducing Maven Profiles by
@SemyonSinchenko in https://github.com/apache/graphar/pull/380
+- [Improvement][Doc] Provide an implementation status page to indicate
libraries status of format implementation support by @acezen in
https://github.com/apache/graphar/pull/373
+- [Minor][Doc] Fix the link of the images by @acezen in
https://github.com/apache/graphar/pull/383
+- [Minor][Doc] Update and fix the implementation status page by @lixueclaire
in https://github.com/apache/graphar/pull/385
+- [Feat][Doc] switch to poetry project for docs generating by @SemyonSinchenko
in https://github.com/apache/graphar/pull/384
+
+## [v0.11.2] - 2024-02-24
+### Added
+
+- [Feat][Format][C++] Support nullable key for property in meta-data by
@Thespica in https://github.com/apache/graphar/pull/355
+- [Feat][Format][C++] Support extra info in graph info by @acezen in
https://github.com/apache/graphar/pull/356
+
+### Changed
+- [Improvement][Spark] Try to make neo4j generate DataFrame with the correct
data type by @acezen in https://github.com/apache/graphar/pull/353
+- [Improve][C++] Revise the ArrowChunkReader constructors by removing a
redundant parameter by @acezen in https://github.com/apache/graphar/pull/360
+- [Improvement][Doc][CPP] Complement the api reference document of cpp by
@acezen in https://github.com/apache/graphar/pull/364
+- Bump up GraphAr version to v0.11.2 by @acezen in
https://github.com/apache/graphar/pull/371
+
+### Fixed
+
+- [Chore][C++] fix err message by @jasinliu in
https://github.com/apache/graphar/pull/345
+- [BugFix][C++] Update the testing path with latest testing repo by @acezen in
https://github.com/apache/graphar/pull/346
+
+### Docs
+
+- [Doc] Enhance the ReadMe with additional information about the GraphAr
libraries by @lixueclaire in https://github.com/apache/graphar/pull/349
+- [Minor][Doc] Update publication information and fix link in ReadMe by
@lixueclaire in https://github.com/apache/graphar/pull/350
+- [Minor][Doc] Minor fix typo of cpp reference by @acezen in
https://github.com/apache/graphar/pull/363
+
+## [v0.11.1] - 2024-01-24
+### Changed
+
+- [Improvement][Spark] Improve the writer efficiency with parallel processing by
@acezen in https://github.com/apache/graphar/pull/329
+- [Feat][Spark] Memory tuning for GraphAr spark with persist and storage level
by @acezen in https://github.com/apache/graphar/pull/326
+- Bump up GraphAr version to v0.11.1 by @acezen in
https://github.com/apache/graphar/pull/342
+
+### Fixed
+
+- [Minor][Spark] Fix typo by @acezen in
https://github.com/apache/graphar/pull/327
+- [Bug][C++] Add implement of property<bool> by @jasinliu in
https://github.com/apache/graphar/pull/337
+- [BugFix][C++] Check is not nullptr before calling ToString and fix empty
prefix bug by @acezen in https://github.com/apache/graphar/pull/339
+
+### Docs
+
+- [Minor][Doc] Update getting-started.rst to fix a typo by @jasinliu in
https://github.com/apache/graphar/pull/325
+- [Minor][Doc] Remove unused community channel and add publication citation by
@acezen in https://github.com/apache/graphar/pull/331
+- [Minor][Doc] Fix README by @acezen in
https://github.com/apache/graphar/pull/332
+- [Minor][Spark] minor doc fix by @acezen in
https://github.com/apache/graphar/pull/336
+
+## [v0.11.0] - 2024-01-15
+### Added
+
+- Bump up GraphAr version to v0.11.0 @acezen
+- [Feat][Spark] Align info implementation of spark with c++ (#316) Weibin Zeng
+- [Feat][Spark] Implementation of PySpark bindings to Scala API (#300) Semyon
+- [Feat][C++] Initialize the micro benchmark for c++ (#299) Weibin Zeng
+- [Improve][Java] Get test resources from environment variables, and remove
all print sentences (#309) John
+- [Feat][Spark] Add Neo4j importer (#243) Liu Jiajun
+- [FEAT][C++] Support `list<string>` data type (#302) Weibin Zeng
+- [Minor][Dev] Update the PR template (#301) Weibin Zeng
+- [Feat][C++] Support List Data Type, use `list<float>` as example (#296)
Weibin Zeng
+- [FEAT][C++] Refactor the C++ SDK with forward declaration and shared ptr
(#290) Weibin Zeng
+- [FEAT][C++] Use `shared_ptr` in all readers and writers (#281) Weibin Zeng
+- [Feat][Java] Fill two incompatible gaps between C++ and Java (#279) John
@Thespica
+
+### Changed
+
+- [Improvement][Spark] Change VertexWriter constructor signature (#314) Semyon
+- [Feat][Spark] Update snakeyaml to 2.x.x version (#312) Semyon
+- [Minor][License] Update the license header and add license check in CI
(#294) Weibin Zeng
+- [Minor][C++] Improve the validation check (#310) Weibin Zeng
+- [Minor][Dev] Update release workflow to make release easy and revise other
workflows (#323) Weibin Zeng
+
+### Fixed
+
+- [Minor][Spark] Fix Spark comparison bug (#318) Zhang Lei
+- [Minor][Doc] Fix spark url in README.md (#317) Zhang Lei
+- [BugFix][Spark] Fix the comparison behavior of
Property/PropertyGroup/AdjList (#306) Weibin Zeng
+- [BugFix][Spark] change maven-site-plugin to 3.7.1 (#305) Weibin Zeng
+- [Minor][Doc] Fix the cpp reference doc (#295) Weibin Zeng
+- [Minor][C++] Fix typo: REGULAR_SEPERATOR -> REGULAR_SEPARATOR (#293) Weibin
Zeng
+- [BugFix][C++] Finalize S3 in FileSystem destructor (#289) Weibin Zeng
+- [Minor][Doc] Fix the typos of document (#282) Weibin Zeng
+- [BugFix][JAVA] Fix invalid option to skip building GraphAr c++ internally
for java (#284) John
+
+### Docs
+
+- [Doc][Improvement] Reorg the document structure by libraries (#292) Weibin
Zeng
+
+## [v0.10.0] - 2023-11-10
+### Added
+
+- [Feat][Spark] Add examples to show how to load/dump data from/to GraphAr for
Nebula (#244) (Liu Xiao) [#244](https://github.com/apache/graphar/pull/244)
+- [Minor][Spark] Support get GraphAr Spark from Maven (#250) (Weibin Zeng)
[#250](https://github.com/apache/graphar/pull/250)
+- [Improvement][C++] Use inherit to implement EdgesCollection (#238) (Weibin
Zeng) [#238](https://github.com/apache/graphar/pull/238)
+- [C++] Add examples about how to use C++ reader/writer (#252) (lixueclaire)
[#252](https://github.com/apache/graphar/pull/252)
+- [Improve][C++] Use arrow shared library if arrow installed (#263) (Weibin
Zeng) [#263](https://github.com/apache/graphar/pull/263)
+- [Improve][Java] Make EdgesCollection and VerticesCollection support foreach
loop (#270) (John) [#270](https://github.com/apache/graphar/pull/270)
+- [Minor][CI] Install a certain version of arrow in CI to avoid breaking CI
when arrow upgrades (#273) (Weibin Zeng)
[#273](https://github.com/apache/graphar/pull/273)
+- [Improvement][Spark] Complement the error messages of spark SDK (#278)
(Weibin Zeng) [#278](https://github.com/apache/graphar/pull/278)
+- [Feat][Format] Add internal id column to vertex payload file (#264) (Weibin
Zeng) [#264](https://github.com/apache/graphar/pull/264)
+
+### Changed
+
+- [Minor][C++] Update the C++ SDK version config (#266) (Weibin Zeng)
[#266](https://github.com/apache/graphar/pull/266)
+- [Doc][BugFix] Fix missing of scaladoc and javadoc in website (#269) (John)
[#269](https://github.com/apache/graphar/pull/269)
+
+### Fixed
+
+- [BUG][C++] Fix testing data path of examples (#251) (lixueclaire)
[#251](https://github.com/apache/graphar/pull/251)
+- [BugFix][Spark] Close the FileSystem Object (haohao0103)
[#258](https://github.com/apache/graphar/pull/258)
+- [BugFix][JAVA] Fix the building order bug of JAVA SDK (#261) (Weibin Zeng)
[#261](https://github.com/apache/graphar/pull/261)
+
+### Docs
+
+- [Minor][Doc]Add release-process.md to explain the release process, as
supplement of road map (#254) (Weibin Zeng)
[#254](https://github.com/apache/graphar/pull/254)
+- [Doc][Spark] Update the doc: fix the outdated argument annotations and typo
(#267) (Weibin Zeng) [#267](https://github.com/apache/graphar/pull/267)
+- [Doc] Provide Java's reference library, documentation for users and
developers (#242) (John) [#242](https://github.com/apache/graphar/pull/242)
+
+
+## [v0.9.0] - 2023-10-08
+### Added
+
+- Define code style for spark and java and add code format check to CI (#232)
(Weibin Zeng) [#232](https://github.com/apache/graphar/pull/232)
+- [FEAT][JAVA] Implement READERS and WRITERS for Java (#233) (John)
[#233](https://github.com/apache/graphar/pull/233)
+- [Spark] Support property filter pushdown by utilizing payload file formats
(#221) (Ziyi Tan) [#221](https://github.com/apache/graphar/pull/221)
+
+## [v0.8.0] - 2023-08-30
+### Added
+
+- [Minor][Spark] Adapt spark yaml format to BLOCK (#217) (Weibin Zeng)
[#217](https://github.com/apache/graphar/pull/217)
+- [Feat][C++] Output the error message when access value in Result fail (#222)
(Weibin Zeng) [#222](https://github.com/apache/graphar/pull/222)
+- [Feat][Java] Initialize the JAVA SDK: add INFO implementation (#212) (John)
[#212](https://github.com/apache/graphar/pull/212)
+- [Feat][C++] Support building GraphAr with system installed arrow (#230)
(Weibin Zeng) [#230](https://github.com/apache/graphar/pull/230)
+
+### Changed
+
+- [FEAT] Unify the name:`utils` -> `util` and the namespace of `GraphAr::util`
(#225) (Weibin Zeng) [#225](https://github.com/apache/graphar/pull/225)
+
+### Fixed
+
+- [Minor] Fix the broken CI of doc (#214) (Weibin Zeng)
[#214](https://github.com/apache/graphar/pull/214)
+- [BugFix][Spark] Fix compile error under JDK8 and maven 3.9.x (#216) (Liu
Xiao) [#216](https://github.com/apache/graphar/pull/216)
+- [BugFix][C++] Remove arrow header from GraphAr's header (#229) (Weibin Zeng)
[#229](https://github.com/apache/graphar/pull/229)
+
+## [v0.7.0] - 2023-07-24
+### Added
+
+- [C++] Support property filter pushdown by utilizing payload file formats
(#178) (Ziyi Tan) [#178](https://github.com/apache/graphar/pull/178)
+
+### Changed
+
+- [C++][Improvement] Redesign and unify the implementation of validation in
C++ Writer/Builder (#186) (lixueclaire)
[#186](https://github.com/apache/graphar/pull/186)
+- [Improvement][C++] Refine the error message of errors of C++ SDK (#192)
(Weibin Zeng) [#192](https://github.com/apache/graphar/pull/192)
+- [Improvement][C++] Refine the error message of Reader SDK (#195) (Ziyi Tan)
[#195](https://github.com/apache/graphar/pull/195)
+- Update the favicon image (#199) (Weibin Zeng)
[#199](https://github.com/apache/graphar/pull/199)
+- Update doc comments in graph_info.h (#204) (John)
[#204](https://github.com/apache/graphar/pull/204)
+- [Spark] Refine the `GraphWriter` to automatically generate graph info and
improve the Neo4j case (#196) (Weibin Zeng)
[#196](https://github.com/apache/graphar/pull/196)
+
+### Fixed
+
+- Fixes the pull_request_target usage to avoid the secret leak issue. (#193)
(Tao He) [#193](https://github.com/apache/graphar/pull/193)
+- Fixes the link to the logo image in README (#198) (Tao He)
[#198](https://github.com/apache/graphar/pull/198)
+- [Minor][C++] Fix grammar mistakes. (#208) (John)
[#208](https://github.com/apache/graphar/pull/208)
+
+### Docs
+
+- [Minor][Doc] Add GraphAr logo to README (#197) (Weibin Zeng)
[#197](https://github.com/apache/graphar/pull/197)
+- [Spark][Doc]Add java version for neo4j example. (#207) (Liu Jiajun)
[#207](https://github.com/apache/graphar/pull/207)
+
+## [v0.6.0] - 2023-06-09
+### Added
+
+- [C++] Support to get reference of the property in Vertex/Edge (#156)
(lixueclaire) [#156](https://github.com/apache/graphar/pull/156)
+- [C++] Align arrow version to system if arrow installed (#162) (@acezen
Weibin Zeng) [#162](https://github.com/apache/graphar/pull/162)
+- [BugFix] [C++] Make examples to generate result files under build type of
release (#173) (lixueclaire) [#173](https://github.com/apache/graphar/pull/173)
+- [Improvement][C++] Use recommended parameter to sort in Writer (#177)
(@lixueclaire lixueclaire) [#177](https://github.com/apache/graphar/pull/177)
+- [C++][Improvement] Add validation of different levels for builders in C++
library (#181) (lixueclaire) [#181](https://github.com/apache/graphar/pull/181)
+
+### Changed
+
+### Fixed
+
+- Fix compile error on ARM platform (#158) (Weibin Zeng)
[#158](https://github.com/apache/graphar/pull/158)
+- [C++][BugFix] Fix the arrow acero not found error when building with arrow
12.0.0 or greater (#164) (Weibin Zeng)
[#164](https://github.com/apache/graphar/pull/164)
+
+### Docs
+
+- [Doc] Refine the documentation of file format design (#165) (lixueclaire)
[#165](https://github.com/apache/graphar/pull/165)
+- [Doc] Improve spelling (#175) (Ziyi Tan)
[#175](https://github.com/apache/graphar/pull/175)
+- [MINOR][DOC] Add mail list to our communication tools and add community
introduction (#179) (Weibin Zeng)
[#179](https://github.com/apache/graphar/pull/179)
+- [Doc]Refine README in cpp about building (#182) (John)
[#182](https://github.com/apache/graphar/pull/182)
+
+## [v0.5.0] - 2023-05-12
+### Added
+
+- Enable arrow S3 support to support reading and writing file with S3/OSS
(#125) (Weibin Zeng) [#125](https://github.com/apache/graphar/pull/125)
+- [Improvement][C++] Add validation for data types for writers in C++ library
(#136) (lixueclaire) [#136](https://github.com/apache/graphar/pull/136)
+- [C++] Add vertex_count file for storing edges in GraphAr (#138)
(lixueclaire) [#138](https://github.com/apache/graphar/pull/138)
+- [FEAT] Use single header yaml parser `mini-yaml` (#142) (Weibin Zeng)
[#142](https://github.com/apache/graphar/pull/142)
+- Implement the add-assign operator for VertexIter (#151) (lixueclaire)
[#151](https://github.com/apache/graphar/pull/151)
+
+### Changed
+
+- [Improvement][C++] Improve the usability of EdgesCollection (#133)
(lixueclaire) [#133](https://github.com/apache/graphar/pull/133)
+- [Minor] Update README: add information about weekly meeting (#139) (Weibin
Zeng) [#139](https://github.com/apache/graphar/pull/139)
+- [Minor] Make the curl interface private (#146) (Weibin Zeng)
[#146](https://github.com/apache/graphar/pull/146)
+- [Doc] Update the images of README (#145) (Weibin Zeng)
[#145](https://github.com/apache/graphar/pull/145)
+- [Spark] Update the Spark library to align with the latest file format design
(#144) (lixueclaire) [#144](https://github.com/apache/graphar/pull/144)
+- [Minor][Doc]Remove deleted methods from API Reference (#149) (lixueclaire)
[#149](https://github.com/apache/graphar/pull/149)
+- [Doc] Refine building steps to be more clear in ReadMe (#154) (lixueclaire)
[#154](https://github.com/apache/graphar/pull/154)
+
+### Fixed
+
+- [BugFix][C++] Fix next_chunk() of readers in the C++ library (#137)
(lixueclaire) [#137](https://github.com/apache/graphar/pull/137)
+- [Minor] HotFix the link error of libcurl when building test (#147) (Weibin
Zeng) [#147](https://github.com/apache/graphar/pull/147)
+- [Minor] Fix the overview image (#148) (Weibin Zeng)
[#148](https://github.com/apache/graphar/pull/148)
+- [Minor] Fix building arrow bug on centos8 (#150) (Weibin Zeng)
[#150](https://github.com/apache/graphar/pull/150)
+
+## [v0.4.0] - 2023-04-13
+### Added
+
+- [Minor] Add discord invite link and banner to README (#129) (@acezen Weibin
Zeng) [#129](https://github.com/apache/graphar/pull/129)
+- [Improvement][C++] Implement the add operator for VertexIter (#128)
(@lixueclaire lixueclaire) [#128](https://github.com/apache/graphar/pull/128)
+- [C++] Add edge count file in GraphAr (#132) (lixueclaire)
[#132](https://github.com/apache/graphar/pull/132)
+
+### Changed
+
+- Disable jemalloc when building the bundled arrow (#122) (@sighingnow Tao He)
[#122](https://github.com/apache/graphar/pull/122)
+- [Minor][C++] Adjust the dependency version of arrow and fix arrow header
conflict bug (#134) (Weibin Zeng)
[#134](https://github.com/apache/graphar/pull/134)
+- [Minor] Update testing data (#135) (Weibin Zeng)
[#135](https://github.com/apache/graphar/pull/135)
+
+### Fixed
+
+- [Minor][C++] Fix compile warning (#123) (Yee)
[#123](https://github.com/apache/graphar/pull/123)
+- Fix test data path for examples (#131) (lixueclaire)
[#131](https://github.com/apache/graphar/pull/131)
+
+## [v0.3.0] - 2023-03-10
+### Added
+
+- [Improvement][Spark] Add helper objects and methods for loading info classes
from files (#112) (lixueclaire)
[#112](https://github.com/apache/graphar/pull/112)
+- [Improvement][Spark] Provide APIs for data transformation at the graph level
(#113) (lixueclaire) [#113](https://github.com/apache/graphar/pull/113)
+- [Improvement][Spark] Provide APIs for data reading and writing at the graph
level (#114) (Weibin Zeng) [#114](https://github.com/apache/graphar/pull/114)
+- [Examples][Spark] Add examples of integrating with the Neo4j spark connector
as an application of GraphAr (#107) (lixueclaire)
[#107](https://github.com/apache/graphar/pull/107)
+
+### Changed
+
+- Refine the overview figure and fix the typos in documentation (#117)
(lixueclaire) [#117](https://github.com/apache/graphar/pull/117)
+- [Improvement][DevInfra] Reorg the code directory to easily extend
libraries (#116) (Weibin Zeng)
[#116](https://github.com/apache/graphar/pull/116)
+- [Minor][Doc] Remove the invalid link (#121) (Weibin Zeng)
[#121](https://github.com/apache/graphar/pull/121)
+
+### Fixed
+
+- [BugFix][Spark] Fix the bug that VertexWrite does not generate vertex count
file (#110) (Weibin Zeng) [#110](https://github.com/apache/graphar/pull/110)
+
+## [v0.2.0] - 2023-02-23
+### Added
+
+- [Improvement] [Spark] Add methods for Spark Reader and improve the
performance (#87) (lixueclaire) [#87](https://github.com/apache/graphar/pull/87)
+- Add pre-commit configuration and instructions (#93) (Tao He)
[#93](https://github.com/apache/graphar/pull/93)
+- Handle comments correctly for preview PR docs (#94) (Tao He)
[#94](https://github.com/apache/graphar/pull/94)
+- [Improve] Add auxiliary functions to get vertex chunk num or edge chunk num
with infos (#95) (Weibin Zeng) [#95](https://github.com/apache/graphar/pull/95)
+- [Improve] Use gar-related names for arrow project and ccache to avoid
duplicated project name (#102) (Weibin Zeng)
[#102](https://github.com/apache/graphar/pull/102)
+- Add prefix to arrow definitions to avoid conflicts (#106) (Tao He)
[#106](https://github.com/apache/graphar/pull/106)
+
+### Changed
+
+- [Improve][Spark] Improve the performance of GraphAr Spark Reader (#84)
(lixueclaire) [#84](https://github.com/apache/graphar/pull/84)
+- Cast StringArray to LargeStringArray, otherwise we will fail when we need to
concatenate chunks (#105) (Tao He)
[#105](https://github.com/apache/graphar/pull/105)
+- [Improvement] Improve GraphAr spark writer performance and implement custom
writer builder to bypass spark's write behavior (#92) (Weibin Zeng)
[#92](https://github.com/apache/graphar/pull/92)
+- [Improvement][FileFormat] Write CSV payload files with header (#85) (Weibin
Zeng) [#85](https://github.com/apache/graphar/pull/85)
+- Update the source code url of GraphScope fragment builder and writer (#103)
(Weibin Zeng) [#103](https://github.com/apache/graphar/pull/103)
+
+### Fixed
+
+- [BugFix] Fix the Spark Writer bug when the column name contains a dot(.)
(#101) (lixueclaire) [#101](https://github.com/apache/graphar/pull/101)
+- It should be linker flags, suppressing the clang warnings (#104) (Tao He)
[#104](https://github.com/apache/graphar/pull/104)
+- Address issues in handling yaml-cpp correctly when requires GraphAr in
external projects (#91) (Tao He)
[#91](https://github.com/apache/graphar/pull/91)
+
+## [v0.1.0] - 2023-01-11
+### Added
+- Add ccache to github actions by @acezen in
https://github.com/apache/incubator-graphar/pull/12
+- Add issue template and pull request template to help user easy to get… by
@acezen in https://github.com/apache/incubator-graphar/pull/13
+- Add CODE_OF_CONDUCT.md by @acezen in
https://github.com/apache/incubator-graphar/pull/26
+- Add InfoVersion to store version information of info and support data type
extension base on info version by @acezen in
https://github.com/apache/incubator-graphar/pull/27
+- Initialize the spark tool of GraphAr and implement the Info and
IndexGenerator by @acezen in
https://github.com/apache/incubator-graphar/pull/45
+- organize an example pagerank app employing the gar library (#44) by
@andydiwenzhu in https://github.com/apache/incubator-graphar/pull/46
+- Initialize the implementation of spark writer by @acezen in
https://github.com/apache/incubator-graphar/pull/51
+- Initialize implementation for spark reader by @lixueclaire in
https://github.com/apache/incubator-graphar/pull/52
+- Add release and reviewing tutorial to contributing guide by @acezen in
https://github.com/apache/incubator-graphar/pull/53
+- Add introduction about GraphAr Spark tools in document by @lixueclaire in
https://github.com/apache/incubator-graphar/pull/58
+- Add spark tool api reference to doc by @acezen in
https://github.com/apache/incubator-graphar/pull/59
+- Add Spark application examples using GraphAr Spark tools by @lixueclaire in
https://github.com/apache/incubator-graphar/pull/61
+### Changed
+
+- Use the apache URL to download apache-arrow. by @sighingnow in
https://github.com/apache/incubator-graphar/pull/7
+- Update gar-test submodule url by @acezen in
https://github.com/apache/incubator-graphar/pull/6
+- Update README.rst by @yecol in
https://github.com/apache/incubator-graphar/pull/11
+- Revise image links in docs by @lixueclaire in
https://github.com/apache/incubator-graphar/pull/10
+- Refine documentation about integrating into GraphScope by @lixueclaire in
https://github.com/apache/incubator-graphar/pull/15
+- Refine the contributing doc to be more readable and easy to get started by
@acezen in https://github.com/apache/incubator-graphar/pull/16
+- [Minor] Remove `docutils` version limit to fix docs ci by @acezen in
https://github.com/apache/incubator-graphar/pull/57
+- Remove `include "arrow/api.h"` from graph.h by @acezen in
https://github.com/apache/incubator-graphar/pull/50
+- [Improve][Doc] Revise the README and APIs docstring of GraphAr by @acezen in
https://github.com/apache/incubator-graphar/pull/64
+- [Improve][Doc] Refine the documentation about user guide and applications by
@lixueclaire in https://github.com/apache/incubator-graphar/pull/69
+
+### Fixed
+
+- Fix the inconsistent prefix for vertex property chunks and update image
links by @acezen in https://github.com/apache/incubator-graphar/pull/4
+- Fix the file suffix of bug report template by @acezen in
https://github.com/apache/incubator-graphar/pull/17
+- Fix prefix of GAR files in document by @lixueclaire in
https://github.com/apache/incubator-graphar/pull/56
+- [BugFix][Spark] Fix offset chunk output path and offset value of spark
writer by @acezen in https://github.com/apache/incubator-graphar/pull/63
+- [MinorFix] Remove unnecessary file by @acezen in
https://github.com/apache/incubator-graphar/pull/43
+- [BugFix] Hide the interface of dependencies of GraphAr with `PRIVATE` link
type by @acezen in https://github.com/apache/incubator-graphar/pull/71
diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
index 02b8f18e..3e35b4bc 100644
--- a/CONTRIBUTING.md
+++ b/CONTRIBUTING.md
@@ -97,8 +97,7 @@ For small or first-time contributions, we recommend the dev
container method. An
### Using a dev container environment
GraphAr provides a pre-configured [dev container](https://containers.dev/)
-that could be used in [GitHub
Codespaces](https://github.com/features/codespaces),
-[VSCode](https://code.visualstudio.com/docs/devcontainers/containers),
[JetBrains](https://www.jetbrains.com/remote-development/gateway/),
+that could be used in
[VSCode](https://code.visualstudio.com/docs/devcontainers/containers),
[JetBrains](https://www.jetbrains.com/remote-development/gateway/),
[JupyterLab](https://jupyterlab.readthedocs.io/en/stable/).
Please pick your favorite runtime environment.
@@ -107,6 +106,10 @@ Please pick up your favorite runtime environment.
Different components of GraphAr may require different setup steps. Please
refer to their respective `README` documentation for more details.
- [C++ Library](cpp/README.md)
-- [Java Library](java/README.md)
-- [Spark Library](spark/README.md)
-- [PySpark Library](pyspark/README.md)
+- [Scala with Spark Library](spark/README.md)
+- [Python with PySpark Library](pyspark/README.md) (under development)
+- [Java Library](java/README.md) (under development)
+
+----
+
+This doc is adapted from [Apache OpenDAL](https://opendal.apache.org/)
diff --git a/LICENSE b/LICENSE
index d1d8cf77..b9617232 100644
--- a/LICENSE
+++ b/LICENSE
@@ -212,7 +212,7 @@ Apache-2.0 licenses
The following components are provided under the Apache-2.0 License. See
project link for details.
The text of each license is the standard Apache 2.0 license.
-* spark 3.1.1 and 3.3.4 (https://github.com/apache/spark)
+* Apache Spark 3.1.1 and 3.3.4 (https://github.com/apache/spark)
Files:
maven-projects/spark/datasourcs-32/src/main/scala/org/apache/graphar/datasources/GarCommitProtocol.scala
maven-projects/spark/datasourcs-32/src/main/scala/org/apache/graphar/datasources/GarDataSource.scala
@@ -234,9 +234,13 @@ The text of each license is the standard Apache 2.0 license.
maven-projects/spark/datasourcs-33/src/main/scala/org/apache/graphar/datasources/orc/ORCOutputWriter.scala
maven-projects/spark/datasourcs-33/src/main/scala/org/apache/graphar/datasources/orc/ORCWriteBuilder.scala
maven-projects/spark/datasourcs-33/src/main/scala/org/apache/graphar/datasources/parquet/ParquetWriteBuilder.scala
- are modified from spark.
+ are modified from Apache Spark.
+
+* Apache Arrow 12.0.0 (https://github.com/apache/arrow)
+ Files:
+ dev/release/setup-ubuntu.sh
+ are modified from Apache Arrow.
-* arrow 12.0.0 (https://github.com/apache/arrow)
* fastFFI v0.1.2 (https://github.com/alibaba/fastFFI)
Files:
maven-projects/java/src/main/java/org/apache/graphar/stdcxx/StdString.java
@@ -251,6 +255,12 @@ The text of each license is the standard Apache 2.0 license.
maven-projects/java/src/main/java/org/apache/graphar/stdcxx/StdUnorderedMap.java
are modified from GraphScope.
+* Apache OpenDAL v0.45.1 (https://github.com/apache/opendal)
+ Files:
+ dev/release/release.py
+ dev/release/verify.py
+ are modified from OpenDAL.
+
================================================================
MIT licenses
================================================================
diff --git a/NOTICE b/NOTICE
index cb5fbb1b..4dc3200b 100644
--- a/NOTICE
+++ b/NOTICE
@@ -31,3 +31,11 @@ which includes the following in its NOTICE file:
fastFFI
Copyright 1999-2021 Alibaba Group Holding Ltd.
+
+--------------------------------------------------------------------------------
+
+This product includes code from Apache OpenDAL, which includes the following in
+its NOTICE file:
+
+ Apache OpenDAL
+ Copyright 2022 and onwards The Apache Software Foundation.
diff --git a/README.md b/README.md
index ad9e064b..af5fc3f6 100644
--- a/README.md
+++ b/README.md
@@ -207,8 +207,17 @@ See [GraphAr C++
Library](./cpp) for
details about the building of the C++ library.
+
+### The Scala with Spark Library
+
+See [GraphAr Spark
+Library](./maven-projects/spark)
+for details about the Scala with Spark library.
+
### The Java Library
+The Java library is under development.
+
The GraphAr Java library is created with bindings to the C++ library
(currently at version v0.10.0), utilizing
[Alibaba-FastFFI](https://github.com/alibaba/fastFFI) for
@@ -216,15 +225,11 @@ implementation. See [GraphAr Java
Library](./maven-projects/java) for
details about the building of the Java library.
-### The Spark Library
-
-See [GraphAr Spark
-Library](./maven-projects/spark)
-for details about the Spark library.
+### The Python with PySpark Library
-### The PySpark Library
+The Python with PySpark library is under development.
-The GraphAr PySpark library is developed as bindings to the GraphAr
+The PySpark library is developed as bindings to the GraphAr
Spark library. See [GraphAr PySpark
Library](./pyspark)
for details about the PySpark library.
diff --git a/buf.gen.yaml b/buf.gen.yaml
new file mode 100644
index 00000000..6efdffa7
--- /dev/null
+++ b/buf.gen.yaml
@@ -0,0 +1,18 @@
+version: v2
+managed:
+ enabled: true
+ disable:
+ - file_option: java_package
+plugins:
+ # Python classes
+ - remote: buf.build/protocolbuffers/python:v27.1
+ out: pyspark/graphar_pyspark/proto/
+ # Python headers for IDEs and MyPy
+ - remote: buf.build/protocolbuffers/pyi
+ out: pyspark/graphar_pyspark/proto/
+ # Cpp
+ - remote: buf.build/protocolbuffers/cpp:v27.1
+ out: cpp/src/proto
+ # Java
+ - remote: buf.build/protocolbuffers/java:v27.1
+ out: maven-projects/info/src/main/java/
diff --git a/buf.yaml b/buf.yaml
new file mode 100644
index 00000000..bda430e8
--- /dev/null
+++ b/buf.yaml
@@ -0,0 +1,3 @@
+version: v2
+modules:
+ - path: format
diff --git a/cpp/CMakeLists.txt b/cpp/CMakeLists.txt
index fe81d18f..45a14c4d 100644
--- a/cpp/CMakeLists.txt
+++ b/cpp/CMakeLists.txt
@@ -32,8 +32,8 @@ if (CMAKE_VERSION VERSION_GREATER_EQUAL "3.24.0")
endif()
set(GRAPHAR_MAJOR_VERSION 0)
-set(GRAPHAR_MINOR_VERSION 11)
-set(GRAPHAR_PATCH_VERSION 4)
+set(GRAPHAR_MINOR_VERSION 12)
+set(GRAPHAR_PATCH_VERSION 0)
set(GREAPHAR_VERSION
${GRAPHAR_MAJOR_VERSION}.${GRAPHAR_MINOR_VERSION}.${GRAPHAR_PATCH_VERSION})
project(graphar-cpp LANGUAGES C CXX VERSION ${GREAPHAR_VERSION})
diff --git a/cpp/README.md b/cpp/README.md
index a2891026..743f0476 100644
--- a/cpp/README.md
+++ b/cpp/README.md
@@ -67,9 +67,7 @@ repository and navigated to the ``cpp`` subdirectory with:
```bash
$ git clone https://github.com/apache/graphar.git
- $ cd graphar
- $ git submodule update --init
- $ cd cpp
+ $ cd graphar/cpp
```
Release build:
diff --git a/cpp/test/test_arrow_chunk_reader.cc b/cpp/test/test_arrow_chunk_reader.cc
index 10e718ba..74d8041d 100644
--- a/cpp/test/test_arrow_chunk_reader.cc
+++ b/cpp/test/test_arrow_chunk_reader.cc
@@ -158,8 +158,7 @@ TEST_CASE_METHOD(GlobalFixture, "ArrowChunkReader") {
<< '\n';
std::cout << "Column Nums: " << table->num_columns() << "\n";
std::cout << "Column Names: ";
- for (int i = 0;
- i < table->num_columns() && i < expected_cols.size(); i++) {
+ for (int i = 0; i < table->num_columns(); i++) {
REQUIRE(table->ColumnNames()[i] == expected_cols[i]);
std::cout << "`" << table->ColumnNames()[i] << "` ";
}
diff --git a/maven-projects/spark/scripts/run-ldbc-sample2graphar.sh b/dev/download_test_data.sh
similarity index 54%
copy from maven-projects/spark/scripts/run-ldbc-sample2graphar.sh
copy to dev/download_test_data.sh
index 40c07db3..83555be3 100755
--- a/maven-projects/spark/scripts/run-ldbc-sample2graphar.sh
+++ b/dev/download_test_data.sh
@@ -1,5 +1,5 @@
#!/bin/bash
-
+#
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
@@ -17,17 +17,16 @@
# specific language governing permissions and limitations
# under the License.
+# A script to download test data for GraphAr
-set -eu
-
-cur_dir="$( cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null 2>&1 && pwd )"
-jar_file="${cur_dir}/../graphar/target/graphar-commons-0.1.0-SNAPSHOT-shaded.jar"
-person_input_file="${GAR_TEST_DATA}/ldbc_sample/person_0_0.csv"
-person_knows_person_input_file="${GAR_TEST_DATA}/ldbc_sample/person_knows_person_0_0.csv"
-output_dir="/tmp/graphar/ldbc_sample"
-
-vertex_chunk_size=100
-edge_chunk_size=1024
-file_type="parquet"
-spark-submit --class org.apache.graphar.example.LdbcSample2GraphAr ${jar_file} \
-  ${person_input_file} ${person_knows_person_input_file} ${output_dir} ${vertex_chunk_size} ${edge_chunk_size} ${file_type}
+if [ -n "${GAR_TEST_DATA}" ]; then
+  if [[ ! -d "$GAR_TEST_DATA" ]]; then
+    echo "GAR_TEST_DATA is set but the directory does not exist, cloning the test data to $GAR_TEST_DATA"
+    git clone https://github.com/apache/incubator-graphar-testing.git "$GAR_TEST_DATA" --depth 1 || true
+  fi
+else
+  echo "GAR_TEST_DATA is not set, cloning the test data to /tmp/graphar-testing"
+  git clone https://github.com/apache/incubator-graphar-testing.git /tmp/graphar-testing --depth 1 || true
+  echo "Test data has been cloned to /tmp/graphar-testing, please run"
+  echo "  export GAR_TEST_DATA=/tmp/graphar-testing"
+fi
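As an aside, the branching in `dev/download_test_data.sh` above (use `GAR_TEST_DATA` if set, otherwise fall back to `/tmp/graphar-testing`) can be sketched in Python; `resolve_test_data_dir` is a hypothetical helper for illustration only and performs no `git clone`:

```python
from pathlib import Path

def resolve_test_data_dir(env):
    """Mirror the branching in dev/download_test_data.sh: return the
    target directory and whether a clone would still be needed."""
    target = env.get("GAR_TEST_DATA")
    if target:
        path = Path(target)
        # Clone only if the configured directory does not exist yet
        return path, not path.is_dir()
    # No variable set: fall back to the default location
    return Path("/tmp/graphar-testing"), True

print(resolve_test_data_dir({}))
```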
diff --git a/maven-projects/spark/import/neo4j.sh b/dev/release/conda_env_cpp.txt
old mode 100755
new mode 100644
similarity index 72%
copy from maven-projects/spark/import/neo4j.sh
copy to dev/release/conda_env_cpp.txt
index dbae0273..c0025b04
--- a/maven-projects/spark/import/neo4j.sh
+++ b/dev/release/conda_env_cpp.txt
@@ -1,4 +1,3 @@
-#!/bin/bash
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
@@ -16,12 +15,8 @@
# specific language governing permissions and limitations
# under the License.
-
-set -eu
-
-cur_dir="$( cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null 2>&1 && pwd )"
-jar_file="${cur_dir}/../graphar/target/graphar-commons-0.1.0-SNAPSHOT-shaded.jar"
-conf_path="$(readlink -f $1)"
-
-spark-submit --class org.apache.graphar.importer.Neo4j ${jar_file} \
- ${conf_path}
+cmake
+conda-forge::arrow-cpp=13.0.0
+make
+clangxx_linux-64
+conda-forge::catch2=3.6.0
diff --git a/maven-projects/spark/import/neo4j.sh b/dev/release/conda_env_scala.txt
old mode 100755
new mode 100644
similarity index 72%
copy from maven-projects/spark/import/neo4j.sh
copy to dev/release/conda_env_scala.txt
index dbae0273..c63df3f9
--- a/maven-projects/spark/import/neo4j.sh
+++ b/dev/release/conda_env_scala.txt
@@ -1,4 +1,3 @@
-#!/bin/bash
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
@@ -16,12 +15,5 @@
# specific language governing permissions and limitations
# under the License.
-
-set -eu
-
-cur_dir="$( cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null 2>&1 && pwd )"
-jar_file="${cur_dir}/../graphar/target/graphar-commons-0.1.0-SNAPSHOT-shaded.jar"
-conf_path="$(readlink -f $1)"
-
-spark-submit --class org.apache.graphar.importer.Neo4j ${jar_file} \
- ${conf_path}
+maven
+openjdk=11.0.13
\ No newline at end of file
diff --git a/dev/release/release.py b/dev/release/release.py
new file mode 100644
index 00000000..366ddeab
--- /dev/null
+++ b/dev/release/release.py
@@ -0,0 +1,119 @@
+#!/usr/bin/env python3
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+# Derived from Apache OpenDAL v0.45.1
+# https://github.com/apache/opendal/blob/5079125/scripts/release.py
+
+import re
+import subprocess
+from pathlib import Path
+
+ROOT_DIR = Path(__file__).parent.parent.parent
+
+def get_package_version():
+    major_version = None
+    minor_version = None
+    patch_version = None
+    major_pattern = re.compile(r'set\s*\(\s*GRAPHAR_MAJOR_VERSION\s+(\d+)\s*\)', re.IGNORECASE)
+    minor_pattern = re.compile(r'set\s*\(\s*GRAPHAR_MINOR_VERSION\s+(\d+)\s*\)', re.IGNORECASE)
+    patch_pattern = re.compile(r'set\s*\(\s*GRAPHAR_PATCH_VERSION\s+(\d+)\s*\)', re.IGNORECASE)
+
+    file_path = ROOT_DIR / "cpp/CMakeLists.txt"
+    with open(file_path, 'r') as file:
+        for line in file:
+            major_match = major_pattern.search(line)
+            minor_match = minor_pattern.search(line)
+            patch_match = patch_pattern.search(line)
+
+            if major_match:
+                major_version = major_match.group(1)
+            if minor_match:
+                minor_version = minor_match.group(1)
+            if patch_match:
+                patch_version = patch_match.group(1)
+
+    if major_version and minor_version and patch_version:
+        return f"{major_version}.{minor_version}.{patch_version}"
+    else:
+        return None
+
+def archive_source_package():
+    print("Archive source package started")
+
+    version = get_package_version()
+    assert version, "Failed to get the package version"
+    name = f"apache-graphar-{version}-incubating-src"
+
+    archive_command = [
+        "git",
+        "archive",
+        "--prefix",
+        f"apache-graphar-{version}-incubating-src/",
+        "-o",
+        f"{ROOT_DIR}/dist/{name}.tar.gz",
+        "HEAD",
+    ]
+    subprocess.run(
+        archive_command,
+        cwd=ROOT_DIR,
+        check=True,
+    )
+
+    print(f"Archive source package to dist/{name}.tar.gz")
+
+
+def generate_signature():
+    for i in Path(ROOT_DIR / "dist").glob("*.tar.gz"):
+        print(f"Generate signature for {i}")
+        subprocess.run(
+            ["gpg", "--yes", "--armor", "--output", f"{i}.asc", "--detach-sig", str(i)],
+            cwd=ROOT_DIR / "dist",
+            check=True,
+        )
+
+    for i in Path(ROOT_DIR / "dist").glob("*.tar.gz"):
+        print(f"Check signature for {i}")
+        subprocess.run(
+            ["gpg", "--verify", f"{i}.asc", str(i)], cwd=ROOT_DIR / "dist", check=True
+        )
+
+
+def generate_checksum():
+    for i in Path(ROOT_DIR / "dist").glob("*.tar.gz"):
+        print(f"Generate checksum for {i}")
+        subprocess.run(
+            ["sha512sum", str(i.relative_to(ROOT_DIR / "dist"))],
+            stdout=open(f"{i}.sha512", "w"),
+            cwd=ROOT_DIR / "dist",
+            check=True,
+        )
+
+    for i in Path(ROOT_DIR / "dist").glob("*.tar.gz"):
+        print(f"Check checksum for {i}")
+        subprocess.run(
+            ["sha512sum", "--check", f"{str(i.relative_to(ROOT_DIR / 'dist'))}.sha512"],
+            cwd=ROOT_DIR / "dist",
+            check=True,
+        )
+
+
+if __name__ == "__main__":
+    (ROOT_DIR / "dist").mkdir(exist_ok=True)
+    archive_source_package()
+    generate_signature()
+    generate_checksum()
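For reference, the `set(GRAPHAR_*_VERSION n)` parsing that `get_package_version()` performs can be exercised standalone; the CMake fragment below is hypothetical sample input, and the single combined regex is a simplification of the three separate patterns used in `release.py`:

```python
import re

# Hypothetical fragment mirroring cpp/CMakeLists.txt
sample = """
set(GRAPHAR_MAJOR_VERSION 0)
set(GRAPHAR_MINOR_VERSION 12)
set(GRAPHAR_PATCH_VERSION 0)
"""

# One combined pattern standing in for the three separate ones
pattern = re.compile(
    r"set\s*\(\s*GRAPHAR_(MAJOR|MINOR|PATCH)_VERSION\s+(\d+)\s*\)",
    re.IGNORECASE,
)

# Collect each component keyed by MAJOR/MINOR/PATCH
parts = {m.group(1): m.group(2) for m in pattern.finditer(sample)}
version = "{MAJOR}.{MINOR}.{PATCH}".format(**parts)
print(version)  # 0.12.0
```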
diff --git a/dev/release/setup-ubuntu.sh b/dev/release/setup-ubuntu.sh
new file mode 100644
index 00000000..6e74b3fc
--- /dev/null
+++ b/dev/release/setup-ubuntu.sh
@@ -0,0 +1,52 @@
+#!/bin/bash
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+# Derived from Apache Arrow 12.0.0
+# https://github.com/apache/arrow/blob/9736dde/dev/release/setup-ubuntu.sh
+
+# A script to install dependencies required for release
+# verification on Ubuntu.
+
+set -exu
+
+codename=$(. /etc/os-release && echo ${UBUNTU_CODENAME})
+id=$(. /etc/os-release && echo ${ID})
+
+apt-get install -y -q --no-install-recommends \
+ build-essential \
+ cmake \
+ git \
+ gnupg \
+ libcurl4-openssl-dev \
+ maven \
+ openjdk-11-jdk \
+ wget \
+ pkg-config \
+ tzdata \
+ subversion
+
+wget -c https://apache.jfrog.io/artifactory/arrow/${id}/apache-arrow-apt-source-latest-${codename}.deb \
+  -P /tmp/
+apt-get install -y -q /tmp/apache-arrow-apt-source-latest-${codename}.deb
+apt-get update -y -q
+apt-get install -y -q --no-install-recommends \
+ libarrow-dev \
+ libarrow-dataset-dev \
+ libarrow-acero-dev \
+ libparquet-dev
diff --git a/dev/release/verify.py b/dev/release/verify.py
new file mode 100644
index 00000000..fe9c46ce
--- /dev/null
+++ b/dev/release/verify.py
@@ -0,0 +1,174 @@
+#!/usr/bin/env python3
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+# Derived from Apache OpenDAL v0.45.1
+# https://github.com/apache/opendal/blob/5079125/scripts/verify.py
+
+import subprocess
+import os
+from pathlib import Path
+
+BASE_DIR = Path(os.getcwd())
+
+# Define colors for output
+YELLOW = "\033[33;1m"
+GREEN = "\033[32;1m"
+ENDCOLOR = "\033[0m"
+
+
+def check_signature(pkg):
+    """Check the GPG signature of the package."""
+    try:
+        subprocess.check_call(["gpg", "--verify", f"{pkg}.asc", pkg])
+        print(f"{GREEN}> Successfully verified the GPG signature for {pkg}{ENDCOLOR}")
+    except subprocess.CalledProcessError:
+        print(f"{YELLOW}> Failed to verify the GPG signature for {pkg}{ENDCOLOR}")
+
+
+def check_sha512sum(pkg):
+    """Check the sha512 checksum of the package."""
+    try:
+        subprocess.check_call(["sha512sum", "--check", f"{pkg}.sha512"])
+        print(f"{GREEN}> Successfully verified the checksum for {pkg}{ENDCOLOR}")
+    except subprocess.CalledProcessError:
+        print(f"{YELLOW}> Failed to verify the checksum for {pkg}{ENDCOLOR}")
+
+
+def extract_packages():
+    for file in BASE_DIR.glob("*.tar.gz"):
+        subprocess.run(["tar", "-xzf", file], check=True)
+
+
+def check_license(dir):
+    print(f"> Start checking LICENSE file in {dir}")
+    if not (dir / "LICENSE").exists():
+        raise FileNotFoundError(f"{YELLOW}> LICENSE file is not found{ENDCOLOR}")
+    print(f"{GREEN}> LICENSE file exists in {dir}{ENDCOLOR}")
+
+
+def check_notice(dir):
+    print(f"> Start checking NOTICE file in {dir}")
+    if not (dir / "NOTICE").exists():
+        raise FileNotFoundError(f"{YELLOW}> NOTICE file is not found{ENDCOLOR}")
+    print(f"{GREEN}> NOTICE file exists in {dir}{ENDCOLOR}")
+
+
+def install_conda():
+    print("Start installing conda")
+    subprocess.run(["wget", "https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh"], check=True)
+    subprocess.run(["bash", "Miniconda3-latest-Linux-x86_64.sh", "-b"], check=True)
+    print(f"{GREEN}Successfully installed conda{ENDCOLOR}")
+
+
+def maybe_setup_conda(dependencies):
+    # Optionally set up a conda environment with the given dependencies
+    if ("USE_CONDA" in os.environ) and (int(os.environ["USE_CONDA"]) > 0):
+        print("Configuring conda environment...")
+        subprocess.run(["conda", "deactivate"], check=False, stderr=subprocess.STDOUT)
+        create_env_command = ["conda", "create", "--name", "graphar", "--yes", "python=3.8"]
+        subprocess.run(create_env_command, check=True, stderr=subprocess.STDOUT)
+        install_deps_command = ["conda", "install", "--name", "graphar", "--yes"] + dependencies
+        subprocess.run(install_deps_command, check=True, stderr=subprocess.STDOUT)
+        subprocess.run("conda activate graphar", check=True, stderr=subprocess.STDOUT, shell=True)
+
+
+def build_and_test_cpp(dir):
+    print("Start building, installing and testing the C++ library")
+
+    maybe_setup_conda(["--file", f"{dir}/dev/release/conda_env_cpp.txt"])
+
+    cmake_command = ["cmake", ".", "-DBUILD_TESTS=ON"]
+    subprocess.run(
+        cmake_command,
+        cwd=dir / "cpp",
+        check=True,
+        stderr=subprocess.STDOUT,
+    )
+    build_and_install_command = [
+        "cmake",
+        "--build",
+        ".",
+        "--target",
+        "install",
+    ]
+    subprocess.run(
+        build_and_install_command,
+        cwd=dir / "cpp",
+        check=True,
+        stderr=subprocess.STDOUT,
+    )
+    test_command = [
+        "ctest",
+        "--output-on-failure",
+        "--timeout",
+        "300",
+        "-VV"
+    ]
+    subprocess.run(
+        test_command,
+        cwd=dir / "cpp",
+        check=True,
+        stderr=subprocess.STDOUT,
+    )
+    print(f"{GREEN}Successfully built the GraphAr C++ library{ENDCOLOR}")
+
+
+def build_and_test_scala(dir):
+    print("Start building, installing and testing the Scala with Spark library")
+
+    maybe_setup_conda(["--file", f"{dir}/dev/release/conda_env_scala.txt"])
+
+    build_command_32 = ["mvn", "clean", "package", "-P", "datasource32"]
+    subprocess.run(
+        build_command_32,
+        cwd=dir / "maven-projects/spark",
+        check=True,
+        stderr=subprocess.STDOUT,
+    )
+    build_command_33 = ["mvn", "clean", "package", "-P", "datasource33"]
+    subprocess.run(
+        build_command_33,
+        cwd=dir / "maven-projects/spark",
+        check=True,
+        stderr=subprocess.STDOUT,
+    )
+
+    print(f"{GREEN}Successfully built the GraphAr Scala library{ENDCOLOR}")
+
+if __name__ == "__main__":
+    # Get a list of all files in the current directory
+    files = [f for f in os.listdir(".") if os.path.isfile(f)]
+
+    for pkg in files:
+        # Skip files that don't have a corresponding .asc or .sha512 file
+        if not os.path.exists(f"{pkg}.asc") or not os.path.exists(f"{pkg}.sha512"):
+            continue
+
+        print(f"> Checking {pkg}")
+
+        # Perform the checks
+        check_signature(pkg)
+        check_sha512sum(pkg)
+
+    extract_packages()
+
+    for dir in BASE_DIR.glob("apache-graphar-*-src/"):
+        check_license(dir)
+        check_notice(dir)
+        build_and_test_cpp(dir)
+        build_and_test_scala(dir)
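The checksum round trip in `verify.py` shells out to `sha512sum`; the same flow can be sketched in pure Python with `hashlib` (the file names and the `sha512_of` helper are hypothetical, for illustration only):

```python
import hashlib
import tempfile
from pathlib import Path

def sha512_of(path):
    # The digest sha512sum would compute for the file
    return hashlib.sha512(path.read_bytes()).hexdigest()

with tempfile.TemporaryDirectory() as d:
    pkg = Path(d) / "apache-graphar-0.12.0-incubating-src.tar.gz"
    pkg.write_bytes(b"stand-in for a real source archive")
    # Record the checksum in sha512sum's "<digest>  <name>" format
    sha_file = Path(f"{pkg}.sha512")
    sha_file.write_text(f"{sha512_of(pkg)}  {pkg.name}\n")
    # Verification recomputes the digest and compares, like `sha512sum --check`
    recorded = sha_file.read_text().split()[0]
    assert recorded == sha512_of(pkg)
    print("checksum round-trip ok")
```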
diff --git a/format/adjacent_list.proto b/format/adjacent_list.proto
index 705de694..21312530 100644
--- a/format/adjacent_list.proto
+++ b/format/adjacent_list.proto
@@ -20,7 +20,8 @@
syntax = "proto3";
package graphar;
-option java_package = "org.apache.graphar.info";
+option java_multiple_files = true;
+option java_package = "org.apache.graphar.info.proto";
import "types.proto";
diff --git a/format/edge_info.proto b/format/edge_info.proto
index 8385f4ae..72f5757c 100644
--- a/format/edge_info.proto
+++ b/format/edge_info.proto
@@ -20,7 +20,8 @@
syntax = "proto3";
package graphar;
-option java_package = "org.apache.graphar.info";
+option java_multiple_files = true;
+option java_package = "org.apache.graphar.info.proto";
import "property_group.proto";
import "adjacent_list.proto";
diff --git a/format/graph_info.proto b/format/graph_info.proto
index 6490c570..e5c6e2ec 100644
--- a/format/graph_info.proto
+++ b/format/graph_info.proto
@@ -20,7 +20,8 @@
syntax = "proto3";
package graphar;
-option java_package = "org.apache.graphar.info";
+option java_multiple_files = true;
+option java_package = "org.apache.graphar.info.proto";
import "vertex_info.proto";
import "edge_info.proto";
diff --git a/format/property_group.proto b/format/property_group.proto
index 95b8c522..5cdbb42b 100644
--- a/format/property_group.proto
+++ b/format/property_group.proto
@@ -20,7 +20,8 @@
syntax = "proto3";
package graphar;
-option java_package = "org.apache.graphar.info";
+option java_multiple_files = true;
+option java_package = "org.apache.graphar.info.proto";
import "types.proto";
diff --git a/format/types.proto b/format/types.proto
index fc152b41..234b9e86 100644
--- a/format/types.proto
+++ b/format/types.proto
@@ -20,7 +20,8 @@
syntax = "proto3";
package graphar;
-option java_package = "org.apache.graphar.info";
+option java_multiple_files = true;
+option java_package = "org.apache.graphar.info.proto";
enum DataType {
BOOL = 0;
diff --git a/format/vertex_info.proto b/format/vertex_info.proto
index da158674..136b89c9 100644
--- a/format/vertex_info.proto
+++ b/format/vertex_info.proto
@@ -1,4 +1,4 @@
-/*
+ /*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
@@ -20,7 +20,8 @@
syntax = "proto3";
package graphar;
-option java_package = "org.apache.graphar.info";
+option java_multiple_files = true;
+option java_package = "org.apache.graphar.info.proto";
import "property_group.proto";
diff --git a/licenserc.toml b/licenserc.toml
index b6e0919a..ed4a4c14 100644
--- a/licenserc.toml
+++ b/licenserc.toml
@@ -45,7 +45,9 @@ excludes = [
"cpp/thirdparty",
"cpp/misc/cpplint.py",
"spark/datasources-32/src/main/scala/org/apache/graphar/datasources",
+ "spark/datasources-32/src/main/scala/org/apache/spark/sql/graphar",
"spark/datasources-33/src/main/scala/org/apache/graphar/datasources",
+ "spark/datasources-33/src/main/scala/org/apache/spark/sql/graphar",
"java/src/main/java/org/apache/graphar/stdcxx/StdString.java",
"java/src/main/java/org/apache/graphar/stdcxx/StdVector.java",
"java/src/main/java/org/apache/graphar/stdcxx/StdSharedPtr.java",
diff --git a/maven-projects/info/pom.xml b/maven-projects/info/pom.xml
index 79d4119e..ea59280d 100644
--- a/maven-projects/info/pom.xml
+++ b/maven-projects/info/pom.xml
@@ -34,6 +34,7 @@
<artifactId>info</artifactId>
<packaging>jar</packaging>
+ <version>0.13.0.dev-SNAPSHOT</version>
<name>info</name>
diff --git a/maven-projects/java/README.md b/maven-projects/java/README.md
index 12572e13..3a3f15d3 100644
--- a/maven-projects/java/README.md
+++ b/maven-projects/java/README.md
@@ -1,4 +1,4 @@
-# GraphAr Java
+# GraphAr Java (under development)
This directory contains the code and build system for the GraphAr Java library
which is powered by [Alibaba-FastFFI](https://github.com/alibaba/fastFFI).
diff --git a/maven-projects/java/pom.xml b/maven-projects/java/pom.xml
index a5a1fdf4..e0c3b4d3 100644
--- a/maven-projects/java/pom.xml
+++ b/maven-projects/java/pom.xml
@@ -34,6 +34,7 @@
<artifactId>java</artifactId>
<packaging>jar</packaging>
+ <version>0.13.0.dev-SNAPSHOT</version>
<name>java</name>
diff --git a/maven-projects/pom.xml b/maven-projects/pom.xml
index beb592dc..79d4b661 100644
--- a/maven-projects/pom.xml
+++ b/maven-projects/pom.xml
@@ -69,7 +69,7 @@
<url>https://github.com/apache/graphar</url>
</scm>-->
<properties>
- <graphar.version>0.1.0-SNAPSHOT</graphar.version>
+ <graphar.version>0.12.0-SNAPSHOT</graphar.version>
</properties>
<modules>
<module>java</module>
diff --git a/maven-projects/spark/README.md b/maven-projects/spark/README.md
index a0967ca0..cb7921bf 100644
--- a/maven-projects/spark/README.md
+++ b/maven-projects/spark/README.md
@@ -21,7 +21,6 @@ repository and navigated to the ``spark`` subdirectory:
```bash
$ git clone https://github.com/apache/incubator-graphar.git
$ cd incubator-graphar
- $ git submodule update --init
$ cd maven-projects/spark
```
diff --git a/maven-projects/spark/datasources-32/src/main/scala/org/apache/graphar/datasources/GarDataSource.scala b/maven-projects/spark/datasources-32/src/main/scala/org/apache/graphar/datasources/GarDataSource.scala
index 38a3c183..7424ad68 100644
--- a/maven-projects/spark/datasources-32/src/main/scala/org/apache/graphar/datasources/GarDataSource.scala
+++ b/maven-projects/spark/datasources-32/src/main/scala/org/apache/graphar/datasources/GarDataSource.scala
@@ -16,24 +16,24 @@
package org.apache.graphar.datasources
-import scala.collection.JavaConverters._
-import scala.util.matching.Regex
-import java.util
-
import com.fasterxml.jackson.databind.ObjectMapper
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path
-
+import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.connector.catalog.{Table, TableProvider}
+import org.apache.spark.sql.connector.expressions.Transform
import org.apache.spark.sql.execution.datasources._
-import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.execution.datasources.csv.CSVFileFormat
import org.apache.spark.sql.execution.datasources.orc.OrcFileFormat
import org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat
+import org.apache.spark.sql.graphar.GarTable
+import org.apache.spark.sql.sources.DataSourceRegister
import org.apache.spark.sql.types.StructType
import org.apache.spark.sql.util.CaseInsensitiveStringMap
-import org.apache.spark.sql.sources.DataSourceRegister
-import org.apache.spark.sql.connector.expressions.Transform
+
+import java.util
+import scala.collection.JavaConverters._
+import scala.util.matching.Regex
// Derived from Apache Spark 3.1.1
// https://github.com/apache/spark/blob/1d550c4/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/FileDataSourceV2.scala
diff --git a/maven-projects/spark/datasources-32/src/main/scala/org/apache/graphar/datasources/GarCommitProtocol.scala b/maven-projects/spark/datasources-32/src/main/scala/org/apache/spark/sql/graphar/GarCommitProtocol.scala
similarity index 98%
rename from maven-projects/spark/datasources-32/src/main/scala/org/apache/graphar/datasources/GarCommitProtocol.scala
rename to maven-projects/spark/datasources-32/src/main/scala/org/apache/spark/sql/graphar/GarCommitProtocol.scala
index 07cff02e..ef64ec39 100644
--- a/maven-projects/spark/datasources-32/src/main/scala/org/apache/graphar/datasources/GarCommitProtocol.scala
+++ b/maven-projects/spark/datasources-32/src/main/scala/org/apache/spark/sql/graphar/GarCommitProtocol.scala
@@ -17,16 +17,14 @@
// Derived from Apache Spark 3.1.1
// https://github.com/apache/spark/blob/1d550c4/core/src/main/scala/org/apache/spark/internal/io/HadoopMapReduceCommitProtocol.scala
-package org.apache.graphar.datasources
+package org.apache.spark.sql.graphar
import org.apache.graphar.GeneralParams
-
-import org.json4s._
-import org.json4s.jackson.JsonMethods._
-
-import org.apache.spark.sql.execution.datasources.SQLHadoopMapReduceCommitProtocol
import org.apache.hadoop.mapreduce._
import org.apache.spark.internal.Logging
+import org.apache.spark.sql.execution.datasources.SQLHadoopMapReduceCommitProtocol
+import org.json4s._
+import org.json4s.jackson.JsonMethods._
object GarCommitProtocol {
private def binarySearchPair(aggNums: Array[Int], key: Int): (Int, Int) = {
diff --git a/maven-projects/spark/datasources-32/src/main/scala/org/apache/graphar/datasources/GarScan.scala b/maven-projects/spark/datasources-32/src/main/scala/org/apache/spark/sql/graphar/GarScan.scala
similarity index 98%
rename from maven-projects/spark/datasources-32/src/main/scala/org/apache/graphar/datasources/GarScan.scala
rename to maven-projects/spark/datasources-32/src/main/scala/org/apache/spark/sql/graphar/GarScan.scala
index 4b063db7..b6027f8a 100644
--- a/maven-projects/spark/datasources-32/src/main/scala/org/apache/graphar/datasources/GarScan.scala
+++ b/maven-projects/spark/datasources-32/src/main/scala/org/apache/spark/sql/graphar/GarScan.scala
@@ -17,40 +17,39 @@
// Derived from Apache Spark 3.1.1
// https://github.com/apache/spark/blob/1d550c4/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/FileScan.scala
-package org.apache.graphar.datasources
-
-import scala.collection.JavaConverters._
-import scala.collection.mutable.ArrayBuffer
+package org.apache.spark.sql.graphar
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path
import org.apache.parquet.hadoop.ParquetInputFormat
-
import org.apache.spark.sql.SparkSession
-import org.apache.spark.sql.catalyst.expressions.{Expression, ExprUtils}
import org.apache.spark.sql.catalyst.csv.CSVOptions
+import org.apache.spark.sql.catalyst.expressions.{ExprUtils, Expression}
import org.apache.spark.sql.connector.read.PartitionReaderFactory
import org.apache.spark.sql.execution.PartitionedFileUtil
-import org.apache.spark.sql.execution.datasources.{
- FilePartition,
- PartitioningAwareFileIndex,
- PartitionedFile
-}
import org.apache.spark.sql.execution.datasources.parquet.{
ParquetOptions,
ParquetReadSupport,
ParquetWriteSupport
}
import org.apache.spark.sql.execution.datasources.v2.FileScan
-import org.apache.spark.sql.execution.datasources.v2.parquet.ParquetPartitionReaderFactory
-import org.apache.spark.sql.execution.datasources.v2.orc.OrcPartitionReaderFactory
import org.apache.spark.sql.execution.datasources.v2.csv.CSVPartitionReaderFactory
+import org.apache.spark.sql.execution.datasources.v2.orc.OrcPartitionReaderFactory
+import org.apache.spark.sql.execution.datasources.v2.parquet.ParquetPartitionReaderFactory
+import org.apache.spark.sql.execution.datasources.{
+ FilePartition,
+ PartitionedFile,
+ PartitioningAwareFileIndex
+}
import org.apache.spark.sql.internal.SQLConf
import org.apache.spark.sql.sources.Filter
import org.apache.spark.sql.types.StructType
import org.apache.spark.sql.util.CaseInsensitiveStringMap
import org.apache.spark.util.SerializableConfiguration
+import scala.collection.JavaConverters._
+import scala.collection.mutable.ArrayBuffer
+
/** GarScan is a class to implement the file scan for GarDataSource. */
case class GarScan(
sparkSession: SparkSession,
diff --git a/maven-projects/spark/datasources-32/src/main/scala/org/apache/graphar/datasources/GarScanBuilder.scala b/maven-projects/spark/datasources-32/src/main/scala/org/apache/spark/sql/graphar/GarScanBuilder.scala
similarity index 99%
rename from maven-projects/spark/datasources-32/src/main/scala/org/apache/graphar/datasources/GarScanBuilder.scala
rename to maven-projects/spark/datasources-32/src/main/scala/org/apache/spark/sql/graphar/GarScanBuilder.scala
index 1e83c773..0ae95894 100644
--- a/maven-projects/spark/datasources-32/src/main/scala/org/apache/graphar/datasources/GarScanBuilder.scala
+++ b/maven-projects/spark/datasources-32/src/main/scala/org/apache/spark/sql/graphar/GarScanBuilder.scala
@@ -17,20 +17,19 @@
// Derived from Apache Spark 3.1.1
// https://github.com/apache/spark/blob/1d550c4/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/FileScanBuilder.scala
-package org.apache.graphar.datasources
+package org.apache.spark.sql.graphar
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.connector.read.{Scan, SupportsPushDownFilters}
import org.apache.spark.sql.execution.datasources.PartitioningAwareFileIndex
-
import org.apache.spark.sql.execution.datasources.v2.FileScanBuilder
+import org.apache.spark.sql.execution.datasources.v2.orc.OrcScanBuilder
+import org.apache.spark.sql.execution.datasources.v2.parquet.ParquetScanBuilder
import org.apache.spark.sql.sources.Filter
import org.apache.spark.sql.types.StructType
import org.apache.spark.sql.util.CaseInsensitiveStringMap
import scala.collection.JavaConverters._
-import org.apache.spark.sql.execution.datasources.v2.orc.OrcScanBuilder
-import org.apache.spark.sql.execution.datasources.v2.parquet.ParquetScanBuilder
/** GarScanBuilder is a class to build the file scan for GarDataSource. */
case class GarScanBuilder(
@@ -49,6 +48,7 @@ case class GarScanBuilder(
}
private var filters: Array[Filter] = Array.empty
+
override def pushFilters(filters: Array[Filter]): Array[Filter] = {
this.filters = filters
filters
diff --git a/maven-projects/spark/datasources-32/src/main/scala/org/apache/graphar/datasources/GarTable.scala b/maven-projects/spark/datasources-32/src/main/scala/org/apache/spark/sql/graphar/GarTable.scala
similarity index 95%
rename from maven-projects/spark/datasources-32/src/main/scala/org/apache/graphar/datasources/GarTable.scala
rename to maven-projects/spark/datasources-32/src/main/scala/org/apache/spark/sql/graphar/GarTable.scala
index 8aa23179..acf4943c 100644
--- a/maven-projects/spark/datasources-32/src/main/scala/org/apache/graphar/datasources/GarTable.scala
+++ b/maven-projects/spark/datasources-32/src/main/scala/org/apache/spark/sql/graphar/GarTable.scala
@@ -17,26 +17,24 @@
// Derived from Apache Spark 3.1.1
// https://github.com/apache/spark/blob/1d550c4/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/FileTable.scala
-package org.apache.graphar.datasources
-
-import scala.collection.JavaConverters._
+package org.apache.spark.sql.graphar
import org.apache.hadoop.fs.FileStatus
-
import org.apache.spark.sql.SparkSession
-import org.apache.spark.sql.connector.write.{LogicalWriteInfo, WriteBuilder}
import org.apache.spark.sql.catalyst.csv.CSVOptions
+import org.apache.spark.sql.connector.write.{LogicalWriteInfo, WriteBuilder}
import org.apache.spark.sql.execution.datasources.FileFormat
import org.apache.spark.sql.execution.datasources.csv.CSVDataSource
import org.apache.spark.sql.execution.datasources.orc.OrcUtils
import org.apache.spark.sql.execution.datasources.parquet.ParquetUtils
import org.apache.spark.sql.execution.datasources.v2.FileTable
+import org.apache.spark.sql.graphar.csv.CSVWriteBuilder
+import org.apache.spark.sql.graphar.orc.OrcWriteBuilder
+import org.apache.spark.sql.graphar.parquet.ParquetWriteBuilder
import org.apache.spark.sql.types._
import org.apache.spark.sql.util.CaseInsensitiveStringMap
-import org.apache.graphar.datasources.csv.CSVWriteBuilder
-import org.apache.graphar.datasources.parquet.ParquetWriteBuilder
-import org.apache.graphar.datasources.orc.OrcWriteBuilder
+import scala.collection.JavaConverters._
/** GarTable is a class to represent the graph data in GraphAr as a table. */
case class GarTable(
diff --git a/maven-projects/spark/datasources-32/src/main/scala/org/apache/graphar/datasources/GarWriterBuilder.scala b/maven-projects/spark/datasources-32/src/main/scala/org/apache/spark/sql/graphar/GarWriteBuilder.scala
similarity index 97%
rename from maven-projects/spark/datasources-32/src/main/scala/org/apache/graphar/datasources/GarWriterBuilder.scala
rename to maven-projects/spark/datasources-32/src/main/scala/org/apache/spark/sql/graphar/GarWriteBuilder.scala
index 3acd9247..f6caa75d 100644
--- a/maven-projects/spark/datasources-32/src/main/scala/org/apache/graphar/datasources/GarWriterBuilder.scala
+++ b/maven-projects/spark/datasources-32/src/main/scala/org/apache/spark/sql/graphar/GarWriteBuilder.scala
@@ -17,27 +17,22 @@
// Derived from Apache Spark 3.1.1
// https://github.com/apache/spark/blob/1d550c4/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/FileWriteBuilder.scala
-package org.apache.graphar.datasources
-
-import java.util.UUID
-
-import scala.collection.JavaConverters._
+package org.apache.spark.sql.graphar
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path
import org.apache.hadoop.mapreduce.Job
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat
-import org.apache.hadoop.mapreduce.Job
-
-import org.apache.spark.sql.execution.datasources.OutputWriterFactory
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.expressions.AttributeReference
import org.apache.spark.sql.catalyst.util.{CaseInsensitiveMap, DateTimeUtils}
import org.apache.spark.sql.connector.write.{
BatchWrite,
LogicalWriteInfo,
WriteBuilder
}
+import org.apache.spark.sql.execution.datasources.v2.FileBatchWrite
import org.apache.spark.sql.execution.datasources.{
BasicWriteJobStatsTracker,
DataSource,
@@ -48,8 +43,9 @@ import org.apache.spark.sql.execution.metric.SQLMetric
import org.apache.spark.sql.internal.SQLConf
import org.apache.spark.sql.types.{DataType, StructType}
import org.apache.spark.util.SerializableConfiguration
-import org.apache.spark.sql.execution.datasources.v2.FileBatchWrite
-import org.apache.spark.sql.catalyst.expressions.AttributeReference
+
+import java.util.UUID
+import scala.collection.JavaConverters._
abstract class GarWriteBuilder(
paths: Seq[String],
diff --git a/maven-projects/spark/datasources-32/src/main/scala/org/apache/graphar/datasources/csv/CSVWriterBuilder.scala b/maven-projects/spark/datasources-32/src/main/scala/org/apache/spark/sql/graphar/csv/CSVWriteBuilder.scala
similarity index 96%
rename from maven-projects/spark/datasources-32/src/main/scala/org/apache/graphar/datasources/csv/CSVWriterBuilder.scala
rename to maven-projects/spark/datasources-32/src/main/scala/org/apache/spark/sql/graphar/csv/CSVWriteBuilder.scala
index c0a38d52..7dd4dda8 100644
--- a/maven-projects/spark/datasources-32/src/main/scala/org/apache/graphar/datasources/csv/CSVWriterBuilder.scala
+++ b/maven-projects/spark/datasources-32/src/main/scala/org/apache/spark/sql/graphar/csv/CSVWriteBuilder.scala
@@ -17,23 +17,22 @@
// Derived from Apache Spark 3.1.1
// https://github.com/apache/spark/blob/1d550c4/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/csv/CSVWriteBuilder.scala
-package org.apache.graphar.datasources.csv
+package org.apache.spark.sql.graphar.csv
import org.apache.hadoop.mapreduce.{Job, TaskAttemptContext}
import org.apache.spark.sql.catalyst.csv.CSVOptions
import org.apache.spark.sql.catalyst.util.CompressionCodecs
import org.apache.spark.sql.connector.write.LogicalWriteInfo
+import org.apache.spark.sql.execution.datasources.csv.CsvOutputWriter
import org.apache.spark.sql.execution.datasources.{
CodecStreams,
OutputWriter,
OutputWriterFactory
}
-import org.apache.spark.sql.execution.datasources.csv.CsvOutputWriter
+import org.apache.spark.sql.graphar.GarWriteBuilder
import org.apache.spark.sql.internal.SQLConf
import org.apache.spark.sql.types.{DataType, StructType}
-import org.apache.graphar.datasources.GarWriteBuilder
-
class CSVWriteBuilder(
paths: Seq[String],
formatName: String,
diff --git a/maven-projects/spark/datasources-32/src/main/scala/org/apache/graphar/datasources/orc/OrcOutputWriter.scala b/maven-projects/spark/datasources-32/src/main/scala/org/apache/spark/sql/graphar/orc/OrcOutputWriter.scala
similarity index 96%
rename from maven-projects/spark/datasources-32/src/main/scala/org/apache/graphar/datasources/orc/OrcOutputWriter.scala
rename to maven-projects/spark/datasources-32/src/main/scala/org/apache/spark/sql/graphar/orc/OrcOutputWriter.scala
index c1d2ff82..e86e6629 100644
--- a/maven-projects/spark/datasources-32/src/main/scala/org/apache/graphar/datasources/orc/OrcOutputWriter.scala
+++ b/maven-projects/spark/datasources-32/src/main/scala/org/apache/spark/sql/graphar/orc/OrcOutputWriter.scala
@@ -18,18 +18,17 @@
// we have to reimplement it here.
// https://github.com/apache/spark/blob/1d550c4/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcOutputWriter.scala
-package org.apache.graphar.datasources.orc
+package org.apache.spark.sql.graphar.orc
import org.apache.hadoop.fs.Path
import org.apache.hadoop.io.NullWritable
import org.apache.hadoop.mapreduce.TaskAttemptContext
import org.apache.orc.OrcFile
import org.apache.orc.mapred.{
- OrcOutputFormat => OrcMapRedOutputFormat,
- OrcStruct
+ OrcStruct,
+ OrcOutputFormat => OrcMapRedOutputFormat
}
import org.apache.orc.mapreduce.{OrcMapreduceRecordWriter, OrcOutputFormat}
-
import org.apache.spark.sql.catalyst.InternalRow
import org.apache.spark.sql.execution.datasources.OutputWriter
import org.apache.spark.sql.execution.datasources.orc.{OrcSerializer, OrcUtils}
diff --git a/maven-projects/spark/datasources-32/src/main/scala/org/apache/graphar/datasources/orc/OrcWriteBuilder.scala b/maven-projects/spark/datasources-32/src/main/scala/org/apache/spark/sql/graphar/orc/OrcWriteBuilder.scala
similarity index 97%
rename from maven-projects/spark/datasources-32/src/main/scala/org/apache/graphar/datasources/orc/OrcWriteBuilder.scala
rename to maven-projects/spark/datasources-32/src/main/scala/org/apache/spark/sql/graphar/orc/OrcWriteBuilder.scala
index 9bdf796b..05147c14 100644
--- a/maven-projects/spark/datasources-32/src/main/scala/org/apache/graphar/datasources/orc/OrcWriteBuilder.scala
+++ b/maven-projects/spark/datasources-32/src/main/scala/org/apache/spark/sql/graphar/orc/OrcWriteBuilder.scala
@@ -17,24 +17,22 @@
// Derived from Apache Spark 3.1.1
// https://github.com/apache/spark/blob/1d550c4/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/orc/ORCWriteBuilder.scala
-package org.apache.graphar.datasources.orc
+package org.apache.spark.sql.graphar.orc
import org.apache.hadoop.mapred.JobConf
import org.apache.hadoop.mapreduce.{Job, TaskAttemptContext}
import org.apache.orc.OrcConf.{COMPRESS, MAPRED_OUTPUT_SCHEMA}
import org.apache.orc.mapred.OrcStruct
-
import org.apache.spark.sql.connector.write.LogicalWriteInfo
+import org.apache.spark.sql.execution.datasources.orc.{OrcOptions, OrcUtils}
import org.apache.spark.sql.execution.datasources.{
OutputWriter,
OutputWriterFactory
}
-import org.apache.spark.sql.execution.datasources.orc.{OrcOptions, OrcUtils}
+import org.apache.spark.sql.graphar.GarWriteBuilder
import org.apache.spark.sql.internal.SQLConf
import org.apache.spark.sql.types._
-import org.apache.graphar.datasources.GarWriteBuilder
-
object OrcWriteBuilder {
// the getQuotedSchemaString method of spark OrcFileFormat
private def getQuotedSchemaString(dataType: DataType): String =
diff --git a/maven-projects/spark/datasources-32/src/main/scala/org/apache/graphar/datasources/parquet/ParquetWriterBuilder.scala b/maven-projects/spark/datasources-32/src/main/scala/org/apache/spark/sql/graphar/parquet/ParquetWriteBuilder.scala
similarity index 96%
rename from maven-projects/spark/datasources-32/src/main/scala/org/apache/graphar/datasources/parquet/ParquetWriterBuilder.scala
rename to maven-projects/spark/datasources-32/src/main/scala/org/apache/spark/sql/graphar/parquet/ParquetWriteBuilder.scala
index 8d7feceb..d75f725e 100644
--- a/maven-projects/spark/datasources-32/src/main/scala/org/apache/graphar/datasources/parquet/ParquetWriterBuilder.scala
+++ b/maven-projects/spark/datasources-32/src/main/scala/org/apache/spark/sql/graphar/parquet/ParquetWriteBuilder.scala
@@ -17,28 +17,25 @@
// Derived from Apache Spark 3.1.1
// https://github.com/apache/spark/blob/1d550c4/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/parquet/ParquetWriteBuilder.scala
-package org.apache.graphar.datasources.parquet
+package org.apache.spark.sql.graphar.parquet
import org.apache.hadoop.mapreduce.{Job, OutputCommitter, TaskAttemptContext}
-import org.apache.parquet.hadoop.{ParquetOutputCommitter, ParquetOutputFormat}
import org.apache.parquet.hadoop.ParquetOutputFormat.JobSummaryLevel
import org.apache.parquet.hadoop.codec.CodecConfig
import org.apache.parquet.hadoop.util.ContextUtil
-
-import org.apache.spark.sql.execution.datasources.parquet.ParquetOutputWriter
+import org.apache.parquet.hadoop.{ParquetOutputCommitter, ParquetOutputFormat}
import org.apache.spark.internal.Logging
import org.apache.spark.sql.Row
import org.apache.spark.sql.connector.write.LogicalWriteInfo
+import org.apache.spark.sql.execution.datasources.parquet._
import org.apache.spark.sql.execution.datasources.{
OutputWriter,
OutputWriterFactory
}
-import org.apache.spark.sql.execution.datasources.parquet._
+import org.apache.spark.sql.graphar.GarWriteBuilder
import org.apache.spark.sql.internal.SQLConf
import org.apache.spark.sql.types._
-import org.apache.graphar.datasources.GarWriteBuilder
-
class ParquetWriteBuilder(
paths: Seq[String],
formatName: String,
diff --git a/maven-projects/spark/datasources-33/src/main/scala/org/apache/graphar/datasources/GarDataSource.scala b/maven-projects/spark/datasources-33/src/main/scala/org/apache/graphar/datasources/GarDataSource.scala
index 38a3c183..b6094914 100644
--- a/maven-projects/spark/datasources-33/src/main/scala/org/apache/graphar/datasources/GarDataSource.scala
+++ b/maven-projects/spark/datasources-33/src/main/scala/org/apache/graphar/datasources/GarDataSource.scala
@@ -19,11 +19,9 @@ package org.apache.graphar.datasources
import scala.collection.JavaConverters._
import scala.util.matching.Regex
import java.util
-
import com.fasterxml.jackson.databind.ObjectMapper
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path
-
import org.apache.spark.sql.connector.catalog.{Table, TableProvider}
import org.apache.spark.sql.execution.datasources._
import org.apache.spark.sql.SparkSession
@@ -34,6 +32,7 @@ import org.apache.spark.sql.types.StructType
import org.apache.spark.sql.util.CaseInsensitiveStringMap
import org.apache.spark.sql.sources.DataSourceRegister
import org.apache.spark.sql.connector.expressions.Transform
+import org.apache.spark.sql.graphar.GarTable
// Derived from Apache Spark 3.1.1
// https://github.com/apache/spark/blob/1d550c4/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/FileDataSourceV2.scala
diff --git a/maven-projects/spark/datasources-33/src/main/scala/org/apache/graphar/datasources/GarCommitProtocol.scala b/maven-projects/spark/datasources-33/src/main/scala/org/apache/spark/sql/graphar/GarCommitProtocol.scala
similarity index 93%
rename from maven-projects/spark/datasources-33/src/main/scala/org/apache/graphar/datasources/GarCommitProtocol.scala
rename to maven-projects/spark/datasources-33/src/main/scala/org/apache/spark/sql/graphar/GarCommitProtocol.scala
index 8be2e237..c6ca79c2 100644
--- a/maven-projects/spark/datasources-33/src/main/scala/org/apache/graphar/datasources/GarCommitProtocol.scala
+++ b/maven-projects/spark/datasources-33/src/main/scala/org/apache/spark/sql/graphar/GarCommitProtocol.scala
@@ -17,7 +17,7 @@
// Derived from Apache Spark 3.3.4
// https://github.com/apache/spark/blob/18db204/core/src/main/scala/org/apache/spark/internal/io/HadoopMapReduceCommitProtocol.scala
-package org.apache.graphar.datasources
+package org.apache.spark.sql.graphar
import org.apache.graphar.GeneralParams
@@ -73,16 +73,14 @@ class GarCommitProtocol(
val partitionId = taskContext.getTaskAttemptID.getTaskID.getId
if (options.contains(GeneralParams.offsetStartChunkIndexKey)) {
// offset chunk file name, looks like chunk0
- val chunk_index = options
- .get(GeneralParams.offsetStartChunkIndexKey)
- .get
- .toInt + partitionId
+ val chunk_index =
+ options(GeneralParams.offsetStartChunkIndexKey).toInt + partitionId
return f"chunk$chunk_index"
}
if (options.contains(GeneralParams.aggNumListOfEdgeChunkKey)) {
// edge chunk file name, looks like part0/chunk0
val jValue = parse(
- options.get(GeneralParams.aggNumListOfEdgeChunkKey).get
+ options(GeneralParams.aggNumListOfEdgeChunkKey)
)
implicit val formats =
DefaultFormats // initialize a default formats for json4s
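The GarCommitProtocol hunk above replaces `options.get(key).get` with `options(key)`. As a minimal sketch of why this is behavior-preserving for present keys (the map and key name here are hypothetical, not GraphAr code): `Map#apply` throws `NoSuchElementException` on a missing key, just as `Option#get` does on `None`, so the shorter form is equivalent where the key exists.

```scala
// Hypothetical options map standing in for the commit protocol's options.
val options = Map("offset.start.chunk.index" -> "3")
val partitionId = 2

// Before: fetch an Option and unwrap it explicitly.
val before = options.get("offset.start.chunk.index").get.toInt + partitionId

// After: Map#apply does the lookup and unwrap in one step.
val after = options("offset.start.chunk.index").toInt + partitionId

assert(before == after)
```

Both variants fail at runtime when the key is absent; the refactor only removes the redundant `Option` round-trip.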
diff --git a/maven-projects/spark/datasources-33/src/main/scala/org/apache/graphar/datasources/GarScan.scala b/maven-projects/spark/datasources-33/src/main/scala/org/apache/spark/sql/graphar/GarScan.scala
similarity index 98%
rename from maven-projects/spark/datasources-33/src/main/scala/org/apache/graphar/datasources/GarScan.scala
rename to maven-projects/spark/datasources-33/src/main/scala/org/apache/spark/sql/graphar/GarScan.scala
index bf4995b0..feaa7e56 100644
--- a/maven-projects/spark/datasources-33/src/main/scala/org/apache/graphar/datasources/GarScan.scala
+++ b/maven-projects/spark/datasources-33/src/main/scala/org/apache/spark/sql/graphar/GarScan.scala
@@ -17,24 +17,20 @@
// Derived from Apache Spark 3.3.4
// https://github.com/apache/spark/blob/18db204/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/FileScan.scala
-package org.apache.graphar.datasources
-
-import scala.collection.JavaConverters._
-import scala.collection.mutable.ArrayBuffer
+package org.apache.spark.sql.graphar
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path
import org.apache.parquet.hadoop.ParquetInputFormat
-
import org.apache.spark.sql.SparkSession
-import org.apache.spark.sql.catalyst.expressions.{Expression, ExprUtils}
import org.apache.spark.sql.catalyst.csv.CSVOptions
+import org.apache.spark.sql.catalyst.expressions.{ExprUtils, Expression}
import org.apache.spark.sql.connector.read.PartitionReaderFactory
import org.apache.spark.sql.execution.PartitionedFileUtil
import org.apache.spark.sql.execution.datasources.{
FilePartition,
- PartitioningAwareFileIndex,
- PartitionedFile
+ PartitionedFile,
+ PartitioningAwareFileIndex
}
import org.apache.spark.sql.execution.datasources.parquet.{
ParquetOptions,
@@ -42,15 +38,18 @@ import org.apache.spark.sql.execution.datasources.parquet.{
ParquetWriteSupport
}
import org.apache.spark.sql.execution.datasources.v2.FileScan
-import org.apache.spark.sql.execution.datasources.v2.parquet.ParquetPartitionReaderFactory
-import org.apache.spark.sql.execution.datasources.v2.orc.OrcPartitionReaderFactory
import org.apache.spark.sql.execution.datasources.v2.csv.CSVPartitionReaderFactory
+import org.apache.spark.sql.execution.datasources.v2.orc.OrcPartitionReaderFactory
+import org.apache.spark.sql.execution.datasources.v2.parquet.ParquetPartitionReaderFactory
import org.apache.spark.sql.internal.SQLConf
import org.apache.spark.sql.sources.Filter
import org.apache.spark.sql.types.StructType
import org.apache.spark.sql.util.CaseInsensitiveStringMap
import org.apache.spark.util.SerializableConfiguration
+import scala.collection.mutable.ArrayBuffer
+import scala.jdk.CollectionConverters._
+
/** GarScan is a class to implement the file scan for GarDataSource. */
case class GarScan(
sparkSession: SparkSession,
diff --git a/maven-projects/spark/datasources-33/src/main/scala/org/apache/graphar/datasources/GarScanBuilder.scala b/maven-projects/spark/datasources-33/src/main/scala/org/apache/spark/sql/graphar/GarScanBuilder.scala
similarity index 98%
rename from maven-projects/spark/datasources-33/src/main/scala/org/apache/graphar/datasources/GarScanBuilder.scala
rename to maven-projects/spark/datasources-33/src/main/scala/org/apache/spark/sql/graphar/GarScanBuilder.scala
index 85f43e59..94fe5752 100644
--- a/maven-projects/spark/datasources-33/src/main/scala/org/apache/graphar/datasources/GarScanBuilder.scala
+++ b/maven-projects/spark/datasources-33/src/main/scala/org/apache/spark/sql/graphar/GarScanBuilder.scala
@@ -17,20 +17,19 @@
// Derived from Apache Spark 3.3.4
// https://github.com/apache/spark/blob/18db204/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/FileScanBuilder.scala
-package org.apache.graphar.datasources
+package org.apache.spark.sql.graphar
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.connector.read.Scan
import org.apache.spark.sql.execution.datasources.PartitioningAwareFileIndex
-
import org.apache.spark.sql.execution.datasources.v2.FileScanBuilder
+import org.apache.spark.sql.execution.datasources.v2.orc.OrcScanBuilder
+import org.apache.spark.sql.execution.datasources.v2.parquet.ParquetScanBuilder
import org.apache.spark.sql.sources.Filter
import org.apache.spark.sql.types.StructType
import org.apache.spark.sql.util.CaseInsensitiveStringMap
import scala.collection.JavaConverters._
-import org.apache.spark.sql.execution.datasources.v2.orc.OrcScanBuilder
-import org.apache.spark.sql.execution.datasources.v2.parquet.ParquetScanBuilder
/** GarScanBuilder is a class to build the file scan for GarDataSource. */
case class GarScanBuilder(
diff --git a/maven-projects/spark/datasources-33/src/main/scala/org/apache/graphar/datasources/GarTable.scala b/maven-projects/spark/datasources-33/src/main/scala/org/apache/spark/sql/graphar/GarTable.scala
similarity index 95%
rename from maven-projects/spark/datasources-33/src/main/scala/org/apache/graphar/datasources/GarTable.scala
rename to maven-projects/spark/datasources-33/src/main/scala/org/apache/spark/sql/graphar/GarTable.scala
index 8aa23179..acf4943c 100644
--- a/maven-projects/spark/datasources-33/src/main/scala/org/apache/graphar/datasources/GarTable.scala
+++ b/maven-projects/spark/datasources-33/src/main/scala/org/apache/spark/sql/graphar/GarTable.scala
@@ -17,26 +17,24 @@
// Derived from Apache Spark 3.1.1
// https://github.com/apache/spark/blob/1d550c4/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/FileTable.scala
-package org.apache.graphar.datasources
-
-import scala.collection.JavaConverters._
+package org.apache.spark.sql.graphar
import org.apache.hadoop.fs.FileStatus
-
import org.apache.spark.sql.SparkSession
-import org.apache.spark.sql.connector.write.{LogicalWriteInfo, WriteBuilder}
import org.apache.spark.sql.catalyst.csv.CSVOptions
+import org.apache.spark.sql.connector.write.{LogicalWriteInfo, WriteBuilder}
import org.apache.spark.sql.execution.datasources.FileFormat
import org.apache.spark.sql.execution.datasources.csv.CSVDataSource
import org.apache.spark.sql.execution.datasources.orc.OrcUtils
import org.apache.spark.sql.execution.datasources.parquet.ParquetUtils
import org.apache.spark.sql.execution.datasources.v2.FileTable
+import org.apache.spark.sql.graphar.csv.CSVWriteBuilder
+import org.apache.spark.sql.graphar.orc.OrcWriteBuilder
+import org.apache.spark.sql.graphar.parquet.ParquetWriteBuilder
import org.apache.spark.sql.types._
import org.apache.spark.sql.util.CaseInsensitiveStringMap
-import org.apache.graphar.datasources.csv.CSVWriteBuilder
-import org.apache.graphar.datasources.parquet.ParquetWriteBuilder
-import org.apache.graphar.datasources.orc.OrcWriteBuilder
+import scala.collection.JavaConverters._
/** GarTable is a class to represent the graph data in GraphAr as a table. */
case class GarTable(
diff --git a/maven-projects/spark/datasources-33/src/main/scala/org/apache/graphar/datasources/GarWriterBuilder.scala b/maven-projects/spark/datasources-33/src/main/scala/org/apache/spark/sql/graphar/GarWriteBuilder.scala
similarity index 98%
rename from maven-projects/spark/datasources-33/src/main/scala/org/apache/graphar/datasources/GarWriterBuilder.scala
rename to maven-projects/spark/datasources-33/src/main/scala/org/apache/spark/sql/graphar/GarWriteBuilder.scala
index 8363ae26..009d5da7 100644
--- a/maven-projects/spark/datasources-33/src/main/scala/org/apache/graphar/datasources/GarWriterBuilder.scala
+++ b/maven-projects/spark/datasources-33/src/main/scala/org/apache/spark/sql/graphar/GarWriteBuilder.scala
@@ -17,7 +17,7 @@
// Derived from Apache Spark 3.3.4
// https://github.com/apache/spark/blob/18db204/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/FileWriteBuilder.scala
-package org.apache.graphar.datasources
+package org.apache.spark.sql.graphar
import java.util.UUID
@@ -27,7 +27,6 @@ import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path
import org.apache.hadoop.mapreduce.Job
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat
-import org.apache.hadoop.mapreduce.Job
import org.apache.spark.sql.execution.datasources.OutputWriterFactory
import org.apache.spark.sql.SparkSession
@@ -41,7 +40,6 @@ import org.apache.spark.sql.connector.write.{
import org.apache.spark.sql.execution.datasources.{
BasicWriteJobStatsTracker,
DataSource,
- OutputWriterFactory,
WriteJobDescription
}
import org.apache.spark.sql.execution.metric.SQLMetric
diff --git a/maven-projects/spark/datasources-33/src/main/scala/org/apache/graphar/datasources/csv/CSVWriterBuilder.scala b/maven-projects/spark/datasources-33/src/main/scala/org/apache/spark/sql/graphar/csv/CSVWriteBuilder.scala
similarity index 96%
rename from maven-projects/spark/datasources-33/src/main/scala/org/apache/graphar/datasources/csv/CSVWriterBuilder.scala
rename to maven-projects/spark/datasources-33/src/main/scala/org/apache/spark/sql/graphar/csv/CSVWriteBuilder.scala
index c0a38d52..68e156e0 100644
--- a/maven-projects/spark/datasources-33/src/main/scala/org/apache/graphar/datasources/csv/CSVWriterBuilder.scala
+++ b/maven-projects/spark/datasources-33/src/main/scala/org/apache/spark/sql/graphar/csv/CSVWriteBuilder.scala
@@ -17,7 +17,7 @@
// Derived from Apache Spark 3.1.1
// https://github.com/apache/spark/blob/1d550c4/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/csv/CSVWriteBuilder.scala
-package org.apache.graphar.datasources.csv
+package org.apache.spark.sql.graphar.csv
import org.apache.hadoop.mapreduce.{Job, TaskAttemptContext}
import org.apache.spark.sql.catalyst.csv.CSVOptions
@@ -31,8 +31,7 @@ import org.apache.spark.sql.execution.datasources.{
import org.apache.spark.sql.execution.datasources.csv.CsvOutputWriter
import org.apache.spark.sql.internal.SQLConf
import org.apache.spark.sql.types.{DataType, StructType}
-
-import org.apache.graphar.datasources.GarWriteBuilder
+import org.apache.spark.sql.graphar.GarWriteBuilder
class CSVWriteBuilder(
paths: Seq[String],
diff --git a/maven-projects/spark/datasources-33/src/main/scala/org/apache/graphar/datasources/orc/OrcOutputWriter.scala b/maven-projects/spark/datasources-33/src/main/scala/org/apache/spark/sql/graphar/orc/OrcOutputWriter.scala
similarity index 98%
rename from maven-projects/spark/datasources-33/src/main/scala/org/apache/graphar/datasources/orc/OrcOutputWriter.scala
rename to maven-projects/spark/datasources-33/src/main/scala/org/apache/spark/sql/graphar/orc/OrcOutputWriter.scala
index c1d2ff82..ccc7a48e 100644
--- a/maven-projects/spark/datasources-33/src/main/scala/org/apache/graphar/datasources/orc/OrcOutputWriter.scala
+++ b/maven-projects/spark/datasources-33/src/main/scala/org/apache/spark/sql/graphar/orc/OrcOutputWriter.scala
@@ -18,7 +18,7 @@
// we have to reimplement it here.
// https://github.com/apache/spark/blob/1d550c4/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcOutputWriter.scala
-package org.apache.graphar.datasources.orc
+package org.apache.spark.sql.graphar.orc
import org.apache.hadoop.fs.Path
import org.apache.hadoop.io.NullWritable
diff --git a/maven-projects/spark/datasources-33/src/main/scala/org/apache/graphar/datasources/orc/OrcWriteBuilder.scala b/maven-projects/spark/datasources-33/src/main/scala/org/apache/spark/sql/graphar/orc/OrcWriteBuilder.scala
similarity index 97%
rename from maven-projects/spark/datasources-33/src/main/scala/org/apache/graphar/datasources/orc/OrcWriteBuilder.scala
rename to maven-projects/spark/datasources-33/src/main/scala/org/apache/spark/sql/graphar/orc/OrcWriteBuilder.scala
index 9bdf796b..287162f8 100644
--- a/maven-projects/spark/datasources-33/src/main/scala/org/apache/graphar/datasources/orc/OrcWriteBuilder.scala
+++ b/maven-projects/spark/datasources-33/src/main/scala/org/apache/spark/sql/graphar/orc/OrcWriteBuilder.scala
@@ -17,7 +17,7 @@
// Derived from Apache Spark 3.1.1
// https://github.com/apache/spark/blob/1d550c4/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/orc/ORCWriteBuilder.scala
-package org.apache.graphar.datasources.orc
+package org.apache.spark.sql.graphar.orc
import org.apache.hadoop.mapred.JobConf
import org.apache.hadoop.mapreduce.{Job, TaskAttemptContext}
@@ -33,7 +33,7 @@ import org.apache.spark.sql.execution.datasources.orc.{OrcOptions, OrcUtils}
import org.apache.spark.sql.internal.SQLConf
import org.apache.spark.sql.types._
-import org.apache.graphar.datasources.GarWriteBuilder
+import org.apache.spark.sql.graphar.GarWriteBuilder
object OrcWriteBuilder {
// the getQuotedSchemaString method of spark OrcFileFormat
diff --git a/maven-projects/spark/datasources-33/src/main/scala/org/apache/graphar/datasources/parquet/ParquetWriterBuilder.scala b/maven-projects/spark/datasources-33/src/main/scala/org/apache/spark/sql/graphar/parquet/ParquetWriteBuilder.scala
similarity index 98%
rename from maven-projects/spark/datasources-33/src/main/scala/org/apache/graphar/datasources/parquet/ParquetWriterBuilder.scala
rename to maven-projects/spark/datasources-33/src/main/scala/org/apache/spark/sql/graphar/parquet/ParquetWriteBuilder.scala
index 5c92204b..8e53dc5f 100644
--- a/maven-projects/spark/datasources-33/src/main/scala/org/apache/graphar/datasources/parquet/ParquetWriterBuilder.scala
+++ b/maven-projects/spark/datasources-33/src/main/scala/org/apache/spark/sql/graphar/parquet/ParquetWriteBuilder.scala
@@ -17,7 +17,7 @@
// Derived from Apache Spark 3.1.1
// https://github.com/apache/spark/blob/1d550c4/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/parquet/ParquetWriteBuilder.scala
-package org.apache.graphar.datasources.parquet
+package org.apache.spark.sql.graphar.parquet
import org.apache.hadoop.mapreduce.{Job, OutputCommitter, TaskAttemptContext}
import org.apache.parquet.hadoop.{ParquetOutputCommitter, ParquetOutputFormat}
@@ -36,7 +36,7 @@ import org.apache.spark.sql.execution.datasources.parquet._
import org.apache.spark.sql.internal.SQLConf
import org.apache.spark.sql.types._
-import org.apache.graphar.datasources.GarWriteBuilder
+import org.apache.spark.sql.graphar.GarWriteBuilder
class ParquetWriteBuilder(
paths: Seq[String],
diff --git a/maven-projects/spark/graphar/pom.xml b/maven-projects/spark/graphar/pom.xml
index 45b99fbf..74626a62 100644
--- a/maven-projects/spark/graphar/pom.xml
+++ b/maven-projects/spark/graphar/pom.xml
@@ -32,6 +32,7 @@
</parent>
<artifactId>graphar-commons</artifactId>
+ <version>${graphar.version}</version>
<packaging>jar</packaging>
<dependencies>
diff --git a/maven-projects/spark/import/neo4j.sh b/maven-projects/spark/import/neo4j.sh
index dbae0273..6a3fa09d 100755
--- a/maven-projects/spark/import/neo4j.sh
+++ b/maven-projects/spark/import/neo4j.sh
@@ -20,7 +20,7 @@
set -eu
cur_dir="$( cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null 2>&1 && pwd )"
-jar_file="${cur_dir}/../graphar/target/graphar-commons-0.1.0-SNAPSHOT-shaded.jar"
+jar_file="${cur_dir}/../graphar/target/graphar-commons-0.12.0-SNAPSHOT-shaded.jar"
conf_path="$(readlink -f $1)"
spark-submit --class org.apache.graphar.importer.Neo4j ${jar_file} \
diff --git a/maven-projects/spark/pom.xml b/maven-projects/spark/pom.xml
index caab96d5..e04ed4ae 100644
--- a/maven-projects/spark/pom.xml
+++ b/maven-projects/spark/pom.xml
@@ -33,6 +33,7 @@
<artifactId>spark</artifactId>
<packaging>pom</packaging>
+ <version>${graphar.version}</version>
<profiles>
<profile>
diff --git a/maven-projects/spark/scripts/run-graphar2nebula.sh b/maven-projects/spark/scripts/run-graphar2nebula.sh
index 6a3b1ff1..8f772159 100755
--- a/maven-projects/spark/scripts/run-graphar2nebula.sh
+++ b/maven-projects/spark/scripts/run-graphar2nebula.sh
@@ -20,7 +20,7 @@
set -eu
cur_dir="$( cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null 2>&1 && pwd )"
-jar_file="${cur_dir}/../graphar/target/graphar-commons-0.1.0-SNAPSHOT-shaded.jar"
+jar_file="${cur_dir}/../graphar/target/graphar-commons-0.12.0-SNAPSHOT-shaded.jar"
graph_info_path="${GRAPH_INFO_PATH:-/tmp/graphar/nebula2graphar/basketballplayergraph.graph.yml}"
spark-submit --class org.apache.graphar.example.GraphAr2Nebula ${jar_file} \
diff --git a/maven-projects/spark/scripts/run-graphar2neo4j.sh b/maven-projects/spark/scripts/run-graphar2neo4j.sh
index d1111aca..11f9caf8 100755
--- a/maven-projects/spark/scripts/run-graphar2neo4j.sh
+++ b/maven-projects/spark/scripts/run-graphar2neo4j.sh
@@ -21,7 +21,7 @@
set -eu
cur_dir="$( cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null 2>&1 && pwd )"
-jar_file="${cur_dir}/../graphar/target/graphar-commons-0.1.0-SNAPSHOT-shaded.jar"
+jar_file="${cur_dir}/../graphar/target/graphar-commons-0.12.0-SNAPSHOT-shaded.jar"
graph_info_path="${GRAPH_INFO_PATH:-/tmp/graphar/neo4j2graphar/MovieGraph.graph.yml}"
spark-submit --class org.apache.graphar.example.GraphAr2Neo4j ${jar_file} \
diff --git a/maven-projects/spark/scripts/run-ldbc-sample2graphar.sh b/maven-projects/spark/scripts/run-ldbc-sample2graphar.sh
index 40c07db3..42f55552 100755
--- a/maven-projects/spark/scripts/run-ldbc-sample2graphar.sh
+++ b/maven-projects/spark/scripts/run-ldbc-sample2graphar.sh
@@ -21,7 +21,7 @@
set -eu
cur_dir="$( cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null 2>&1 && pwd )"
-jar_file="${cur_dir}/../graphar/target/graphar-commons-0.1.0-SNAPSHOT-shaded.jar"
+jar_file="${cur_dir}/../graphar/target/graphar-commons-0.12.0-SNAPSHOT-shaded.jar"
person_input_file="${GAR_TEST_DATA}/ldbc_sample/person_0_0.csv"
person_knows_person_input_file="${GAR_TEST_DATA}/ldbc_sample/person_knows_person_0_0.csv"
output_dir="/tmp/graphar/ldbc_sample"
diff --git a/maven-projects/spark/scripts/run-nebula2graphar.sh b/maven-projects/spark/scripts/run-nebula2graphar.sh
index cd94381e..f8eb8b7d 100755
--- a/maven-projects/spark/scripts/run-nebula2graphar.sh
+++ b/maven-projects/spark/scripts/run-nebula2graphar.sh
@@ -20,7 +20,7 @@
set -eu
cur_dir="$( cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null 2>&1 && pwd )"
-jar_file="${cur_dir}/../graphar/target/graphar-commons-0.1.0-SNAPSHOT-shaded.jar"
+jar_file="${cur_dir}/../graphar/target/graphar-commons-0.12.0-SNAPSHOT-shaded.jar"
vertex_chunk_size=100
edge_chunk_size=1024
diff --git a/maven-projects/spark/scripts/run-neo4j2graphar.sh b/maven-projects/spark/scripts/run-neo4j2graphar.sh
index 158913ee..90711894 100755
--- a/maven-projects/spark/scripts/run-neo4j2graphar.sh
+++ b/maven-projects/spark/scripts/run-neo4j2graphar.sh
@@ -21,7 +21,7 @@
set -eu
cur_dir="$( cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null 2>&1 && pwd )"
-jar_file="${cur_dir}/../graphar/target/graphar-commons-0.1.0-SNAPSHOT-shaded.jar"
+jar_file="${cur_dir}/../graphar/target/graphar-commons-0.12.0-SNAPSHOT-shaded.jar"
vertex_chunk_size=100
edge_chunk_size=1024
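Each of the scripts above hardcodes the shaded jar version, so every release bump (here, 0.1.0 to 0.12.0) must touch every script. A minimal sketch of an alternative, assuming the build produces exactly one shaded jar per target directory (the helper name and demo paths are hypothetical, not part of this commit):

```shell
#!/usr/bin/env bash
# Hypothetical helper: locate the shaded jar by glob so release bumps
# need no per-script edits.
set -eu

find_shaded_jar() {
  local target_dir="$1"
  local jar
  # Expect one shaded jar per build; take the first match if several exist.
  jar="$(ls "${target_dir}"/graphar-commons-*-shaded.jar 2>/dev/null | head -n 1)"
  if [ -z "${jar}" ]; then
    echo "no shaded jar found in ${target_dir}" >&2
    return 1
  fi
  printf '%s\n' "${jar}"
}

# Demo against a temporary directory standing in for graphar/target.
tmp="$(mktemp -d)"
touch "${tmp}/graphar-commons-0.12.0-SNAPSHOT-shaded.jar"
find_shaded_jar "${tmp}"
rm -rf "${tmp}"
```

A script would then set `jar_file="$(find_shaded_jar "${cur_dir}/../graphar/target")"` instead of embedding the version string.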
diff --git a/pyspark/README.md b/pyspark/README.md
index 1aea4310..8816255c 100644
--- a/pyspark/README.md
+++ b/pyspark/README.md
@@ -1,4 +1,4 @@
-# GraphAr PySpark
+# GraphAr PySpark (under development)
This directory contains the code and build system for the GraphAr PySpark
library. The library is implemented as bindings to the GraphAr Scala Spark
library and does not contain any real logic.
diff --git a/pyspark/graphar_pyspark/__init__.py b/pyspark/graphar_pyspark/__init__.py
index c276aeb0..bdca0fcf 100644
--- a/pyspark/graphar_pyspark/__init__.py
+++ b/pyspark/graphar_pyspark/__init__.py
@@ -21,6 +21,7 @@ from pyspark.sql import SparkSession
from graphar_pyspark.errors import GraphArIsNotInitializedError
+__version__ = "0.13.0.dev"
class _GraphArSession:
"""Singleton GraphAr helper object, that contains SparkSession and JVM.
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]