This is an automated email from the ASF dual-hosted git repository.
ssinchenko pushed a commit to branch format-definition-dev
in repository https://gitbox.apache.org/repos/asf/incubator-graphar.git
The following commit(s) were added to refs/heads/format-definition-dev by this
push:
new 22aa41fc feat (format): Introduce buf (#519)
22aa41fc is described below
commit 22aa41fcc486b0d8bc702f19e89d849974f2a7fd
Author: Semyon <[email protected]>
AuthorDate: Thu Jun 13 12:02:21 2024 +0200
feat (format): Introduce buf (#519)
* feat(spark): Refactoring datasources (#514)
### Reason for this PR
By moving datasources under `org.apache.spark.sql` we are able to access the
private Spark API. Last time, when I was trying to fully migrate datasources to
V2, this was a blocker. Detailed motivation is in #493.
### What changes are included in this PR?
Mostly refactoring.
### Are these changes tested?
Unit tests pass.
I manually checked the generated JARs:

### Are there any user-facing changes?
Mostly not, because `GarDataSource` was left under the same package.
Close #493
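The package move above works because of Scala's package-qualified access modifiers: a member declared `private[sql]` in Spark is visible to any class placed under `org.apache.spark.sql` or its subpackages. A self-contained sketch of the mechanism (package and member names here are illustrative, not Spark's actual internals):

```scala
// Illustrative only: mirrors how Spark guards internals with
// package-qualified private access.
package org.example.sql {
  class Engine {
    // Visible only inside org.example.sql and its subpackages --
    // analogous to Spark's private[sql] members.
    private[sql] def internalPlan(): String = "plan"
  }
}

package org.example.sql.graphar {
  // Because this object lives under org.example.sql, it may call the
  // package-private member; code outside that package hierarchy cannot.
  object Caller {
    def run(): String = new org.example.sql.Engine().internalPlan()
  }
}
```

This is why placing the GraphAr datasources under `org.apache.spark.sql` unblocks the DataSource V2 migration without Spark exposing a public API.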
* feat(dev): Add release and verify scripts (#507)
Reason for this PR
Add scripts for the developer or release manager to easily release or verify a
version.
What changes are included in this PR?
Add release and verify scripts
The related document is updated on the website; see "Update the release and
verify document, and add development document" (incubator-graphar-website#18).
Are these changes tested?
yes
Are there any user-facing changes?
no
---------
Signed-off-by: acezen <[email protected]>
* chore: Bump to version v0.12.0 (Round 1) (#517)
Signed-off-by: acezen <[email protected]>
* chore: Add CHANGELOG.md (#513)
Signed-off-by: acezen <[email protected]>
* Introduce buf
- v2
- buf.gen
- buf
On branch format-definition-dev
Your branch is up to date with 'origin/format-definition-dev'.
Changes to be committed:
new file: buf.gen.yaml
new file: buf.yaml
modified: format/adjacent_list.proto
modified: format/edge_info.proto
modified: format/graph_info.proto
modified: format/property_group.proto
modified: format/types.proto
modified: format/vertex_info.proto
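For reference, buf configuration files typically follow the layout below. This is a generic sketch of the common v1 config format; the plugin names and output paths are assumptions for illustration, not the exact contents of the files committed here.

```yaml
# buf.yaml -- module configuration: lint and breaking-change rules
version: v1
lint:
  use:
    - DEFAULT
breaking:
  use:
    - FILE
---
# buf.gen.yaml -- code generation: which plugins to run against the
# .proto files and where their output goes (invoked via `buf generate`)
version: v1
plugins:
  - plugin: buf.build/protocolbuffers/cpp
    out: generated/cpp
  - plugin: buf.build/protocolbuffers/python
    out: generated/python
```

With these two files at the repository root, `buf lint` and `buf generate` can run against the `format/*.proto` definitions without hand-maintained `protoc` invocations.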
---------
Signed-off-by: acezen <[email protected]>
Co-authored-by: Weibin Zeng <[email protected]>
---
.devcontainer/graphar-dev.Dockerfile | 6 +
CHANGELOG.md | 360 +++++++++++++++++++++
CONTRIBUTING.md | 13 +-
LICENSE | 16 +-
NOTICE | 8 +
README.md | 19 +-
buf.gen.yaml | 18 ++
buf.yaml | 3 +
cpp/CMakeLists.txt | 4 +-
cpp/README.md | 4 +-
cpp/test/test_arrow_chunk_reader.cc | 3 +-
.../download_test_data.sh | 27 +-
.../neo4j.sh => dev/release/conda_env_cpp.txt | 15 +-
.../neo4j.sh => dev/release/conda_env_scala.txt | 12 +-
dev/release/release.py | 119 +++++++
dev/release/setup-ubuntu.sh | 52 +++
dev/release/verify.py | 174 ++++++++++
format/adjacent_list.proto | 3 +-
format/edge_info.proto | 3 +-
format/graph_info.proto | 3 +-
format/property_group.proto | 3 +-
format/types.proto | 3 +-
format/vertex_info.proto | 5 +-
licenserc.toml | 2 +
maven-projects/info/pom.xml | 1 +
maven-projects/java/README.md | 2 +-
maven-projects/java/pom.xml | 1 +
maven-projects/pom.xml | 2 +-
maven-projects/spark/README.md | 1 -
.../apache/graphar/datasources/GarDataSource.scala | 16 +-
.../sql/graphar}/GarCommitProtocol.scala | 10 +-
.../sql/graphar}/GarScan.scala | 25 +-
.../sql/graphar}/GarScanBuilder.scala | 8 +-
.../sql/graphar}/GarTable.scala | 14 +-
.../sql/graphar/GarWriteBuilder.scala} | 16 +-
.../sql/graphar/csv/CSVWriteBuilder.scala} | 7 +-
.../sql/graphar}/orc/OrcOutputWriter.scala | 7 +-
.../sql/graphar}/orc/OrcWriteBuilder.scala | 8 +-
.../sql/graphar/parquet/ParquetWriteBuilder.scala} | 11 +-
.../apache/graphar/datasources/GarDataSource.scala | 3 +-
.../sql/graphar}/GarCommitProtocol.scala | 10 +-
.../sql/graphar}/GarScan.scala | 19 +-
.../sql/graphar}/GarScanBuilder.scala | 7 +-
.../sql/graphar}/GarTable.scala | 14 +-
.../sql/graphar/GarWriteBuilder.scala} | 4 +-
.../sql/graphar/csv/CSVWriteBuilder.scala} | 5 +-
.../sql/graphar}/orc/OrcOutputWriter.scala | 2 +-
.../sql/graphar}/orc/OrcWriteBuilder.scala | 4 +-
.../sql/graphar/parquet/ParquetWriteBuilder.scala} | 4 +-
maven-projects/spark/graphar/pom.xml | 1 +
maven-projects/spark/import/neo4j.sh | 2 +-
maven-projects/spark/pom.xml | 1 +
maven-projects/spark/scripts/run-graphar2nebula.sh | 2 +-
maven-projects/spark/scripts/run-graphar2neo4j.sh | 2 +-
.../spark/scripts/run-ldbc-sample2graphar.sh | 2 +-
maven-projects/spark/scripts/run-nebula2graphar.sh | 2 +-
maven-projects/spark/scripts/run-neo4j2graphar.sh | 2 +-
pyspark/README.md | 2 +-
pyspark/graphar_pyspark/__init__.py | 1 +
59 files changed, 910 insertions(+), 183 deletions(-)
diff --git a/.devcontainer/graphar-dev.Dockerfile
b/.devcontainer/graphar-dev.Dockerfile
index 2bdd07a2..1c910d6f 100644
--- a/.devcontainer/graphar-dev.Dockerfile
+++ b/.devcontainer/graphar-dev.Dockerfile
@@ -40,6 +40,12 @@ RUN git clone --branch v1.8.3
https://github.com/google/benchmark.git /tmp/bench
&& make install \
&& rm -rf /tmp/benchmark
+RUN git clone --branch v3.6.0 https://github.com/catchorg/Catch2.git
/tmp/catch2 --depth 1 \
+ && cd /tmp/catch2 \
+ && cmake -Bbuild -H. -DBUILD_TESTING=OFF \
+ && cmake --build build/ --target install \
+ && rm -rf /tmp/catch2
+
ENV LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:/usr/local/lib:/usr/local/lib64
ENV JAVA_HOME=/usr/lib/jvm/default-java
diff --git a/CHANGELOG.md b/CHANGELOG.md
new file mode 100644
index 00000000..bca929a6
--- /dev/null
+++ b/CHANGELOG.md
@@ -0,0 +1,360 @@
+# Changelog
+All changes to this project will be documented in this file.
+
+The format is based on [Keep a
Changelog](https://keepachangelog.com/en/1.0.0/),
+and this project adheres to [Semantic
Versioning](https://semver.org/spec/v2.0.0.html).
+
+
+## [v0.11.4] - 2024-03-27
+### Added
+
+- [Minor][Spark] Add SPARK_TESTING variable to increase tests performance by
@SemyonSinchenko in https://github.com/apache/graphar/pull/405
+- Bump up GraphAr version to v0.11.4 by @acezen in
https://github.com/apache/graphar/pull/417
+
+### Changed
+
+- [FEAT][C++] Enhance the validation of writer with arrow::Table's Validate by
@acezen in https://github.com/apache/graphar/pull/410
+- [FEAT][C++] Change the default namespace to `graphar` by @acezen in
https://github.com/apache/graphar/pull/413
+- [FEAT][C++] Not allow setting custom namespace for code clarity by @acezen
in https://github.com/apache/graphar/pull/415
+
+### Docs
+
+- [Feat][Doc] Refactor and update the format specification document by @acezen
in https://github.com/apache/graphar/pull/387
+
+## [v0.11.3] - 2024-03-12
+### Added
+
+- [Feat][Spark] Split datasources and core, prepare for support of multiple
spark versions by @SemyonSinchenko in https://github.com/apache/graphar/pull/369
+- [Feat][Format][Spark] Add nullable key in meta-data by @Thespica in
https://github.com/apache/graphar/pull/365
+- [Feat][Spark] Spark 3.3.x support as a Maven Profile by @SemyonSinchenko in
https://github.com/apache/graphar/pull/376
+- [C++] Include an example of converting SNAP datasets to GraphAr format by
@lixueclaire in https://github.com/apache/graphar/pull/386
+- [Feat][C++] Support `Date` and `Timestamp` data type by @acezen in
https://github.com/apache/graphar/pull/398
+- Bump up GraphAr version to v0.11.3 by @acezen in
https://github.com/apache/graphar/pull/400
+
+### Changed
+
+- [Feat][Spark] Update PySpark bindings following GraphAr Spark by
@SemyonSinchenko in https://github.com/apache/graphar/pull/374
+- [Minor][C++] Revise the unsupported data type error msg to give more
information by @acezen in https://github.com/apache/graphar/pull/391
+
+### Fixed
+
+- [BugFix][C++] Fix bug: PropertyGroup with empty properties makes
VertexInfo/EdgeInfo dump fail by @acezen in
https://github.com/apache/graphar/pull/393
+- [BugFix][C++]: Fix `VertexInfo/EdgeInfo` can not be saved to a URI path by
@acezen in https://github.com/apache/graphar/pull/395
+- [Improvement][C++] Fixes compilation warnings in C++ SDK by @sighingnow in
https://github.com/apache/graphar/pull/388
+
+### Docs
+
+- [Feat][Doc] update Spark documentation by introducing Maven Profiles by
@SemyonSinchenko in https://github.com/apache/graphar/pull/380
+- [Improvement][Doc] Provide an implementation status page to indicate
libraries status of format implementation support by @acezen in
https://github.com/apache/graphar/pull/373
+- [Minor][Doc] Fix the link of the images by @acezen in
https://github.com/apache/graphar/pull/383
+- [Minor][Doc] Update and fix the implementation status page by @lixueclaire
in https://github.com/apache/graphar/pull/385
+- [Feat][Doc] switch to poetry project for docs generating by @SemyonSinchenko
in https://github.com/apache/graphar/pull/384
+
+## [v0.11.2] - 2024-02-24
+### Added
+
+- [Feat][Format][C++] Support nullable key for property in meta-data by
@Thespica in https://github.com/apache/graphar/pull/355
+- [Feat][Format][C++] Support extra info in graph info by @acezen in
https://github.com/apache/graphar/pull/356
+
+### Changed
+- [Improvement][Spark] Try to make neo4j generate DataFrame with the correct
data type by @acezen in https://github.com/apache/graphar/pull/353
+- [Improve][C++] Revise the ArrowChunkReader constructors by removing a
redundant parameter by @acezen in https://github.com/apache/graphar/pull/360
+- [Improvement][Doc][CPP] Complement the api reference document of cpp by
@acezen in https://github.com/apache/graphar/pull/364
+- Bump up GraphAr version to v0.11.2 by @acezen in
https://github.com/apache/graphar/pull/371
+
+### Fixed
+
+- [Chore][C++] fix err message by @jasinliu in
https://github.com/apache/graphar/pull/345
+- [BugFix][C++] Update the testing path with latest testing repo by @acezen in
https://github.com/apache/graphar/pull/346
+
+### Docs
+
+- [Doc] Enhance the ReadMe with additional information about the GraphAr
libraries by @lixueclaire in https://github.com/apache/graphar/pull/349
+- [Minor][Doc] Update publication information and fix link in ReadMe by
@lixueclaire in https://github.com/apache/graphar/pull/350
+- [Minor][Doc] Minor fix typo of cpp reference by @acezen in
https://github.com/apache/graphar/pull/363
+
+## [v0.11.1] - 2024-01-24
+### Changed
+
+- [Improvement][Spark] Improve the writer efficiency with parallel processing by
@acezen in https://github.com/apache/graphar/pull/329
+- [Feat][Spark] Memory tuning for GraphAr spark with persist and storage level
by @acezen in https://github.com/apache/graphar/pull/326
+- Bump up GraphAr version to v0.11.1 by @acezen in
https://github.com/apache/graphar/pull/342
+
+### Fixed
+
+- [Minor][Spark] Fix typo by @acezen in
https://github.com/apache/graphar/pull/327
+- [Bug][C++] Add implement of property<bool> by @jasinliu in
https://github.com/apache/graphar/pull/337
+- [BugFix][C++] Check is not nullptr before calling ToString and fix empty
prefix bug by @acezen in https://github.com/apache/graphar/pull/339
+
+### Docs
+
+- [Minor][Doc] Update getting-started.rst to fix a typo by @jasinliu in
https://github.com/apache/graphar/pull/325
+- [Minor][Doc] Remove unused community channel and add publication citation by
@acezen in https://github.com/apache/graphar/pull/331
+- [Minor][Doc] Fix README by @acezen in
https://github.com/apache/graphar/pull/332
+- [Minor][Spark] minor doc fix by @acezen in
https://github.com/apache/graphar/pull/336
+
+## [v0.11.0] - 2024-01-15
+### Added
+
+- Bump up GraphAr version to v0.11.0 @acezen
+- [Feat][Spark] Align info implementation of spark with c++ (#316) Weibin Zeng
+- [Feat][Spark] Implementation of PySpark bindings to Scala API (#300) Semyon
+- [Feat][C++] Initialize the micro benchmark for c++ (#299) Weibin Zeng
+- [Improve][Java] Get test resources from environment variables, and remove
all print sentences (#309) John
+- [Feat][Spark] Add Neo4j importer (#243) Liu Jiajun
+- [FEAT][C++] Support `list<string>` data type (#302) Weibin Zeng
+- [Minor][Dev] Update the PR template (#301) Weibin Zeng
+- [Feat][C++] Support List Data Type, use `list<float>` as example (#296)
Weibin Zeng
+- [FEAT][C++] Refactor the C++ SDK with forward declaration and shared ptr
(#290) Weibin Zeng
+- [FEAT][C++] Use `shared_ptr` in all readers and writers (#281) Weibin Zeng
+- [Feat][Java] Fill two incompatible gaps between C++ and Java (#279) John
@Thespica
+
+### Changed
+
+- [Improvement][Spark] Change VertexWriter constructor signature (#314) Semyon
+- [Feat][Spark] Update snakeyaml to 2.x.x version (#312) Semyon
+- [Minor][License] Update the license header and add license check in CI
(#294) Weibin Zeng
+- [Minor][C++] Improve the validation check (#310) Weibin Zeng
+- [Minor][Dev] Update release workflow to make release easy and revise other
workflows (#323) Weibin Zeng
+
+### Fixed
+
+- [Minor][Spark] Fix Spark comparison bug (#318) Zhang Lei
+- [Minor][Doc] Fix spark url in README.md (#317) Zhang Lei
+- [BugFix][Spark] Fix the comparison behavior of
Property/PropertyGroup/AdjList (#306) Weibin Zeng
+- [BugFix][Spark] change maven-site-plugin to 3.7.1 (#305) Weibin Zeng
+- [Minor][Doc] Fix the cpp reference doc (#295) Weibin Zeng
+- [Minor][C++] Fix typo: REGULAR_SEPERATOR -> REGULAR_SEPARATOR (#293) Weibin
Zeng
+- [BugFix][C++] Finalize S3 in FileSystem destructor (#289) Weibin Zeng
+- [Minor][Doc] Fix the typos of document (#282) Weibin Zeng
+- [BugFix][JAVA] Fix invalid option to skip building GraphAr c++ internally
for java (#284) John
+
+### Docs
+
+- [Doc][Improvement] Reorg the document structure by libraries (#292) Weibin
Zeng
+
+## [v0.10.0] - 2023-11-10
+### Added
+
+- [Feat][Spark] Add examples to show how to load/dump data from/to GraphAr for
Nebula (#244) (Liu Xiao) [#244](https://github.com/apache/graphar/pull/244)
+- [Minor][Spark] Support get GraphAr Spark from Maven (#250) (Weibin Zeng)
[#250](https://github.com/apache/graphar/pull/250)
+- [Improvement][C++] Use inherit to implement EdgesCollection (#238) (Weibin
Zeng) [#238](https://github.com/apache/graphar/pull/238)
+- [C++] Add examples about how to use C++ reader/writer (#252) (lixueclaire)
[#252](https://github.com/apache/graphar/pull/252)
+- [Improve][C++] Use arrow shared library if arrow installed (#263) (Weibin
Zeng) [#263](https://github.com/apache/graphar/pull/263)
+- [Improve][Java] Make EdgesCollection and VerticesCollection support foreach
loop (#270) (John) [#270](https://github.com/apache/graphar/pull/270)
+- [Minor][CI] Install a certain version of arrow in CI to avoid breaking CI
when arrow upgrades (#273) (Weibin Zeng)
[#273](https://github.com/apache/graphar/pull/273)
+- [Improvement][Spark] Complement the error messages of spark SDK (#278)
(Weibin Zeng) [#278](https://github.com/apache/graphar/pull/278)
+- [Feat][Format] Add internal id column to vertex payload file (#264) (Weibin
Zeng) [#264](https://github.com/apache/graphar/pull/264)
+
+### Changed
+
+- [Minor][C++] Update the C++ SDK version config (#266) (Weibin Zeng)
[#266](https://github.com/apache/graphar/pull/266)
+- [Doc][BugFix] Fix missing of scaladoc and javadoc in website (#269) (John)
[#269](https://github.com/apache/graphar/pull/269)
+
+### Fixed
+
+- [BUG][C++] Fix testing data path of examples (#251) (lixueclaire)
[#251](https://github.com/apache/graphar/pull/251)
+- [BugFix][Spark] Close the FileSystem Object (haohao0103)
[#258](https://github.com/apache/graphar/pull/258)
+- [BugFix][JAVA] Fix the building order bug of JAVA SDK (#261) (Weibin Zeng)
[#261](https://github.com/apache/graphar/pull/261)
+
+### Docs
+
+- [Minor][Doc]Add release-process.md to explain the release process, as
supplement of road map (#254) (Weibin Zeng)
[#254](https://github.com/apache/graphar/pull/254)
+- [Doc][Spark] Update the doc: fix the outdated argument annotations and typo
(#267) (Weibin Zeng) [#267](https://github.com/apache/graphar/pull/267)
+- [Doc] Provide Java's reference library, documentation for users and
developers (#242) (John) [#242](https://github.com/apache/graphar/pull/242)
+
+
+## [v0.9.0] - 2023-10-08
+### Added
+
+- Define code style for spark and java and add code format check to CI (#232)
(Weibin Zeng) [#232](https://github.com/apache/graphar/pull/232)
+- [FEAT][JAVA] Implement READERS and WRITERS for Java (#233) (John)
[#233](https://github.com/apache/graphar/pull/233)
+- [Spark] Support property filter pushdown by utilizing payload file formats
(#221) (Ziyi Tan) [#221](https://github.com/apache/graphar/pull/221)
+
+## [v0.8.0] - 2023-08-30
+### Added
+
+- [Minor][Spark] Adapt spark yaml format to BLOCK (#217) (Weibin Zeng)
[#217](https://github.com/apache/graphar/pull/217)
+- [Feat][C++] Output the error message when access value in Result fail (#222)
(Weibin Zeng) [#222](https://github.com/apache/graphar/pull/222)
+- [Feat][Java] Initialize the JAVA SDK: add INFO implementation (#212) (John)
[#212](https://github.com/apache/graphar/pull/212)
+- [Feat][C++] Support building GraphAr with system installed arrow (#230)
(Weibin Zeng) [#230](https://github.com/apache/graphar/pull/230)
+
+### Changed
+
+- [FEAT] Unify the name:`utils` -> `util` and the namespace of `GraphAr::util`
(#225) (Weibin Zeng) [#225](https://github.com/apache/graphar/pull/225)
+
+### Fixed
+
+- [Minor] Fix the broken CI of doc (#214) (Weibin Zeng)
[#214](https://github.com/apache/graphar/pull/214)
+- [BugFix][Spark] Fix compile error under JDK8 and maven 3.9.x (#216) (Liu
Xiao) [#216](https://github.com/apache/graphar/pull/216)
+- [BugFix][C++] Remove arrow header from GraphAr's header (#229) (Weibin Zeng)
[#229](https://github.com/apache/graphar/pull/229)
+
+## [v0.7.0] - 2023-07-24
+### Added
+
+- [C++] Support property filter pushdown by utilizing payload file formats
(#178) (Ziyi Tan) [#178](https://github.com/apache/graphar/pull/178)
+
+### Changed
+
+- [C++][Improvement] Redesign and unify the implementation of validation in
C++ Writer/Builder (#186) (lixueclaire)
[#186](https://github.com/apache/graphar/pull/186)
+- [Improvement][C++] Refine the error message of errors of C++ SDK (#192)
(Weibin Zeng) [#192](https://github.com/apache/graphar/pull/192)
+- [Improvement][C++] Refine the error message of Reader SDK (#195) (Ziyi Tan)
[#195](https://github.com/apache/graphar/pull/195)
+- Update the favicon image (#199) (Weibin Zeng)
[#199](https://github.com/apache/graphar/pull/199)
+- Update doc comments in graph_info.h (#204) (John)
[#204](https://github.com/apache/graphar/pull/204)
+- [Spark] Refine the `GraphWriter` to automatically generate graph info and
improve the Neo4j case (#196) (Weibin Zeng)
[#196](https://github.com/apache/graphar/pull/196)
+
+### Fixed
+
+- Fixes the pull_request_target usage to avoid the secret leak issue. (#193)
(Tao He) [#193](https://github.com/apache/graphar/pull/193)
+- Fixes the link to the logo image in README (#198) (Tao He)
[#198](https://github.com/apache/graphar/pull/198)
+- [Minor][C++] Fix grammar mistakes. (#208) (John)
[#208](https://github.com/apache/graphar/pull/208)
+
+### Docs
+
+- [Minor][Doc] Add GraphAr logo to README (#197) (Weibin Zeng)
[#197](https://github.com/apache/graphar/pull/197)
+- [Spark][Doc]Add java version for neo4j example. (#207) (Liu Jiajun)
[#207](https://github.com/apache/graphar/pull/207)
+
+## [v0.6.0] - 2023-06-09
+### Added
+
+- [C++] Support to get reference of the property in Vertex/Edge (#156)
(lixueclaire) [#156](https://github.com/apache/graphar/pull/156)
+- [C++] Align arrow version to system if arrow installed (#162) (@acezen
Weibin Zeng) [#162](https://github.com/apache/graphar/pull/162)
+- [BugFix] [C++] Make examples to generate result files under build type of
release (#173) (lixueclaire) [#173](https://github.com/apache/graphar/pull/173)
+- [Improvement][C++] Use recommended parameter to sort in Writer (#177)
(@lixueclaire lixueclaire) [#177](https://github.com/apache/graphar/pull/177)
+- [C++][Improvement] Add validation of different levels for builders in C++
library (#181) (lixueclaire) [#181](https://github.com/apache/graphar/pull/181)
+
+### Changed
+
+### Fixed
+
+- Fix compile error on ARM platform (#158) (Weibin Zeng)
[#158](https://github.com/apache/graphar/pull/158)
+- [C++][BugFix] Fix the arrow acero not found error when building with arrow
12.0.0 or greater (#164) (Weibin Zeng)
[#164](https://github.com/apache/graphar/pull/164)
+
+### Docs
+
+- [Doc] Refine the documentation of file format design (#165) (lixueclaire)
[#165](https://github.com/apache/graphar/pull/165)
+- [Doc] Improve spelling (#175) (Ziyi Tan)
[#175](https://github.com/apache/graphar/pull/175)
+- [MINOR][DOC] Add mail list to our communication tools and add community
introduction (#179) (Weibin Zeng)
[#179](https://github.com/apache/graphar/pull/179)
+- [Doc]Refine README in cpp about building (#182) (John)
[#182](https://github.com/apache/graphar/pull/182)
+
+## [v0.5.0] - 2023-05-12
+### Added
+
+- Enable arrow S3 support to support reading and writing file with S3/OSS
(#125) (Weibin Zeng) [#125](https://github.com/apache/graphar/pull/125)
+- [Improvement][C++] Add validation for data types for writers in C++ library
(#136) (lixueclaire) [#136](https://github.com/apache/graphar/pull/136)
+- [C++] Add vertex_count file for storing edges in GraphAr (#138)
(lixueclaire) [#138](https://github.com/apache/graphar/pull/138)
+- [FEAT] Use single header yaml parser `mini-yaml` (#142) (Weibin Zeng)
[#142](https://github.com/apache/graphar/pull/142)
+- Implement the add-assign operator for VertexIter (#151) (lixueclaire)
[#151](https://github.com/apache/graphar/pull/151)
+
+### Changed
+
+- [Improvement][C++] Improve the usability of EdgesCollection (#133)
(lixueclaire) [#133](https://github.com/apache/graphar/pull/133)
+- [Minor] Update README: add information about weekly meeting (#139) (Weibin
Zeng) [#139](https://github.com/apache/graphar/pull/139)
+- [Minor] Make the curl interface private (#146) (Weibin Zeng)
[#146](https://github.com/apache/graphar/pull/146)
+- [Doc] Update the images of README (#145) (Weibin Zeng)
[#145](https://github.com/apache/graphar/pull/145)
+- [Spark] Update the Spark library to align with the latest file format design
(#144) (lixueclaire) [#144](https://github.com/apache/graphar/pull/144)
+- [Minor][Doc]Remove deleted methods from API Reference (#149) (lixueclaire)
[#149](https://github.com/apache/graphar/pull/149)
+- [Doc] Refine building steps to be more clear in ReadMe (#154) (lixueclaire)
[#154](https://github.com/apache/graphar/pull/154)
+
+### Fixed
+
+- [BugFix][C++] Fix next_chunk() of readers in the C++ library (#137)
(lixueclaire) [#137](https://github.com/apache/graphar/pull/137)
+- [Minor] HotFix the link error of libcurl when building test (#147) (Weibin
Zeng) [#147](https://github.com/apache/graphar/pull/147)
+- [Minor] Fix the overview image (#148) (Weibin Zeng)
[#148](https://github.com/apache/graphar/pull/148)
+- [Minor] Fix building arrow bug on centos8 (#150) (Weibin Zeng)
[#150](https://github.com/apache/graphar/pull/150)
+
+## [v0.4.0] - 2023-04-13
+### Added
+
+- [Minor] Add discord invite link and banner to README (#129) (@acezen Weibin
Zeng) [#129](https://github.com/apache/graphar/pull/129)
+- [Improvement][C++] Implement the add operator for VertexIter (#128)
(@lixueclaire lixueclaire) [#128](https://github.com/apache/graphar/pull/128)
+- [C++] Add edge count file in GraphAr (#132) (lixueclaire)
[#132](https://github.com/apache/graphar/pull/132)
+
+### Changed
+
+- Disable jemalloc when building the bundled arrow (#122) (@sighingnow Tao He)
[#122](https://github.com/apache/graphar/pull/122)
+- [Minor][C++] Adjust the dependency version of arrow and fix arrow header
conflict bug (#134) (Weibin Zeng)
[#134](https://github.com/apache/graphar/pull/134)
+- [Minor] Update testing data (#135) (Weibin Zeng)
[#135](https://github.com/apache/graphar/pull/135)
+
+### Fixed
+
+- [Minor][C++] Fix compile warning (#123) (Yee)
[#123](https://github.com/apache/graphar/pull/123)
+- Fix test data path for examples (#131) (lixueclaire)
[#131](https://github.com/apache/graphar/pull/131)
+
+## [v0.3.0] - 2023-03-10
+### Added
+
+- [Improvement][Spark] Add helper objects and methods for loading info classes
from files (#112) (lixueclaire)
[#112](https://github.com/apache/graphar/pull/112)
+- [Improvement][Spark] Provide APIs for data transformation at the graph level
(#113) (lixueclaire) [#113](https://github.com/apache/graphar/pull/113)
+- [Improvement][Spark] Provide APIs for data reading and writing at the graph
level (#114) (Weibin Zeng) [#114](https://github.com/apache/graphar/pull/114)
+- [Examples][Spark] Add examples of integrating with the Neo4j spark connector
as an application of GraphAr (#107) (lixueclaire)
[#107](https://github.com/apache/graphar/pull/107)
+
+### Changed
+
+- Refine the overview figure and fix the typos in documentation (#117)
(lixueclaire) [#117](https://github.com/apache/graphar/pull/117)
+- [Improvement][DevInfra] Reorg the code directory to easily extend
libraries (#116) (Weibin Zeng)
[#116](https://github.com/apache/graphar/pull/116)
+- [Minor][Doc] Remove the invalid link (#121) (Weibin Zeng)
[#121](https://github.com/apache/graphar/pull/121)
+
+### Fixed
+
+- [BugFix][Spark] Fix the bug that VertexWrite does not generate vertex count
file (#110) (Weibin Zeng) [#110](https://github.com/apache/graphar/pull/110)
+
+## [v0.2.0] - 2023-02-23
+### Added
+
+- [Improvement] [Spark] Add methods for Spark Reader and improve the
performance (#87) (lixueclaire) [#87](https://github.com/apache/graphar/pull/87)
+- Add pre-commit configuration and instructions (#93) (Tao He)
[#93](https://github.com/apache/graphar/pull/93)
+- Handle comments correctly for preview PR docs (#94) (Tao He)
[#94](https://github.com/apache/graphar/pull/94)
+- [Improve] Add auxiliary functions to get vertex chunk num or edge chunk num
with infos (#95) (Weibin Zeng) [#95](https://github.com/apache/graphar/pull/95)
+- [Improve] Use gar-related names for arrow project and ccache to avoid
duplicated project name (#102) (Weibin Zeng)
[#102](https://github.com/apache/graphar/pull/102)
+- Add prefix to arrow definitions to avoid conflicts (#106) (Tao He)
[#106](https://github.com/apache/graphar/pull/106)
+
+### Changed
+
+- [Improve][Spark] Improve the performance of GraphAr Spark Reader (#84)
(lixueclaire) [#84](https://github.com/apache/graphar/pull/84)
+- Cast StringArray to LargeStringArray, otherwise we will fail when we need to
concatenate chunks (#105) (Tao He)
[#105](https://github.com/apache/graphar/pull/105)
+- [Improvement] Improve GraphAr spark writer performance and implement custom
writer builder to bypass spark's write behavior (#92) (Weibin Zeng)
[#92](https://github.com/apache/graphar/pull/92)
+- [Improvement][FileFormat] Write CSV payload files with header (#85) (Weibin
Zeng) [#85](https://github.com/apache/graphar/pull/85)
+- Update the source code url of GraphScope fragment builder and writer (#103)
(Weibin Zeng) [#103](https://github.com/apache/graphar/pull/103)
+
+### Fixed
+
+- [BugFix] Fix the Spark Writer bug when the column name contains a dot(.)
(#101) (lixueclaire) [#101](https://github.com/apache/graphar/pull/101)
+- It should be linker flags, suppressing the clang warnings (#104) (Tao He)
[#104](https://github.com/apache/graphar/pull/104)
+- Address issues in handling yaml-cpp correctly when requires GraphAr in
external projects (#91) (Tao He)
[#91](https://github.com/apache/graphar/pull/91)
+
+## [v0.1.0] - 2023-01-11
+### Added
+- Add ccache to github actions by @acezen in
https://github.com/apache/incubator-graphar/pull/12
+- Add issue template and pull request template to help user easy to get… by
@acezen in https://github.com/apache/incubator-graphar/pull/13
+- Add CODE_OF_CONDUCT.md by @acezen in
https://github.com/apache/incubator-graphar/pull/26
+- Add InfoVersion to store version information of info and support data type
extension base on info version by @acezen in
https://github.com/apache/incubator-graphar/pull/27
+- Initialize the spark tool of GraphAr and implement the Info and
IndexGenerator by @acezen in
https://github.com/apache/incubator-graphar/pull/45
+- organize an example pagerank app employing the gar library (#44) by
@andydiwenzhu in https://github.com/apache/incubator-graphar/pull/46
+- Initialize the implementation of spark writer by @acezen in
https://github.com/apache/incubator-graphar/pull/51
+- Initialize implementation for spark reader by @lixueclaire in
https://github.com/apache/incubator-graphar/pull/52
+- Add release and reviewing tutorial to contributing guide by @acezen in
https://github.com/apache/incubator-graphar/pull/53
+- Add introduction about GraphAr Spark tools in document by @lixueclaire in
https://github.com/apache/incubator-graphar/pull/58
+- Add spark tool api reference to doc by @acezen in
https://github.com/apache/incubator-graphar/pull/59
+- Add Spark application examples using GraphAr Spark tools by @lixueclaire in
https://github.com/apache/incubator-graphar/pull/61
+### Changed
+
+- Use the apache URL to download apache-arrow. by @sighingnow in
https://github.com/apache/incubator-graphar/pull/7
+- Update gar-test submodule url by @acezen in
https://github.com/apache/incubator-graphar/pull/6
+- Update README.rst by @yecol in
https://github.com/apache/incubator-graphar/pull/11
+- Revise image links in docs by @lixueclaire in
https://github.com/apache/incubator-graphar/pull/10
+- Refine documentation about integrating into GraphScope by @lixueclaire in
https://github.com/apache/incubator-graphar/pull/15
+- Refine the contributing doc to be more readable and easy to get started by
@acezen in https://github.com/apache/incubator-graphar/pull/16
+- [Minor] Remove `docutils` version limit to fix docs ci by @acezen in
https://github.com/apache/incubator-graphar/pull/57
+- Remove `include "arrow/api.h"` from graph.h by @acezen in
https://github.com/apache/incubator-graphar/pull/50
+- [Improve][Doc] Revise the README and APIs docstring of GraphAr by @acezen in
https://github.com/apache/incubator-graphar/pull/64
+- [Improve][Doc] Refine the documentation about user guide and applications by
@lixueclaire in https://github.com/apache/incubator-graphar/pull/69
+
+### Fixed
+
+- Fix the inconsistent prefix for vertex property chunks and update image
links by @acezen in https://github.com/apache/incubator-graphar/pull/4
+- Fix the file suffix of bug report template by @acezen in
https://github.com/apache/incubator-graphar/pull/17
+- Fix prefix of GAR files in document by @lixueclaire in
https://github.com/apache/incubator-graphar/pull/56
+- [BugFix][Spark] Fix offset chunk output path and offset value of spark
writer by @acezen in https://github.com/apache/incubator-graphar/pull/63
+- [MinorFix] Remove unnecessary file by @acezen in
https://github.com/apache/incubator-graphar/pull/43
+- [BugFix] Hide the interface of dependencies of GraphAr with `PRIVATE` link
type by @acezen in https://github.com/apache/incubator-graphar/pull/71
diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
index 02b8f18e..3e35b4bc 100644
--- a/CONTRIBUTING.md
+++ b/CONTRIBUTING.md
@@ -97,8 +97,7 @@ For small or first-time contributions, we recommend the dev
container method. An
### Using a dev container environment
GraphAr provides a pre-configured [dev container](https://containers.dev/)
-that could be used in [GitHub
Codespaces](https://github.com/features/codespaces),
-[VSCode](https://code.visualstudio.com/docs/devcontainers/containers),
[JetBrains](https://www.jetbrains.com/remote-development/gateway/),
+that could be used in
[VSCode](https://code.visualstudio.com/docs/devcontainers/containers),
[JetBrains](https://www.jetbrains.com/remote-development/gateway/),
[JupyterLab](https://jupyterlab.readthedocs.io/en/stable/).
Please pick your favorite runtime environment.
@@ -107,6 +106,10 @@ Please pick up your favorite runtime environment.
Different components of GraphAr may require different setup steps. Please
refer to their respective `README` documentation for more details.
- [C++ Library](cpp/README.md)
-- [Java Library](java/README.md)
-- [Spark Library](spark/README.md)
-- [PySpark Library](pyspark/README.md)
+- [Scala with Spark Library](spark/README.md)
+- [Python with PySpark Library](pyspark/README.md) (under development)
+- [Java Library](java/README.md) (under development)
+
+----
+
+This doc is adapted from [Apache OpenDAL](https://opendal.apache.org/)
diff --git a/LICENSE b/LICENSE
index d1d8cf77..b9617232 100644
--- a/LICENSE
+++ b/LICENSE
@@ -212,7 +212,7 @@ Apache-2.0 licenses
The following components are provided under the Apache-2.0 License. See
project link for details.
The text of each license is the standard Apache 2.0 license.
-* spark 3.1.1 and 3.3.4 (https://github.com/apache/spark)
+* Apache Spark 3.1.1 and 3.3.4 (https://github.com/apache/spark)
Files:
maven-projects/spark/datasourcs-32/src/main/scala/org/apache/graphar/datasources/GarCommitProtocol.scala
maven-projects/spark/datasourcs-32/src/main/scala/org/apache/graphar/datasources/GarDataSource.scala
@@ -234,9 +234,13 @@ The text of each license is the standard Apache 2.0 license.
maven-projects/spark/datasourcs-33/src/main/scala/org/apache/graphar/datasources/orc/ORCOutputWriter.scala
maven-projects/spark/datasourcs-33/src/main/scala/org/apache/graphar/datasources/orc/ORCWriteBuilder.scala
maven-projects/spark/datasourcs-33/src/main/scala/org/apache/graphar/datasources/parquet/ParquetWriteBuilder.scala
- are modified from spark.
+ are modified from Apache Spark.
+
+* Apache Arrow 12.0.0 (https://github.com/apache/arrow)
+ Files:
+ dev/release/setup-ubuntu.sh
+ are modified from Apache Arrow.
-* arrow 12.0.0 (https://github.com/apache/arrow)
* fastFFI v0.1.2 (https://github.com/alibaba/fastFFI)
Files:
maven-projects/java/src/main/java/org/apache/graphar/stdcxx/StdString.java
@@ -251,6 +255,12 @@ The text of each license is the standard Apache 2.0 license.
maven-projects/java/src/main/java/org/apache/graphar/stdcxx/StdUnorderedMap.java
are modified from GraphScope.
+* Apache OpenDAL v0.45.1 (https://github.com/apache/opendal)
+ Files:
+ dev/release/release.py
+ dev/release/verify.py
+ are modified from OpenDAL.
+
================================================================
MIT licenses
================================================================
diff --git a/NOTICE b/NOTICE
index cb5fbb1b..4dc3200b 100644
--- a/NOTICE
+++ b/NOTICE
@@ -31,3 +31,11 @@ which includes the following in its NOTICE file:
fastFFI
Copyright 1999-2021 Alibaba Group Holding Ltd.
+
+--------------------------------------------------------------------------------
+
+This product includes code from Apache OpenDAL, which includes the following in
+its NOTICE file:
+
+ Apache OpenDAL
+ Copyright 2022 and onwards The Apache Software Foundation.
diff --git a/README.md b/README.md
index ad9e064b..af5fc3f6 100644
--- a/README.md
+++ b/README.md
@@ -207,8 +207,17 @@ See [GraphAr C++
Library](./cpp) for
details about the building of the C++ library.
+
+### The Scala with Spark Library
+
+See [GraphAr Spark
+Library](./maven-projects/spark)
+for details about the Scala with Spark library.
+
### The Java Library
+The Java library is under development.
+
The GraphAr Java library is created with bindings to the C++ library
(currently at version v0.10.0), utilizing
[Alibaba-FastFFI](https://github.com/alibaba/fastFFI) for
@@ -216,15 +225,11 @@ implementation. See [GraphAr Java
Library](./maven-projects/java) for
details about the building of the Java library.
-### The Spark Library
-
-See [GraphAr Spark
-Library](./maven-projects/spark)
-for details about the Spark library.
+### The Python with PySpark Library
-### The PySpark Library
+The Python with PySpark library is under development.
-The GraphAr PySpark library is developed as bindings to the GraphAr
+The PySpark library is developed as bindings to the GraphAr
Spark library. See [GraphAr PySpark
Library](./pyspark)
for details about the PySpark library.
diff --git a/buf.gen.yaml b/buf.gen.yaml
new file mode 100644
index 00000000..6efdffa7
--- /dev/null
+++ b/buf.gen.yaml
@@ -0,0 +1,18 @@
+version: v2
+managed:
+ enabled: true
+ disable:
+ - file_option: java_package
+plugins:
+ # Python classes
+ - remote: buf.build/protocolbuffers/python:v27.1
+ out: pyspark/graphar_pyspark/proto/
+ # Python headers for IDEs and MyPy
+ - remote: buf.build/protocolbuffers/pyi
+ out: pyspark/graphar_pyspark/proto/
+ # Cpp
+ - remote: buf.build/protocolbuffers/cpp:v27.1
+ out: cpp/src/proto
+ # Java
+ - remote: buf.build/protocolbuffers/java:v27.1
+ out: maven-projects/info/src/main/java/
diff --git a/buf.yaml b/buf.yaml
new file mode 100644
index 00000000..bda430e8
--- /dev/null
+++ b/buf.yaml
@@ -0,0 +1,3 @@
+version: v2
+modules:
+ - path: format
diff --git a/cpp/CMakeLists.txt b/cpp/CMakeLists.txt
index fe81d18f..45a14c4d 100644
--- a/cpp/CMakeLists.txt
+++ b/cpp/CMakeLists.txt
@@ -32,8 +32,8 @@ if (CMAKE_VERSION VERSION_GREATER_EQUAL "3.24.0")
endif()
set(GRAPHAR_MAJOR_VERSION 0)
-set(GRAPHAR_MINOR_VERSION 11)
-set(GRAPHAR_PATCH_VERSION 4)
+set(GRAPHAR_MINOR_VERSION 12)
+set(GRAPHAR_PATCH_VERSION 0)
set(GREAPHAR_VERSION
${GRAPHAR_MAJOR_VERSION}.${GRAPHAR_MINOR_VERSION}.${GRAPHAR_PATCH_VERSION})
project(graphar-cpp LANGUAGES C CXX VERSION ${GREAPHAR_VERSION})
diff --git a/cpp/README.md b/cpp/README.md
index a2891026..743f0476 100644
--- a/cpp/README.md
+++ b/cpp/README.md
@@ -67,9 +67,7 @@ repository and navigated to the ``cpp`` subdirectory with:
```bash
$ git clone https://github.com/apache/graphar.git
- $ cd graphar
- $ git submodule update --init
- $ cd cpp
+ $ cd graphar/cpp
```
Release build:
diff --git a/cpp/test/test_arrow_chunk_reader.cc b/cpp/test/test_arrow_chunk_reader.cc
index 10e718ba..74d8041d 100644
--- a/cpp/test/test_arrow_chunk_reader.cc
+++ b/cpp/test/test_arrow_chunk_reader.cc
@@ -158,8 +158,7 @@ TEST_CASE_METHOD(GlobalFixture, "ArrowChunkReader") {
<< '\n';
std::cout << "Column Nums: " << table->num_columns() << "\n";
std::cout << "Column Names: ";
- for (int i = 0;
- i < table->num_columns() && i < expected_cols.size(); i++) {
+ for (int i = 0; i < table->num_columns(); i++) {
REQUIRE(table->ColumnNames()[i] == expected_cols[i]);
std::cout << "`" << table->ColumnNames()[i] << "` ";
}
diff --git a/maven-projects/spark/scripts/run-ldbc-sample2graphar.sh b/dev/download_test_data.sh
similarity index 54%
copy from maven-projects/spark/scripts/run-ldbc-sample2graphar.sh
copy to dev/download_test_data.sh
index 40c07db3..83555be3 100755
--- a/maven-projects/spark/scripts/run-ldbc-sample2graphar.sh
+++ b/dev/download_test_data.sh
@@ -1,5 +1,5 @@
#!/bin/bash
-
+#
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
@@ -17,17 +17,16 @@
# specific language governing permissions and limitations
# under the License.
+# A script to download test data for GraphAr
-set -eu
-
-cur_dir="$( cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null 2>&1 && pwd )"
-jar_file="${cur_dir}/../graphar/target/graphar-commons-0.1.0-SNAPSHOT-shaded.jar"
-person_input_file="${GAR_TEST_DATA}/ldbc_sample/person_0_0.csv"
-person_knows_person_input_file="${GAR_TEST_DATA}/ldbc_sample/person_knows_person_0_0.csv"
-output_dir="/tmp/graphar/ldbc_sample"
-
-vertex_chunk_size=100
-edge_chunk_size=1024
-file_type="parquet"
-spark-submit --class org.apache.graphar.example.LdbcSample2GraphAr ${jar_file} \
-  ${person_input_file} ${person_knows_person_input_file} ${output_dir} ${vertex_chunk_size} ${edge_chunk_size} ${file_type}
+if [ -n "${GAR_TEST_DATA}" ]; then
+  if [[ ! -d "$GAR_TEST_DATA" ]]; then
+    echo "GAR_TEST_DATA is set but the directory does not exist, cloning the test data to $GAR_TEST_DATA"
+    git clone https://github.com/apache/incubator-graphar-testing.git "$GAR_TEST_DATA" --depth 1 || true
+  fi
+else
+  echo "GAR_TEST_DATA is not set, cloning the test data to /tmp/graphar-testing"
+  git clone https://github.com/apache/incubator-graphar-testing.git /tmp/graphar-testing --depth 1 || true
+  echo "Test data has been cloned to /tmp/graphar-testing, please run"
+  echo "  export GAR_TEST_DATA=/tmp/graphar-testing"
+fi
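As an aside, the branching in `dev/download_test_data.sh` above (use `GAR_TEST_DATA` if set, otherwise fall back to `/tmp/graphar-testing`) can be sketched in Python; `resolve_test_data_dir` is a hypothetical helper for illustration only and performs no `git clone`:

```python
from pathlib import Path

def resolve_test_data_dir(env):
    """Mirror the branching in dev/download_test_data.sh: return the
    target directory and whether a clone would still be needed."""
    target = env.get("GAR_TEST_DATA")
    if target:
        path = Path(target)
        # Clone only if the configured directory does not exist yet
        return path, not path.is_dir()
    # No variable set: fall back to the default location
    return Path("/tmp/graphar-testing"), True

print(resolve_test_data_dir({}))
```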
diff --git a/maven-projects/spark/import/neo4j.sh b/dev/release/conda_env_cpp.txt
old mode 100755
new mode 100644
similarity index 72%
copy from maven-projects/spark/import/neo4j.sh
copy to dev/release/conda_env_cpp.txt
index dbae0273..c0025b04
--- a/maven-projects/spark/import/neo4j.sh
+++ b/dev/release/conda_env_cpp.txt
@@ -1,4 +1,3 @@
-#!/bin/bash
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
@@ -16,12 +15,8 @@
# specific language governing permissions and limitations
# under the License.
-
-set -eu
-
-cur_dir="$( cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null 2>&1 && pwd )"
-jar_file="${cur_dir}/../graphar/target/graphar-commons-0.1.0-SNAPSHOT-shaded.jar"
-conf_path="$(readlink -f $1)"
-
-spark-submit --class org.apache.graphar.importer.Neo4j ${jar_file} \
- ${conf_path}
+cmake
+conda-forge::arrow-cpp=13.0.0
+make
+clangxx_linux-64
+conda-forge::catch2=3.6.0
diff --git a/maven-projects/spark/import/neo4j.sh b/dev/release/conda_env_scala.txt
old mode 100755
new mode 100644
similarity index 72%
copy from maven-projects/spark/import/neo4j.sh
copy to dev/release/conda_env_scala.txt
index dbae0273..c63df3f9
--- a/maven-projects/spark/import/neo4j.sh
+++ b/dev/release/conda_env_scala.txt
@@ -1,4 +1,3 @@
-#!/bin/bash
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
@@ -16,12 +15,5 @@
# specific language governing permissions and limitations
# under the License.
-
-set -eu
-
-cur_dir="$( cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null 2>&1 && pwd )"
-jar_file="${cur_dir}/../graphar/target/graphar-commons-0.1.0-SNAPSHOT-shaded.jar"
-conf_path="$(readlink -f $1)"
-
-spark-submit --class org.apache.graphar.importer.Neo4j ${jar_file} \
- ${conf_path}
+maven
+openjdk=11.0.13
\ No newline at end of file
diff --git a/dev/release/release.py b/dev/release/release.py
new file mode 100644
index 00000000..366ddeab
--- /dev/null
+++ b/dev/release/release.py
@@ -0,0 +1,119 @@
+#!/usr/bin/env python3
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+# Derived from Apache OpenDAL v0.45.1
+# https://github.com/apache/opendal/blob/5079125/scripts/release.py
+
+import re
+import subprocess
+from pathlib import Path
+
+ROOT_DIR = Path(__file__).parent.parent.parent
+
+def get_package_version():
+    major_version = None
+    minor_version = None
+    patch_version = None
+    major_pattern = re.compile(r'set\s*\(\s*GRAPHAR_MAJOR_VERSION\s+(\d+)\s*\)', re.IGNORECASE)
+    minor_pattern = re.compile(r'set\s*\(\s*GRAPHAR_MINOR_VERSION\s+(\d+)\s*\)', re.IGNORECASE)
+    patch_pattern = re.compile(r'set\s*\(\s*GRAPHAR_PATCH_VERSION\s+(\d+)\s*\)', re.IGNORECASE)
+
+    file_path = ROOT_DIR / "cpp/CMakeLists.txt"
+    with open(file_path, 'r') as file:
+        for line in file:
+            major_match = major_pattern.search(line)
+            minor_match = minor_pattern.search(line)
+            patch_match = patch_pattern.search(line)
+
+            if major_match:
+                major_version = major_match.group(1)
+            if minor_match:
+                minor_version = minor_match.group(1)
+            if patch_match:
+                patch_version = patch_match.group(1)
+
+    if major_version and minor_version and patch_version:
+        return f"{major_version}.{minor_version}.{patch_version}"
+    else:
+        return None
+
+def archive_source_package():
+    print("Archive source package started")
+
+    version = get_package_version()
+    assert version, "Failed to get the package version"
+    name = f"apache-graphar-{version}-incubating-src"
+
+    archive_command = [
+        "git",
+        "archive",
+        "--prefix",
+        f"apache-graphar-{version}-incubating-src/",
+        "-o",
+        f"{ROOT_DIR}/dist/{name}.tar.gz",
+        "HEAD",
+    ]
+    subprocess.run(
+        archive_command,
+        cwd=ROOT_DIR,
+        check=True,
+    )
+
+    print(f"Archive source package to dist/{name}.tar.gz")
+
+
+def generate_signature():
+    for i in Path(ROOT_DIR / "dist").glob("*.tar.gz"):
+        print(f"Generate signature for {i}")
+        subprocess.run(
+            ["gpg", "--yes", "--armor", "--output", f"{i}.asc", "--detach-sig", str(i)],
+            cwd=ROOT_DIR / "dist",
+            check=True,
+        )
+
+    for i in Path(ROOT_DIR / "dist").glob("*.tar.gz"):
+        print(f"Check signature for {i}")
+        subprocess.run(
+            ["gpg", "--verify", f"{i}.asc", str(i)], cwd=ROOT_DIR / "dist", check=True
+        )
+
+
+def generate_checksum():
+    for i in Path(ROOT_DIR / "dist").glob("*.tar.gz"):
+        print(f"Generate checksum for {i}")
+        subprocess.run(
+            ["sha512sum", str(i.relative_to(ROOT_DIR / "dist"))],
+            stdout=open(f"{i}.sha512", "w"),
+            cwd=ROOT_DIR / "dist",
+            check=True,
+        )
+
+    for i in Path(ROOT_DIR / "dist").glob("*.tar.gz"):
+        print(f"Check checksum for {i}")
+        subprocess.run(
+            ["sha512sum", "--check", f"{str(i.relative_to(ROOT_DIR / 'dist'))}.sha512"],
+            cwd=ROOT_DIR / "dist",
+            check=True,
+        )
+
+
+if __name__ == "__main__":
+    (ROOT_DIR / "dist").mkdir(exist_ok=True)
+    archive_source_package()
+    generate_signature()
+    generate_checksum()
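For reference, the `set(GRAPHAR_*_VERSION n)` parsing that `get_package_version()` performs can be exercised standalone; the CMake fragment below is hypothetical sample input, and the single combined regex is a simplification of the three separate patterns used in `release.py`:

```python
import re

# Hypothetical fragment mirroring cpp/CMakeLists.txt
sample = """
set(GRAPHAR_MAJOR_VERSION 0)
set(GRAPHAR_MINOR_VERSION 12)
set(GRAPHAR_PATCH_VERSION 0)
"""

# One combined pattern standing in for the three separate ones
pattern = re.compile(
    r"set\s*\(\s*GRAPHAR_(MAJOR|MINOR|PATCH)_VERSION\s+(\d+)\s*\)",
    re.IGNORECASE,
)

# Collect each component keyed by MAJOR/MINOR/PATCH
parts = {m.group(1): m.group(2) for m in pattern.finditer(sample)}
version = "{MAJOR}.{MINOR}.{PATCH}".format(**parts)
print(version)  # 0.12.0
```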
diff --git a/dev/release/setup-ubuntu.sh b/dev/release/setup-ubuntu.sh
new file mode 100644
index 00000000..6e74b3fc
--- /dev/null
+++ b/dev/release/setup-ubuntu.sh
@@ -0,0 +1,52 @@
+#!/bin/bash
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+# Derived from Apache Arrow 12.0.0
+# https://github.com/apache/arrow/blob/9736dde/dev/release/setup-ubuntu.sh
+
+# A script to install dependencies required for release
+# verification on Ubuntu.
+
+set -exu
+
+codename=$(. /etc/os-release && echo ${UBUNTU_CODENAME})
+id=$(. /etc/os-release && echo ${ID})
+
+apt-get install -y -q --no-install-recommends \
+ build-essential \
+ cmake \
+ git \
+ gnupg \
+ libcurl4-openssl-dev \
+ maven \
+ openjdk-11-jdk \
+ wget \
+ pkg-config \
+ tzdata \
+ subversion
+
+wget -c https://apache.jfrog.io/artifactory/arrow/${id}/apache-arrow-apt-source-latest-${codename}.deb \
+  -P /tmp/
+apt-get install -y -q /tmp/apache-arrow-apt-source-latest-${codename}.deb
+apt-get update -y -q
+apt-get install -y -q --no-install-recommends \
+ libarrow-dev \
+ libarrow-dataset-dev \
+ libarrow-acero-dev \
+ libparquet-dev
diff --git a/dev/release/verify.py b/dev/release/verify.py
new file mode 100644
index 00000000..fe9c46ce
--- /dev/null
+++ b/dev/release/verify.py
@@ -0,0 +1,174 @@
+#!/usr/bin/env python3
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+# Derived from Apache OpenDAL v0.45.1
+# https://github.com/apache/opendal/blob/5079125/scripts/verify.py
+
+import subprocess
+import os
+from pathlib import Path
+
+BASE_DIR = Path(os.getcwd())
+
+# Define colors for output
+YELLOW = "\033[33;1m"
+GREEN = "\033[32;1m"
+ENDCOLOR = "\033[0m"
+
+
+def check_signature(pkg):
+    """Check the GPG signature of the package."""
+    try:
+        subprocess.check_call(["gpg", "--verify", f"{pkg}.asc", pkg])
+        print(f"{GREEN}> Successfully verified the GPG signature for {pkg}{ENDCOLOR}")
+    except subprocess.CalledProcessError:
+        print(f"{YELLOW}> Failed to verify the GPG signature for {pkg}{ENDCOLOR}")
+
+
+def check_sha512sum(pkg):
+    """Check the sha512 checksum of the package."""
+    try:
+        subprocess.check_call(["sha512sum", "--check", f"{pkg}.sha512"])
+        print(f"{GREEN}> Successfully verified the checksum for {pkg}{ENDCOLOR}")
+    except subprocess.CalledProcessError:
+        print(f"{YELLOW}> Failed to verify the checksum for {pkg}{ENDCOLOR}")
+
+
+def extract_packages():
+    for file in BASE_DIR.glob("*.tar.gz"):
+        subprocess.run(["tar", "-xzf", file], check=True)
+
+
+def check_license(dir):
+    print(f"> Start checking LICENSE file in {dir}")
+    if not (dir / "LICENSE").exists():
+        raise FileNotFoundError(f"{YELLOW}> LICENSE file is not found{ENDCOLOR}")
+    print(f"{GREEN}> LICENSE file exists in {dir}{ENDCOLOR}")
+
+
+def check_notice(dir):
+    print(f"> Start checking NOTICE file in {dir}")
+    if not (dir / "NOTICE").exists():
+        raise FileNotFoundError(f"{YELLOW}> NOTICE file is not found{ENDCOLOR}")
+    print(f"{GREEN}> NOTICE file exists in {dir}{ENDCOLOR}")
+
+
+def install_conda():
+    print("Start installing conda")
+    subprocess.run(["wget", "https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh"], check=True)
+    subprocess.run(["bash", "Miniconda3-latest-Linux-x86_64.sh", "-b"], check=True)
+    print(f"{GREEN}Successfully installed conda{ENDCOLOR}")
+
+
+def maybe_setup_conda(dependencies):
+    # Optionally set up a conda environment with the given dependencies
+    if ("USE_CONDA" in os.environ) and (int(os.environ["USE_CONDA"]) > 0):
+        print("Configuring conda environment...")
+        subprocess.run(["conda", "deactivate"], check=False, stderr=subprocess.STDOUT)
+        create_env_command = ["conda", "create", "--name", "graphar", "--yes", "python=3.8"]
+        subprocess.run(create_env_command, check=True, stderr=subprocess.STDOUT)
+        install_deps_command = ["conda", "install", "--name", "graphar", "--yes"] + dependencies
+        subprocess.run(install_deps_command, check=True, stderr=subprocess.STDOUT)
+        subprocess.run("conda activate graphar", check=True, stderr=subprocess.STDOUT, shell=True)
+
+
+def build_and_test_cpp(dir):
+    print("Start building, installing and testing the C++ library")
+
+    maybe_setup_conda(["--file", f"{dir}/dev/release/conda_env_cpp.txt"])
+
+    cmake_command = ["cmake", ".", "-DBUILD_TESTS=ON"]
+    subprocess.run(
+        cmake_command,
+        cwd=dir / "cpp",
+        check=True,
+        stderr=subprocess.STDOUT,
+    )
+    build_and_install_command = [
+        "cmake",
+        "--build",
+        ".",
+        "--target",
+        "install",
+    ]
+    subprocess.run(
+        build_and_install_command,
+        cwd=dir / "cpp",
+        check=True,
+        stderr=subprocess.STDOUT,
+    )
+    test_command = [
+        "ctest",
+        "--output-on-failure",
+        "--timeout",
+        "300",
+        "-VV"
+    ]
+    subprocess.run(
+        test_command,
+        cwd=dir / "cpp",
+        check=True,
+        stderr=subprocess.STDOUT,
+    )
+    print(f"{GREEN}Successfully built the GraphAr C++ library{ENDCOLOR}")
+
+
+def build_and_test_scala(dir):
+    print("Start building, installing and testing the Scala with Spark library")
+
+    maybe_setup_conda(["--file", f"{dir}/dev/release/conda_env_scala.txt"])
+
+    build_command_32 = ["mvn", "clean", "package", "-P", "datasource32"]
+    subprocess.run(
+        build_command_32,
+        cwd=dir / "maven-projects/spark",
+        check=True,
+        stderr=subprocess.STDOUT,
+    )
+    build_command_33 = ["mvn", "clean", "package", "-P", "datasource33"]
+    subprocess.run(
+        build_command_33,
+        cwd=dir / "maven-projects/spark",
+        check=True,
+        stderr=subprocess.STDOUT,
+    )
+
+    print(f"{GREEN}Successfully built the GraphAr Scala library{ENDCOLOR}")
+
+if __name__ == "__main__":
+    # Get a list of all files in the current directory
+    files = [f for f in os.listdir(".") if os.path.isfile(f)]
+
+    for pkg in files:
+        # Skip files that don't have a corresponding .asc or .sha512 file
+        if not os.path.exists(f"{pkg}.asc") or not os.path.exists(f"{pkg}.sha512"):
+            continue
+
+        print(f"> Checking {pkg}")
+
+        # Perform the checks
+        check_signature(pkg)
+        check_sha512sum(pkg)
+
+    extract_packages()
+
+    for dir in BASE_DIR.glob("apache-graphar-*-src/"):
+        check_license(dir)
+        check_notice(dir)
+        build_and_test_cpp(dir)
+        build_and_test_scala(dir)
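The checksum round trip in `verify.py` shells out to `sha512sum`; the same flow can be sketched in pure Python with `hashlib` (the file names and the `sha512_of` helper are hypothetical, for illustration only):

```python
import hashlib
import tempfile
from pathlib import Path

def sha512_of(path):
    # The digest sha512sum would compute for the file
    return hashlib.sha512(path.read_bytes()).hexdigest()

with tempfile.TemporaryDirectory() as d:
    pkg = Path(d) / "apache-graphar-0.12.0-incubating-src.tar.gz"
    pkg.write_bytes(b"stand-in for a real source archive")
    # Record the checksum in sha512sum's "<digest>  <name>" format
    sha_file = Path(f"{pkg}.sha512")
    sha_file.write_text(f"{sha512_of(pkg)}  {pkg.name}\n")
    # Verification recomputes the digest and compares, like `sha512sum --check`
    recorded = sha_file.read_text().split()[0]
    assert recorded == sha512_of(pkg)
    print("checksum round-trip ok")
```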
diff --git a/format/adjacent_list.proto b/format/adjacent_list.proto
index 705de694..21312530 100644
--- a/format/adjacent_list.proto
+++ b/format/adjacent_list.proto
@@ -20,7 +20,8 @@
syntax = "proto3";
package graphar;
-option java_package = "org.apache.graphar.info";
+option java_multiple_files = true;
+option java_package = "org.apache.graphar.info.proto";
import "types.proto";
diff --git a/format/edge_info.proto b/format/edge_info.proto
index 8385f4ae..72f5757c 100644
--- a/format/edge_info.proto
+++ b/format/edge_info.proto
@@ -20,7 +20,8 @@
syntax = "proto3";
package graphar;
-option java_package = "org.apache.graphar.info";
+option java_multiple_files = true;
+option java_package = "org.apache.graphar.info.proto";
import "property_group.proto";
import "adjacent_list.proto";
diff --git a/format/graph_info.proto b/format/graph_info.proto
index 6490c570..e5c6e2ec 100644
--- a/format/graph_info.proto
+++ b/format/graph_info.proto
@@ -20,7 +20,8 @@
syntax = "proto3";
package graphar;
-option java_package = "org.apache.graphar.info";
+option java_multiple_files = true;
+option java_package = "org.apache.graphar.info.proto";
import "vertex_info.proto";
import "edge_info.proto";
diff --git a/format/property_group.proto b/format/property_group.proto
index 95b8c522..5cdbb42b 100644
--- a/format/property_group.proto
+++ b/format/property_group.proto
@@ -20,7 +20,8 @@
syntax = "proto3";
package graphar;
-option java_package = "org.apache.graphar.info";
+option java_multiple_files = true;
+option java_package = "org.apache.graphar.info.proto";
import "types.proto";
diff --git a/format/types.proto b/format/types.proto
index fc152b41..234b9e86 100644
--- a/format/types.proto
+++ b/format/types.proto
@@ -20,7 +20,8 @@
syntax = "proto3";
package graphar;
-option java_package = "org.apache.graphar.info";
+option java_multiple_files = true;
+option java_package = "org.apache.graphar.info.proto";
enum DataType {
BOOL = 0;
diff --git a/format/vertex_info.proto b/format/vertex_info.proto
index da158674..136b89c9 100644
--- a/format/vertex_info.proto
+++ b/format/vertex_info.proto
@@ -1,4 +1,4 @@
-/*
+ /*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
@@ -20,7 +20,8 @@
syntax = "proto3";
package graphar;
-option java_package = "org.apache.graphar.info";
+option java_multiple_files = true;
+option java_package = "org.apache.graphar.info.proto";
import "property_group.proto";
diff --git a/licenserc.toml b/licenserc.toml
index b6e0919a..ed4a4c14 100644
--- a/licenserc.toml
+++ b/licenserc.toml
@@ -45,7 +45,9 @@ excludes = [
"cpp/thirdparty",
"cpp/misc/cpplint.py",
"spark/datasources-32/src/main/scala/org/apache/graphar/datasources",
+ "spark/datasources-32/src/main/scala/org/apache/spark/sql/graphar",
"spark/datasources-33/src/main/scala/org/apache/graphar/datasources",
+ "spark/datasources-33/src/main/scala/org/apache/spark/sql/graphar",
"java/src/main/java/org/apache/graphar/stdcxx/StdString.java",
"java/src/main/java/org/apache/graphar/stdcxx/StdVector.java",
"java/src/main/java/org/apache/graphar/stdcxx/StdSharedPtr.java",
diff --git a/maven-projects/info/pom.xml b/maven-projects/info/pom.xml
index 79d4119e..ea59280d 100644
--- a/maven-projects/info/pom.xml
+++ b/maven-projects/info/pom.xml
@@ -34,6 +34,7 @@
<artifactId>info</artifactId>
<packaging>jar</packaging>
+ <version>0.13.0.dev-SNAPSHOT</version>
<name>info</name>
diff --git a/maven-projects/java/README.md b/maven-projects/java/README.md
index 12572e13..3a3f15d3 100644
--- a/maven-projects/java/README.md
+++ b/maven-projects/java/README.md
@@ -1,4 +1,4 @@
-# GraphAr Java
+# GraphAr Java (under development)
This directory contains the code and build system for the GraphAr Java library
which is powered by [Alibaba-FastFFI](https://github.com/alibaba/fastFFI).
diff --git a/maven-projects/java/pom.xml b/maven-projects/java/pom.xml
index a5a1fdf4..e0c3b4d3 100644
--- a/maven-projects/java/pom.xml
+++ b/maven-projects/java/pom.xml
@@ -34,6 +34,7 @@
<artifactId>java</artifactId>
<packaging>jar</packaging>
+ <version>0.13.0.dev-SNAPSHOT</version>
<name>java</name>
diff --git a/maven-projects/pom.xml b/maven-projects/pom.xml
index beb592dc..79d4b661 100644
--- a/maven-projects/pom.xml
+++ b/maven-projects/pom.xml
@@ -69,7 +69,7 @@
<url>https://github.com/apache/graphar</url>
</scm>-->
<properties>
- <graphar.version>0.1.0-SNAPSHOT</graphar.version>
+ <graphar.version>0.12.0-SNAPSHOT</graphar.version>
</properties>
<modules>
<module>java</module>
diff --git a/maven-projects/spark/README.md b/maven-projects/spark/README.md
index a0967ca0..cb7921bf 100644
--- a/maven-projects/spark/README.md
+++ b/maven-projects/spark/README.md
@@ -21,7 +21,6 @@ repository and navigated to the ``spark`` subdirectory:
```bash
$ git clone https://github.com/apache/incubator-graphar.git
$ cd incubator-graphar
- $ git submodule update --init
$ cd maven-projects/spark
```
diff --git a/maven-projects/spark/datasources-32/src/main/scala/org/apache/graphar/datasources/GarDataSource.scala b/maven-projects/spark/datasources-32/src/main/scala/org/apache/graphar/datasources/GarDataSource.scala
index 38a3c183..7424ad68 100644
--- a/maven-projects/spark/datasources-32/src/main/scala/org/apache/graphar/datasources/GarDataSource.scala
+++ b/maven-projects/spark/datasources-32/src/main/scala/org/apache/graphar/datasources/GarDataSource.scala
@@ -16,24 +16,24 @@
package org.apache.graphar.datasources
-import scala.collection.JavaConverters._
-import scala.util.matching.Regex
-import java.util
-
import com.fasterxml.jackson.databind.ObjectMapper
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path
-
+import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.connector.catalog.{Table, TableProvider}
+import org.apache.spark.sql.connector.expressions.Transform
import org.apache.spark.sql.execution.datasources._
-import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.execution.datasources.csv.CSVFileFormat
import org.apache.spark.sql.execution.datasources.orc.OrcFileFormat
import org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat
+import org.apache.spark.sql.graphar.GarTable
+import org.apache.spark.sql.sources.DataSourceRegister
import org.apache.spark.sql.types.StructType
import org.apache.spark.sql.util.CaseInsensitiveStringMap
-import org.apache.spark.sql.sources.DataSourceRegister
-import org.apache.spark.sql.connector.expressions.Transform
+
+import java.util
+import scala.collection.JavaConverters._
+import scala.util.matching.Regex
// Derived from Apache Spark 3.1.1
// https://github.com/apache/spark/blob/1d550c4/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/FileDataSourceV2.scala
diff --git a/maven-projects/spark/datasources-32/src/main/scala/org/apache/graphar/datasources/GarCommitProtocol.scala b/maven-projects/spark/datasources-32/src/main/scala/org/apache/spark/sql/graphar/GarCommitProtocol.scala
similarity index 98%
rename from maven-projects/spark/datasources-32/src/main/scala/org/apache/graphar/datasources/GarCommitProtocol.scala
rename to maven-projects/spark/datasources-32/src/main/scala/org/apache/spark/sql/graphar/GarCommitProtocol.scala
index 07cff02e..ef64ec39 100644
--- a/maven-projects/spark/datasources-32/src/main/scala/org/apache/graphar/datasources/GarCommitProtocol.scala
+++ b/maven-projects/spark/datasources-32/src/main/scala/org/apache/spark/sql/graphar/GarCommitProtocol.scala
@@ -17,16 +17,14 @@
// Derived from Apache Spark 3.1.1
// https://github.com/apache/spark/blob/1d550c4/core/src/main/scala/org/apache/spark/internal/io/HadoopMapReduceCommitProtocol.scala
-package org.apache.graphar.datasources
+package org.apache.spark.sql.graphar
import org.apache.graphar.GeneralParams
-
-import org.json4s._
-import org.json4s.jackson.JsonMethods._
-
-import org.apache.spark.sql.execution.datasources.SQLHadoopMapReduceCommitProtocol
import org.apache.hadoop.mapreduce._
import org.apache.spark.internal.Logging
+import org.apache.spark.sql.execution.datasources.SQLHadoopMapReduceCommitProtocol
+import org.json4s._
+import org.json4s.jackson.JsonMethods._
object GarCommitProtocol {
private def binarySearchPair(aggNums: Array[Int], key: Int): (Int, Int) = {
diff --git a/maven-projects/spark/datasources-32/src/main/scala/org/apache/graphar/datasources/GarScan.scala b/maven-projects/spark/datasources-32/src/main/scala/org/apache/spark/sql/graphar/GarScan.scala
similarity index 98%
rename from maven-projects/spark/datasources-32/src/main/scala/org/apache/graphar/datasources/GarScan.scala
rename to maven-projects/spark/datasources-32/src/main/scala/org/apache/spark/sql/graphar/GarScan.scala
index 4b063db7..b6027f8a 100644
--- a/maven-projects/spark/datasources-32/src/main/scala/org/apache/graphar/datasources/GarScan.scala
+++ b/maven-projects/spark/datasources-32/src/main/scala/org/apache/spark/sql/graphar/GarScan.scala
@@ -17,40 +17,39 @@
// Derived from Apache Spark 3.1.1
// https://github.com/apache/spark/blob/1d550c4/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/FileScan.scala
-package org.apache.graphar.datasources
-
-import scala.collection.JavaConverters._
-import scala.collection.mutable.ArrayBuffer
+package org.apache.spark.sql.graphar
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path
import org.apache.parquet.hadoop.ParquetInputFormat
-
import org.apache.spark.sql.SparkSession
-import org.apache.spark.sql.catalyst.expressions.{Expression, ExprUtils}
import org.apache.spark.sql.catalyst.csv.CSVOptions
+import org.apache.spark.sql.catalyst.expressions.{ExprUtils, Expression}
import org.apache.spark.sql.connector.read.PartitionReaderFactory
import org.apache.spark.sql.execution.PartitionedFileUtil
-import org.apache.spark.sql.execution.datasources.{
- FilePartition,
- PartitioningAwareFileIndex,
- PartitionedFile
-}
import org.apache.spark.sql.execution.datasources.parquet.{
ParquetOptions,
ParquetReadSupport,
ParquetWriteSupport
}
import org.apache.spark.sql.execution.datasources.v2.FileScan
-import org.apache.spark.sql.execution.datasources.v2.parquet.ParquetPartitionReaderFactory
-import org.apache.spark.sql.execution.datasources.v2.orc.OrcPartitionReaderFactory
import org.apache.spark.sql.execution.datasources.v2.csv.CSVPartitionReaderFactory
+import org.apache.spark.sql.execution.datasources.v2.orc.OrcPartitionReaderFactory
+import org.apache.spark.sql.execution.datasources.v2.parquet.ParquetPartitionReaderFactory
+import org.apache.spark.sql.execution.datasources.{
+ FilePartition,
+ PartitionedFile,
+ PartitioningAwareFileIndex
+}
import org.apache.spark.sql.internal.SQLConf
import org.apache.spark.sql.sources.Filter
import org.apache.spark.sql.types.StructType
import org.apache.spark.sql.util.CaseInsensitiveStringMap
import org.apache.spark.util.SerializableConfiguration
+import scala.collection.JavaConverters._
+import scala.collection.mutable.ArrayBuffer
+
/** GarScan is a class to implement the file scan for GarDataSource. */
case class GarScan(
sparkSession: SparkSession,
diff --git a/maven-projects/spark/datasources-32/src/main/scala/org/apache/graphar/datasources/GarScanBuilder.scala b/maven-projects/spark/datasources-32/src/main/scala/org/apache/spark/sql/graphar/GarScanBuilder.scala
similarity index 99%
rename from maven-projects/spark/datasources-32/src/main/scala/org/apache/graphar/datasources/GarScanBuilder.scala
rename to maven-projects/spark/datasources-32/src/main/scala/org/apache/spark/sql/graphar/GarScanBuilder.scala
index 1e83c773..0ae95894 100644
--- a/maven-projects/spark/datasources-32/src/main/scala/org/apache/graphar/datasources/GarScanBuilder.scala
+++ b/maven-projects/spark/datasources-32/src/main/scala/org/apache/spark/sql/graphar/GarScanBuilder.scala
@@ -17,20 +17,19 @@
// Derived from Apache Spark 3.1.1
// https://github.com/apache/spark/blob/1d550c4/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/FileScanBuilder.scala
-package org.apache.graphar.datasources
+package org.apache.spark.sql.graphar
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.connector.read.{Scan, SupportsPushDownFilters}
import org.apache.spark.sql.execution.datasources.PartitioningAwareFileIndex
-
import org.apache.spark.sql.execution.datasources.v2.FileScanBuilder
+import org.apache.spark.sql.execution.datasources.v2.orc.OrcScanBuilder
+import org.apache.spark.sql.execution.datasources.v2.parquet.ParquetScanBuilder
import org.apache.spark.sql.sources.Filter
import org.apache.spark.sql.types.StructType
import org.apache.spark.sql.util.CaseInsensitiveStringMap
import scala.collection.JavaConverters._
-import org.apache.spark.sql.execution.datasources.v2.orc.OrcScanBuilder
-import org.apache.spark.sql.execution.datasources.v2.parquet.ParquetScanBuilder
/** GarScanBuilder is a class to build the file scan for GarDataSource. */
case class GarScanBuilder(
@@ -49,6 +48,7 @@ case class GarScanBuilder(
}
private var filters: Array[Filter] = Array.empty
+
override def pushFilters(filters: Array[Filter]): Array[Filter] = {
this.filters = filters
filters
diff --git a/maven-projects/spark/datasources-32/src/main/scala/org/apache/graphar/datasources/GarTable.scala b/maven-projects/spark/datasources-32/src/main/scala/org/apache/spark/sql/graphar/GarTable.scala
similarity index 95%
rename from maven-projects/spark/datasources-32/src/main/scala/org/apache/graphar/datasources/GarTable.scala
rename to maven-projects/spark/datasources-32/src/main/scala/org/apache/spark/sql/graphar/GarTable.scala
index 8aa23179..acf4943c 100644
--- a/maven-projects/spark/datasources-32/src/main/scala/org/apache/graphar/datasources/GarTable.scala
+++ b/maven-projects/spark/datasources-32/src/main/scala/org/apache/spark/sql/graphar/GarTable.scala
@@ -17,26 +17,24 @@
// Derived from Apache Spark 3.1.1
// https://github.com/apache/spark/blob/1d550c4/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/FileTable.scala
-package org.apache.graphar.datasources
-
-import scala.collection.JavaConverters._
+package org.apache.spark.sql.graphar
import org.apache.hadoop.fs.FileStatus
-
import org.apache.spark.sql.SparkSession
-import org.apache.spark.sql.connector.write.{LogicalWriteInfo, WriteBuilder}
import org.apache.spark.sql.catalyst.csv.CSVOptions
+import org.apache.spark.sql.connector.write.{LogicalWriteInfo, WriteBuilder}
import org.apache.spark.sql.execution.datasources.FileFormat
import org.apache.spark.sql.execution.datasources.csv.CSVDataSource
import org.apache.spark.sql.execution.datasources.orc.OrcUtils
import org.apache.spark.sql.execution.datasources.parquet.ParquetUtils
import org.apache.spark.sql.execution.datasources.v2.FileTable
+import org.apache.spark.sql.graphar.csv.CSVWriteBuilder
+import org.apache.spark.sql.graphar.orc.OrcWriteBuilder
+import org.apache.spark.sql.graphar.parquet.ParquetWriteBuilder
import org.apache.spark.sql.types._
import org.apache.spark.sql.util.CaseInsensitiveStringMap
-import org.apache.graphar.datasources.csv.CSVWriteBuilder
-import org.apache.graphar.datasources.parquet.ParquetWriteBuilder
-import org.apache.graphar.datasources.orc.OrcWriteBuilder
+import scala.collection.JavaConverters._
/** GarTable is a class to represent the graph data in GraphAr as a table. */
case class GarTable(
diff --git a/maven-projects/spark/datasources-32/src/main/scala/org/apache/graphar/datasources/GarWriterBuilder.scala b/maven-projects/spark/datasources-32/src/main/scala/org/apache/spark/sql/graphar/GarWriteBuilder.scala
similarity index 97%
rename from maven-projects/spark/datasources-32/src/main/scala/org/apache/graphar/datasources/GarWriterBuilder.scala
rename to maven-projects/spark/datasources-32/src/main/scala/org/apache/spark/sql/graphar/GarWriteBuilder.scala
index 3acd9247..f6caa75d 100644
--- a/maven-projects/spark/datasources-32/src/main/scala/org/apache/graphar/datasources/GarWriterBuilder.scala
+++ b/maven-projects/spark/datasources-32/src/main/scala/org/apache/spark/sql/graphar/GarWriteBuilder.scala
@@ -17,27 +17,22 @@
// Derived from Apache Spark 3.1.1
// https://github.com/apache/spark/blob/1d550c4/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/FileWriteBuilder.scala
-package org.apache.graphar.datasources
-
-import java.util.UUID
-
-import scala.collection.JavaConverters._
+package org.apache.spark.sql.graphar
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path
import org.apache.hadoop.mapreduce.Job
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat
-import org.apache.hadoop.mapreduce.Job
-
-import org.apache.spark.sql.execution.datasources.OutputWriterFactory
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.expressions.AttributeReference
import org.apache.spark.sql.catalyst.util.{CaseInsensitiveMap, DateTimeUtils}
import org.apache.spark.sql.connector.write.{
BatchWrite,
LogicalWriteInfo,
WriteBuilder
}
+import org.apache.spark.sql.execution.datasources.v2.FileBatchWrite
import org.apache.spark.sql.execution.datasources.{
BasicWriteJobStatsTracker,
DataSource,
@@ -48,8 +43,9 @@ import org.apache.spark.sql.execution.metric.SQLMetric
import org.apache.spark.sql.internal.SQLConf
import org.apache.spark.sql.types.{DataType, StructType}
import org.apache.spark.util.SerializableConfiguration
-import org.apache.spark.sql.execution.datasources.v2.FileBatchWrite
-import org.apache.spark.sql.catalyst.expressions.AttributeReference
+
+import java.util.UUID
+import scala.collection.JavaConverters._
abstract class GarWriteBuilder(
paths: Seq[String],
diff --git a/maven-projects/spark/datasources-32/src/main/scala/org/apache/graphar/datasources/csv/CSVWriterBuilder.scala b/maven-projects/spark/datasources-32/src/main/scala/org/apache/spark/sql/graphar/csv/CSVWriteBuilder.scala
similarity index 96%
rename from maven-projects/spark/datasources-32/src/main/scala/org/apache/graphar/datasources/csv/CSVWriterBuilder.scala
rename to maven-projects/spark/datasources-32/src/main/scala/org/apache/spark/sql/graphar/csv/CSVWriteBuilder.scala
index c0a38d52..7dd4dda8 100644
--- a/maven-projects/spark/datasources-32/src/main/scala/org/apache/graphar/datasources/csv/CSVWriterBuilder.scala
+++ b/maven-projects/spark/datasources-32/src/main/scala/org/apache/spark/sql/graphar/csv/CSVWriteBuilder.scala
@@ -17,23 +17,22 @@
// Derived from Apache Spark 3.1.1
// https://github.com/apache/spark/blob/1d550c4/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/csv/CSVWriteBuilder.scala
-package org.apache.graphar.datasources.csv
+package org.apache.spark.sql.graphar.csv
import org.apache.hadoop.mapreduce.{Job, TaskAttemptContext}
import org.apache.spark.sql.catalyst.csv.CSVOptions
import org.apache.spark.sql.catalyst.util.CompressionCodecs
import org.apache.spark.sql.connector.write.LogicalWriteInfo
+import org.apache.spark.sql.execution.datasources.csv.CsvOutputWriter
import org.apache.spark.sql.execution.datasources.{
CodecStreams,
OutputWriter,
OutputWriterFactory
}
-import org.apache.spark.sql.execution.datasources.csv.CsvOutputWriter
+import org.apache.spark.sql.graphar.GarWriteBuilder
import org.apache.spark.sql.internal.SQLConf
import org.apache.spark.sql.types.{DataType, StructType}
-import org.apache.graphar.datasources.GarWriteBuilder
-
class CSVWriteBuilder(
paths: Seq[String],
formatName: String,
diff --git a/maven-projects/spark/datasources-32/src/main/scala/org/apache/graphar/datasources/orc/OrcOutputWriter.scala b/maven-projects/spark/datasources-32/src/main/scala/org/apache/spark/sql/graphar/orc/OrcOutputWriter.scala
similarity index 96%
rename from maven-projects/spark/datasources-32/src/main/scala/org/apache/graphar/datasources/orc/OrcOutputWriter.scala
rename to maven-projects/spark/datasources-32/src/main/scala/org/apache/spark/sql/graphar/orc/OrcOutputWriter.scala
index c1d2ff82..e86e6629 100644
--- a/maven-projects/spark/datasources-32/src/main/scala/org/apache/graphar/datasources/orc/OrcOutputWriter.scala
+++ b/maven-projects/spark/datasources-32/src/main/scala/org/apache/spark/sql/graphar/orc/OrcOutputWriter.scala
@@ -18,18 +18,17 @@
// we have to reimplement it here.
// https://github.com/apache/spark/blob/1d550c4/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcOutputWriter.scala
-package org.apache.graphar.datasources.orc
+package org.apache.spark.sql.graphar.orc
import org.apache.hadoop.fs.Path
import org.apache.hadoop.io.NullWritable
import org.apache.hadoop.mapreduce.TaskAttemptContext
import org.apache.orc.OrcFile
import org.apache.orc.mapred.{
- OrcOutputFormat => OrcMapRedOutputFormat,
- OrcStruct
+ OrcStruct,
+ OrcOutputFormat => OrcMapRedOutputFormat
}
import org.apache.orc.mapreduce.{OrcMapreduceRecordWriter, OrcOutputFormat}
-
import org.apache.spark.sql.catalyst.InternalRow
import org.apache.spark.sql.execution.datasources.OutputWriter
import org.apache.spark.sql.execution.datasources.orc.{OrcSerializer, OrcUtils}
diff --git a/maven-projects/spark/datasources-32/src/main/scala/org/apache/graphar/datasources/orc/OrcWriteBuilder.scala b/maven-projects/spark/datasources-32/src/main/scala/org/apache/spark/sql/graphar/orc/OrcWriteBuilder.scala
similarity index 97%
rename from maven-projects/spark/datasources-32/src/main/scala/org/apache/graphar/datasources/orc/OrcWriteBuilder.scala
rename to maven-projects/spark/datasources-32/src/main/scala/org/apache/spark/sql/graphar/orc/OrcWriteBuilder.scala
index 9bdf796b..05147c14 100644
--- a/maven-projects/spark/datasources-32/src/main/scala/org/apache/graphar/datasources/orc/OrcWriteBuilder.scala
+++ b/maven-projects/spark/datasources-32/src/main/scala/org/apache/spark/sql/graphar/orc/OrcWriteBuilder.scala
@@ -17,24 +17,22 @@
// Derived from Apache Spark 3.1.1
// https://github.com/apache/spark/blob/1d550c4/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/orc/ORCWriteBuilder.scala
-package org.apache.graphar.datasources.orc
+package org.apache.spark.sql.graphar.orc
import org.apache.hadoop.mapred.JobConf
import org.apache.hadoop.mapreduce.{Job, TaskAttemptContext}
import org.apache.orc.OrcConf.{COMPRESS, MAPRED_OUTPUT_SCHEMA}
import org.apache.orc.mapred.OrcStruct
-
import org.apache.spark.sql.connector.write.LogicalWriteInfo
+import org.apache.spark.sql.execution.datasources.orc.{OrcOptions, OrcUtils}
import org.apache.spark.sql.execution.datasources.{
OutputWriter,
OutputWriterFactory
}
-import org.apache.spark.sql.execution.datasources.orc.{OrcOptions, OrcUtils}
+import org.apache.spark.sql.graphar.GarWriteBuilder
import org.apache.spark.sql.internal.SQLConf
import org.apache.spark.sql.types._
-import org.apache.graphar.datasources.GarWriteBuilder
-
object OrcWriteBuilder {
// the getQuotedSchemaString method of spark OrcFileFormat
private def getQuotedSchemaString(dataType: DataType): String =
diff --git a/maven-projects/spark/datasources-32/src/main/scala/org/apache/graphar/datasources/parquet/ParquetWriterBuilder.scala b/maven-projects/spark/datasources-32/src/main/scala/org/apache/spark/sql/graphar/parquet/ParquetWriteBuilder.scala
similarity index 96%
rename from maven-projects/spark/datasources-32/src/main/scala/org/apache/graphar/datasources/parquet/ParquetWriterBuilder.scala
rename to maven-projects/spark/datasources-32/src/main/scala/org/apache/spark/sql/graphar/parquet/ParquetWriteBuilder.scala
index 8d7feceb..d75f725e 100644
--- a/maven-projects/spark/datasources-32/src/main/scala/org/apache/graphar/datasources/parquet/ParquetWriterBuilder.scala
+++ b/maven-projects/spark/datasources-32/src/main/scala/org/apache/spark/sql/graphar/parquet/ParquetWriteBuilder.scala
@@ -17,28 +17,25 @@
// Derived from Apache Spark 3.1.1
// https://github.com/apache/spark/blob/1d550c4/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/parquet/ParquetWriteBuilder.scala
-package org.apache.graphar.datasources.parquet
+package org.apache.spark.sql.graphar.parquet
import org.apache.hadoop.mapreduce.{Job, OutputCommitter, TaskAttemptContext}
-import org.apache.parquet.hadoop.{ParquetOutputCommitter, ParquetOutputFormat}
import org.apache.parquet.hadoop.ParquetOutputFormat.JobSummaryLevel
import org.apache.parquet.hadoop.codec.CodecConfig
import org.apache.parquet.hadoop.util.ContextUtil
-
-import org.apache.spark.sql.execution.datasources.parquet.ParquetOutputWriter
+import org.apache.parquet.hadoop.{ParquetOutputCommitter, ParquetOutputFormat}
import org.apache.spark.internal.Logging
import org.apache.spark.sql.Row
import org.apache.spark.sql.connector.write.LogicalWriteInfo
+import org.apache.spark.sql.execution.datasources.parquet._
import org.apache.spark.sql.execution.datasources.{
OutputWriter,
OutputWriterFactory
}
-import org.apache.spark.sql.execution.datasources.parquet._
+import org.apache.spark.sql.graphar.GarWriteBuilder
import org.apache.spark.sql.internal.SQLConf
import org.apache.spark.sql.types._
-import org.apache.graphar.datasources.GarWriteBuilder
-
class ParquetWriteBuilder(
paths: Seq[String],
formatName: String,
diff --git a/maven-projects/spark/datasources-33/src/main/scala/org/apache/graphar/datasources/GarDataSource.scala b/maven-projects/spark/datasources-33/src/main/scala/org/apache/graphar/datasources/GarDataSource.scala
index 38a3c183..b6094914 100644
--- a/maven-projects/spark/datasources-33/src/main/scala/org/apache/graphar/datasources/GarDataSource.scala
+++ b/maven-projects/spark/datasources-33/src/main/scala/org/apache/graphar/datasources/GarDataSource.scala
@@ -19,11 +19,9 @@ package org.apache.graphar.datasources
import scala.collection.JavaConverters._
import scala.util.matching.Regex
import java.util
-
import com.fasterxml.jackson.databind.ObjectMapper
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path
-
import org.apache.spark.sql.connector.catalog.{Table, TableProvider}
import org.apache.spark.sql.execution.datasources._
import org.apache.spark.sql.SparkSession
@@ -34,6 +32,7 @@ import org.apache.spark.sql.types.StructType
import org.apache.spark.sql.util.CaseInsensitiveStringMap
import org.apache.spark.sql.sources.DataSourceRegister
import org.apache.spark.sql.connector.expressions.Transform
+import org.apache.spark.sql.graphar.GarTable
// Derived from Apache Spark 3.1.1
// https://github.com/apache/spark/blob/1d550c4/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/FileDataSourceV2.scala
diff --git a/maven-projects/spark/datasources-33/src/main/scala/org/apache/graphar/datasources/GarCommitProtocol.scala b/maven-projects/spark/datasources-33/src/main/scala/org/apache/spark/sql/graphar/GarCommitProtocol.scala
similarity index 93%
rename from maven-projects/spark/datasources-33/src/main/scala/org/apache/graphar/datasources/GarCommitProtocol.scala
rename to maven-projects/spark/datasources-33/src/main/scala/org/apache/spark/sql/graphar/GarCommitProtocol.scala
index 8be2e237..c6ca79c2 100644
--- a/maven-projects/spark/datasources-33/src/main/scala/org/apache/graphar/datasources/GarCommitProtocol.scala
+++ b/maven-projects/spark/datasources-33/src/main/scala/org/apache/spark/sql/graphar/GarCommitProtocol.scala
@@ -17,7 +17,7 @@
// Derived from Apache Spark 3.3.4
// https://github.com/apache/spark/blob/18db204/core/src/main/scala/org/apache/spark/internal/io/HadoopMapReduceCommitProtocol.scala
-package org.apache.graphar.datasources
+package org.apache.spark.sql.graphar
import org.apache.graphar.GeneralParams
@@ -73,16 +73,14 @@ class GarCommitProtocol(
val partitionId = taskContext.getTaskAttemptID.getTaskID.getId
if (options.contains(GeneralParams.offsetStartChunkIndexKey)) {
// offset chunk file name, looks like chunk0
- val chunk_index = options
- .get(GeneralParams.offsetStartChunkIndexKey)
- .get
- .toInt + partitionId
+ val chunk_index =
+ options(GeneralParams.offsetStartChunkIndexKey).toInt + partitionId
return f"chunk$chunk_index"
}
if (options.contains(GeneralParams.aggNumListOfEdgeChunkKey)) {
// edge chunk file name, looks like part0/chunk0
val jValue = parse(
- options.get(GeneralParams.aggNumListOfEdgeChunkKey).get
+ options(GeneralParams.aggNumListOfEdgeChunkKey)
)
implicit val formats =
DefaultFormats // initialize a default formats for json4s
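The GarCommitProtocol hunk above replaces `options.get(key).get` with `options(key)`. As a minimal sketch of why this is behavior-preserving for present keys (the map and key name here are hypothetical, not GraphAr code): `Map#apply` throws `NoSuchElementException` on a missing key, just as `Option#get` does on `None`, so the shorter form is equivalent where the key exists.

```scala
// Hypothetical options map standing in for the commit protocol's options.
val options = Map("offset.start.chunk.index" -> "3")
val partitionId = 2

// Before: fetch an Option and unwrap it explicitly.
val before = options.get("offset.start.chunk.index").get.toInt + partitionId

// After: Map#apply does the lookup and unwrap in one step.
val after = options("offset.start.chunk.index").toInt + partitionId

assert(before == after)
```

Both variants fail at runtime when the key is absent; the refactor only removes the redundant `Option` round-trip.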
diff --git a/maven-projects/spark/datasources-33/src/main/scala/org/apache/graphar/datasources/GarScan.scala b/maven-projects/spark/datasources-33/src/main/scala/org/apache/spark/sql/graphar/GarScan.scala
similarity index 98%
rename from maven-projects/spark/datasources-33/src/main/scala/org/apache/graphar/datasources/GarScan.scala
rename to maven-projects/spark/datasources-33/src/main/scala/org/apache/spark/sql/graphar/GarScan.scala
index bf4995b0..feaa7e56 100644
--- a/maven-projects/spark/datasources-33/src/main/scala/org/apache/graphar/datasources/GarScan.scala
+++ b/maven-projects/spark/datasources-33/src/main/scala/org/apache/spark/sql/graphar/GarScan.scala
@@ -17,24 +17,20 @@
// Derived from Apache Spark 3.3.4
// https://github.com/apache/spark/blob/18db204/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/FileScan.scala
-package org.apache.graphar.datasources
-
-import scala.collection.JavaConverters._
-import scala.collection.mutable.ArrayBuffer
+package org.apache.spark.sql.graphar
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path
import org.apache.parquet.hadoop.ParquetInputFormat
-
import org.apache.spark.sql.SparkSession
-import org.apache.spark.sql.catalyst.expressions.{Expression, ExprUtils}
import org.apache.spark.sql.catalyst.csv.CSVOptions
+import org.apache.spark.sql.catalyst.expressions.{ExprUtils, Expression}
import org.apache.spark.sql.connector.read.PartitionReaderFactory
import org.apache.spark.sql.execution.PartitionedFileUtil
import org.apache.spark.sql.execution.datasources.{
FilePartition,
- PartitioningAwareFileIndex,
- PartitionedFile
+ PartitionedFile,
+ PartitioningAwareFileIndex
}
import org.apache.spark.sql.execution.datasources.parquet.{
ParquetOptions,
@@ -42,15 +38,18 @@ import org.apache.spark.sql.execution.datasources.parquet.{
ParquetWriteSupport
}
import org.apache.spark.sql.execution.datasources.v2.FileScan
-import org.apache.spark.sql.execution.datasources.v2.parquet.ParquetPartitionReaderFactory
-import org.apache.spark.sql.execution.datasources.v2.orc.OrcPartitionReaderFactory
import org.apache.spark.sql.execution.datasources.v2.csv.CSVPartitionReaderFactory
+import org.apache.spark.sql.execution.datasources.v2.orc.OrcPartitionReaderFactory
+import org.apache.spark.sql.execution.datasources.v2.parquet.ParquetPartitionReaderFactory
import org.apache.spark.sql.internal.SQLConf
import org.apache.spark.sql.sources.Filter
import org.apache.spark.sql.types.StructType
import org.apache.spark.sql.util.CaseInsensitiveStringMap
import org.apache.spark.util.SerializableConfiguration
+import scala.collection.mutable.ArrayBuffer
+import scala.jdk.CollectionConverters._
+
/** GarScan is a class to implement the file scan for GarDataSource. */
case class GarScan(
sparkSession: SparkSession,
diff --git a/maven-projects/spark/datasources-33/src/main/scala/org/apache/graphar/datasources/GarScanBuilder.scala b/maven-projects/spark/datasources-33/src/main/scala/org/apache/spark/sql/graphar/GarScanBuilder.scala
similarity index 98%
rename from maven-projects/spark/datasources-33/src/main/scala/org/apache/graphar/datasources/GarScanBuilder.scala
rename to maven-projects/spark/datasources-33/src/main/scala/org/apache/spark/sql/graphar/GarScanBuilder.scala
index 85f43e59..94fe5752 100644
--- a/maven-projects/spark/datasources-33/src/main/scala/org/apache/graphar/datasources/GarScanBuilder.scala
+++ b/maven-projects/spark/datasources-33/src/main/scala/org/apache/spark/sql/graphar/GarScanBuilder.scala
@@ -17,20 +17,19 @@
// Derived from Apache Spark 3.3.4
// https://github.com/apache/spark/blob/18db204/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/FileScanBuilder.scala
-package org.apache.graphar.datasources
+package org.apache.spark.sql.graphar
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.connector.read.Scan
import org.apache.spark.sql.execution.datasources.PartitioningAwareFileIndex
-
import org.apache.spark.sql.execution.datasources.v2.FileScanBuilder
+import org.apache.spark.sql.execution.datasources.v2.orc.OrcScanBuilder
+import org.apache.spark.sql.execution.datasources.v2.parquet.ParquetScanBuilder
import org.apache.spark.sql.sources.Filter
import org.apache.spark.sql.types.StructType
import org.apache.spark.sql.util.CaseInsensitiveStringMap
import scala.collection.JavaConverters._
-import org.apache.spark.sql.execution.datasources.v2.orc.OrcScanBuilder
-import org.apache.spark.sql.execution.datasources.v2.parquet.ParquetScanBuilder
/** GarScanBuilder is a class to build the file scan for GarDataSource. */
case class GarScanBuilder(
diff --git a/maven-projects/spark/datasources-33/src/main/scala/org/apache/graphar/datasources/GarTable.scala b/maven-projects/spark/datasources-33/src/main/scala/org/apache/spark/sql/graphar/GarTable.scala
similarity index 95%
rename from maven-projects/spark/datasources-33/src/main/scala/org/apache/graphar/datasources/GarTable.scala
rename to maven-projects/spark/datasources-33/src/main/scala/org/apache/spark/sql/graphar/GarTable.scala
index 8aa23179..acf4943c 100644
--- a/maven-projects/spark/datasources-33/src/main/scala/org/apache/graphar/datasources/GarTable.scala
+++ b/maven-projects/spark/datasources-33/src/main/scala/org/apache/spark/sql/graphar/GarTable.scala
@@ -17,26 +17,24 @@
// Derived from Apache Spark 3.1.1
// https://github.com/apache/spark/blob/1d550c4/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/FileTable.scala
-package org.apache.graphar.datasources
-
-import scala.collection.JavaConverters._
+package org.apache.spark.sql.graphar
import org.apache.hadoop.fs.FileStatus
-
import org.apache.spark.sql.SparkSession
-import org.apache.spark.sql.connector.write.{LogicalWriteInfo, WriteBuilder}
import org.apache.spark.sql.catalyst.csv.CSVOptions
+import org.apache.spark.sql.connector.write.{LogicalWriteInfo, WriteBuilder}
import org.apache.spark.sql.execution.datasources.FileFormat
import org.apache.spark.sql.execution.datasources.csv.CSVDataSource
import org.apache.spark.sql.execution.datasources.orc.OrcUtils
import org.apache.spark.sql.execution.datasources.parquet.ParquetUtils
import org.apache.spark.sql.execution.datasources.v2.FileTable
+import org.apache.spark.sql.graphar.csv.CSVWriteBuilder
+import org.apache.spark.sql.graphar.orc.OrcWriteBuilder
+import org.apache.spark.sql.graphar.parquet.ParquetWriteBuilder
import org.apache.spark.sql.types._
import org.apache.spark.sql.util.CaseInsensitiveStringMap
-import org.apache.graphar.datasources.csv.CSVWriteBuilder
-import org.apache.graphar.datasources.parquet.ParquetWriteBuilder
-import org.apache.graphar.datasources.orc.OrcWriteBuilder
+import scala.collection.JavaConverters._
/** GarTable is a class to represent the graph data in GraphAr as a table. */
case class GarTable(
diff --git a/maven-projects/spark/datasources-33/src/main/scala/org/apache/graphar/datasources/GarWriterBuilder.scala b/maven-projects/spark/datasources-33/src/main/scala/org/apache/spark/sql/graphar/GarWriteBuilder.scala
similarity index 98%
rename from maven-projects/spark/datasources-33/src/main/scala/org/apache/graphar/datasources/GarWriterBuilder.scala
rename to maven-projects/spark/datasources-33/src/main/scala/org/apache/spark/sql/graphar/GarWriteBuilder.scala
index 8363ae26..009d5da7 100644
--- a/maven-projects/spark/datasources-33/src/main/scala/org/apache/graphar/datasources/GarWriterBuilder.scala
+++ b/maven-projects/spark/datasources-33/src/main/scala/org/apache/spark/sql/graphar/GarWriteBuilder.scala
@@ -17,7 +17,7 @@
// Derived from Apache Spark 3.3.4
// https://github.com/apache/spark/blob/18db204/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/FileWriteBuilder.scala
-package org.apache.graphar.datasources
+package org.apache.spark.sql.graphar
import java.util.UUID
@@ -27,7 +27,6 @@ import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path
import org.apache.hadoop.mapreduce.Job
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat
-import org.apache.hadoop.mapreduce.Job
import org.apache.spark.sql.execution.datasources.OutputWriterFactory
import org.apache.spark.sql.SparkSession
@@ -41,7 +40,6 @@ import org.apache.spark.sql.connector.write.{
import org.apache.spark.sql.execution.datasources.{
BasicWriteJobStatsTracker,
DataSource,
- OutputWriterFactory,
WriteJobDescription
}
import org.apache.spark.sql.execution.metric.SQLMetric
diff --git a/maven-projects/spark/datasources-33/src/main/scala/org/apache/graphar/datasources/csv/CSVWriterBuilder.scala b/maven-projects/spark/datasources-33/src/main/scala/org/apache/spark/sql/graphar/csv/CSVWriteBuilder.scala
similarity index 96%
rename from maven-projects/spark/datasources-33/src/main/scala/org/apache/graphar/datasources/csv/CSVWriterBuilder.scala
rename to maven-projects/spark/datasources-33/src/main/scala/org/apache/spark/sql/graphar/csv/CSVWriteBuilder.scala
index c0a38d52..68e156e0 100644
--- a/maven-projects/spark/datasources-33/src/main/scala/org/apache/graphar/datasources/csv/CSVWriterBuilder.scala
+++ b/maven-projects/spark/datasources-33/src/main/scala/org/apache/spark/sql/graphar/csv/CSVWriteBuilder.scala
@@ -17,7 +17,7 @@
// Derived from Apache Spark 3.1.1
// https://github.com/apache/spark/blob/1d550c4/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/csv/CSVWriteBuilder.scala
-package org.apache.graphar.datasources.csv
+package org.apache.spark.sql.graphar.csv
import org.apache.hadoop.mapreduce.{Job, TaskAttemptContext}
import org.apache.spark.sql.catalyst.csv.CSVOptions
@@ -31,8 +31,7 @@ import org.apache.spark.sql.execution.datasources.{
import org.apache.spark.sql.execution.datasources.csv.CsvOutputWriter
import org.apache.spark.sql.internal.SQLConf
import org.apache.spark.sql.types.{DataType, StructType}
-
-import org.apache.graphar.datasources.GarWriteBuilder
+import org.apache.spark.sql.graphar.GarWriteBuilder
class CSVWriteBuilder(
paths: Seq[String],
diff --git a/maven-projects/spark/datasources-33/src/main/scala/org/apache/graphar/datasources/orc/OrcOutputWriter.scala b/maven-projects/spark/datasources-33/src/main/scala/org/apache/spark/sql/graphar/orc/OrcOutputWriter.scala
similarity index 98%
rename from maven-projects/spark/datasources-33/src/main/scala/org/apache/graphar/datasources/orc/OrcOutputWriter.scala
rename to maven-projects/spark/datasources-33/src/main/scala/org/apache/spark/sql/graphar/orc/OrcOutputWriter.scala
index c1d2ff82..ccc7a48e 100644
--- a/maven-projects/spark/datasources-33/src/main/scala/org/apache/graphar/datasources/orc/OrcOutputWriter.scala
+++ b/maven-projects/spark/datasources-33/src/main/scala/org/apache/spark/sql/graphar/orc/OrcOutputWriter.scala
@@ -18,7 +18,7 @@
// we have to reimplement it here.
// https://github.com/apache/spark/blob/1d550c4/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcOutputWriter.scala
-package org.apache.graphar.datasources.orc
+package org.apache.spark.sql.graphar.orc
import org.apache.hadoop.fs.Path
import org.apache.hadoop.io.NullWritable
diff --git a/maven-projects/spark/datasources-33/src/main/scala/org/apache/graphar/datasources/orc/OrcWriteBuilder.scala b/maven-projects/spark/datasources-33/src/main/scala/org/apache/spark/sql/graphar/orc/OrcWriteBuilder.scala
similarity index 97%
rename from maven-projects/spark/datasources-33/src/main/scala/org/apache/graphar/datasources/orc/OrcWriteBuilder.scala
rename to maven-projects/spark/datasources-33/src/main/scala/org/apache/spark/sql/graphar/orc/OrcWriteBuilder.scala
index 9bdf796b..287162f8 100644
--- a/maven-projects/spark/datasources-33/src/main/scala/org/apache/graphar/datasources/orc/OrcWriteBuilder.scala
+++ b/maven-projects/spark/datasources-33/src/main/scala/org/apache/spark/sql/graphar/orc/OrcWriteBuilder.scala
@@ -17,7 +17,7 @@
// Derived from Apache Spark 3.1.1
// https://github.com/apache/spark/blob/1d550c4/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/orc/ORCWriteBuilder.scala
-package org.apache.graphar.datasources.orc
+package org.apache.spark.sql.graphar.orc
import org.apache.hadoop.mapred.JobConf
import org.apache.hadoop.mapreduce.{Job, TaskAttemptContext}
@@ -33,7 +33,7 @@ import org.apache.spark.sql.execution.datasources.orc.{OrcOptions, OrcUtils}
import org.apache.spark.sql.internal.SQLConf
import org.apache.spark.sql.types._
-import org.apache.graphar.datasources.GarWriteBuilder
+import org.apache.spark.sql.graphar.GarWriteBuilder
object OrcWriteBuilder {
// the getQuotedSchemaString method of spark OrcFileFormat
diff --git a/maven-projects/spark/datasources-33/src/main/scala/org/apache/graphar/datasources/parquet/ParquetWriterBuilder.scala b/maven-projects/spark/datasources-33/src/main/scala/org/apache/spark/sql/graphar/parquet/ParquetWriteBuilder.scala
similarity index 98%
rename from maven-projects/spark/datasources-33/src/main/scala/org/apache/graphar/datasources/parquet/ParquetWriterBuilder.scala
rename to maven-projects/spark/datasources-33/src/main/scala/org/apache/spark/sql/graphar/parquet/ParquetWriteBuilder.scala
index 5c92204b..8e53dc5f 100644
--- a/maven-projects/spark/datasources-33/src/main/scala/org/apache/graphar/datasources/parquet/ParquetWriterBuilder.scala
+++ b/maven-projects/spark/datasources-33/src/main/scala/org/apache/spark/sql/graphar/parquet/ParquetWriteBuilder.scala
@@ -17,7 +17,7 @@
// Derived from Apache Spark 3.1.1
// https://github.com/apache/spark/blob/1d550c4/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/parquet/ParquetWriteBuilder.scala
-package org.apache.graphar.datasources.parquet
+package org.apache.spark.sql.graphar.parquet
import org.apache.hadoop.mapreduce.{Job, OutputCommitter, TaskAttemptContext}
import org.apache.parquet.hadoop.{ParquetOutputCommitter, ParquetOutputFormat}
@@ -36,7 +36,7 @@ import org.apache.spark.sql.execution.datasources.parquet._
import org.apache.spark.sql.internal.SQLConf
import org.apache.spark.sql.types._
-import org.apache.graphar.datasources.GarWriteBuilder
+import org.apache.spark.sql.graphar.GarWriteBuilder
class ParquetWriteBuilder(
paths: Seq[String],
diff --git a/maven-projects/spark/graphar/pom.xml b/maven-projects/spark/graphar/pom.xml
index 45b99fbf..74626a62 100644
--- a/maven-projects/spark/graphar/pom.xml
+++ b/maven-projects/spark/graphar/pom.xml
@@ -32,6 +32,7 @@
</parent>
<artifactId>graphar-commons</artifactId>
+ <version>${graphar.version}</version>
<packaging>jar</packaging>
<dependencies>
diff --git a/maven-projects/spark/import/neo4j.sh b/maven-projects/spark/import/neo4j.sh
index dbae0273..6a3fa09d 100755
--- a/maven-projects/spark/import/neo4j.sh
+++ b/maven-projects/spark/import/neo4j.sh
@@ -20,7 +20,7 @@
set -eu
cur_dir="$( cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null 2>&1 && pwd )"
-jar_file="${cur_dir}/../graphar/target/graphar-commons-0.1.0-SNAPSHOT-shaded.jar"
+jar_file="${cur_dir}/../graphar/target/graphar-commons-0.12.0-SNAPSHOT-shaded.jar"
conf_path="$(readlink -f $1)"
spark-submit --class org.apache.graphar.importer.Neo4j ${jar_file} \
diff --git a/maven-projects/spark/pom.xml b/maven-projects/spark/pom.xml
index caab96d5..e04ed4ae 100644
--- a/maven-projects/spark/pom.xml
+++ b/maven-projects/spark/pom.xml
@@ -33,6 +33,7 @@
<artifactId>spark</artifactId>
<packaging>pom</packaging>
+ <version>${graphar.version}</version>
<profiles>
<profile>
diff --git a/maven-projects/spark/scripts/run-graphar2nebula.sh b/maven-projects/spark/scripts/run-graphar2nebula.sh
index 6a3b1ff1..8f772159 100755
--- a/maven-projects/spark/scripts/run-graphar2nebula.sh
+++ b/maven-projects/spark/scripts/run-graphar2nebula.sh
@@ -20,7 +20,7 @@
set -eu
cur_dir="$( cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null 2>&1 && pwd )"
-jar_file="${cur_dir}/../graphar/target/graphar-commons-0.1.0-SNAPSHOT-shaded.jar"
+jar_file="${cur_dir}/../graphar/target/graphar-commons-0.12.0-SNAPSHOT-shaded.jar"
graph_info_path="${GRAPH_INFO_PATH:-/tmp/graphar/nebula2graphar/basketballplayergraph.graph.yml}"
spark-submit --class org.apache.graphar.example.GraphAr2Nebula ${jar_file} \
diff --git a/maven-projects/spark/scripts/run-graphar2neo4j.sh b/maven-projects/spark/scripts/run-graphar2neo4j.sh
index d1111aca..11f9caf8 100755
--- a/maven-projects/spark/scripts/run-graphar2neo4j.sh
+++ b/maven-projects/spark/scripts/run-graphar2neo4j.sh
@@ -21,7 +21,7 @@
set -eu
cur_dir="$( cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null 2>&1 && pwd )"
-jar_file="${cur_dir}/../graphar/target/graphar-commons-0.1.0-SNAPSHOT-shaded.jar"
+jar_file="${cur_dir}/../graphar/target/graphar-commons-0.12.0-SNAPSHOT-shaded.jar"
graph_info_path="${GRAPH_INFO_PATH:-/tmp/graphar/neo4j2graphar/MovieGraph.graph.yml}"
spark-submit --class org.apache.graphar.example.GraphAr2Neo4j ${jar_file} \
diff --git a/maven-projects/spark/scripts/run-ldbc-sample2graphar.sh b/maven-projects/spark/scripts/run-ldbc-sample2graphar.sh
index 40c07db3..42f55552 100755
--- a/maven-projects/spark/scripts/run-ldbc-sample2graphar.sh
+++ b/maven-projects/spark/scripts/run-ldbc-sample2graphar.sh
@@ -21,7 +21,7 @@
set -eu
cur_dir="$( cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null 2>&1 && pwd )"
-jar_file="${cur_dir}/../graphar/target/graphar-commons-0.1.0-SNAPSHOT-shaded.jar"
+jar_file="${cur_dir}/../graphar/target/graphar-commons-0.12.0-SNAPSHOT-shaded.jar"
person_input_file="${GAR_TEST_DATA}/ldbc_sample/person_0_0.csv"
person_knows_person_input_file="${GAR_TEST_DATA}/ldbc_sample/person_knows_person_0_0.csv"
output_dir="/tmp/graphar/ldbc_sample"
diff --git a/maven-projects/spark/scripts/run-nebula2graphar.sh b/maven-projects/spark/scripts/run-nebula2graphar.sh
index cd94381e..f8eb8b7d 100755
--- a/maven-projects/spark/scripts/run-nebula2graphar.sh
+++ b/maven-projects/spark/scripts/run-nebula2graphar.sh
@@ -20,7 +20,7 @@
set -eu
cur_dir="$( cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null 2>&1 && pwd )"
-jar_file="${cur_dir}/../graphar/target/graphar-commons-0.1.0-SNAPSHOT-shaded.jar"
+jar_file="${cur_dir}/../graphar/target/graphar-commons-0.12.0-SNAPSHOT-shaded.jar"
vertex_chunk_size=100
edge_chunk_size=1024
diff --git a/maven-projects/spark/scripts/run-neo4j2graphar.sh b/maven-projects/spark/scripts/run-neo4j2graphar.sh
index 158913ee..90711894 100755
--- a/maven-projects/spark/scripts/run-neo4j2graphar.sh
+++ b/maven-projects/spark/scripts/run-neo4j2graphar.sh
@@ -21,7 +21,7 @@
set -eu
cur_dir="$( cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null 2>&1 && pwd )"
-jar_file="${cur_dir}/../graphar/target/graphar-commons-0.1.0-SNAPSHOT-shaded.jar"
+jar_file="${cur_dir}/../graphar/target/graphar-commons-0.12.0-SNAPSHOT-shaded.jar"
vertex_chunk_size=100
edge_chunk_size=1024
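Each of the scripts above hardcodes the shaded jar version, so every release bump (here, 0.1.0 to 0.12.0) must touch every script. A minimal sketch of an alternative, assuming the build produces exactly one shaded jar per target directory (the helper name and demo paths are hypothetical, not part of this commit):

```shell
#!/usr/bin/env bash
# Hypothetical helper: locate the shaded jar by glob so release bumps
# need no per-script edits.
set -eu

find_shaded_jar() {
  local target_dir="$1"
  local jar
  # Expect one shaded jar per build; take the first match if several exist.
  jar="$(ls "${target_dir}"/graphar-commons-*-shaded.jar 2>/dev/null | head -n 1)"
  if [ -z "${jar}" ]; then
    echo "no shaded jar found in ${target_dir}" >&2
    return 1
  fi
  printf '%s\n' "${jar}"
}

# Demo against a temporary directory standing in for graphar/target.
tmp="$(mktemp -d)"
touch "${tmp}/graphar-commons-0.12.0-SNAPSHOT-shaded.jar"
find_shaded_jar "${tmp}"
rm -rf "${tmp}"
```

A script would then set `jar_file="$(find_shaded_jar "${cur_dir}/../graphar/target")"` instead of embedding the version string.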
diff --git a/pyspark/README.md b/pyspark/README.md
index 1aea4310..8816255c 100644
--- a/pyspark/README.md
+++ b/pyspark/README.md
@@ -1,4 +1,4 @@
-# GraphAr PySpark
+# GraphAr PySpark (under development)
This directory contains the code and build system for the GraphAr PySpark
library. The library is implemented as bindings to the GraphAr Scala Spark
library and does not contain any real logic.
diff --git a/pyspark/graphar_pyspark/__init__.py b/pyspark/graphar_pyspark/__init__.py
index c276aeb0..bdca0fcf 100644
--- a/pyspark/graphar_pyspark/__init__.py
+++ b/pyspark/graphar_pyspark/__init__.py
@@ -21,6 +21,7 @@ from pyspark.sql import SparkSession
from graphar_pyspark.errors import GraphArIsNotInitializedError
+__version__ = "0.13.0.dev"
class _GraphArSession:
"""Singleton GraphAr helper object, that contains SparkSession and JVM.
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]