This is an automated email from the ASF dual-hosted git repository.
weibin pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/incubator-graphar.git
The following commit(s) were added to refs/heads/main by this push:
new 8bb741e3 feat(CI):enable markdownlint and typos in docs.yml (#508)
8bb741e3 is described below
commit 8bb741e3897019b5ed654923dc066053bbc93a5c
Author: teapot1de <[email protected]>
AuthorDate: Mon Aug 12 16:40:43 2024 +0800
feat(CI):enable markdownlint and typos in docs.yml (#508)
---
.github/workflows/docs.yml | 14 +++++++++--
docs/.markdownlint.yaml | 8 ++++++
docs/index.md | 5 +++-
docs/libraries/cpp/examples/graphscope.md | 2 +-
docs/libraries/cpp/examples/out-of-core.md | 1 -
docs/libraries/cpp/getting-started.md | 4 +--
docs/libraries/java/how_to_develop_java.md | 10 ++++----
docs/libraries/java/java.md | 39 ++++++++++++++---------------
docs/libraries/pyspark/how-to.md | 27 +++++++++-----------
docs/libraries/pyspark/pyspark.md | 4 ---
docs/libraries/spark/examples.md | 5 +---
docs/libraries/spark/spark.md | 13 +++-------
docs/overview/concepts.md | 12 ++++-----
docs/overview/motivation.md | 8 +++---
docs/overview/overview.md | 1 +
docs/specification/format.md | 21 ++++++++--------
docs/specification/implementation-status.md | 16 +++---------
licenserc.toml | 1 +
18 files changed, 94 insertions(+), 97 deletions(-)
diff --git a/.github/workflows/docs.yml b/.github/workflows/docs.yml
index 60b7b97f..44639d93 100644
--- a/.github/workflows/docs.yml
+++ b/.github/workflows/docs.yml
@@ -54,6 +54,18 @@ jobs:
with:
node-version: '18'
+ - name: Run markdownlint
+ run: |
+ npm install -g markdownlint-cli
+ markdownlint 'docs/**/*.md' --fix --config 'docs/.markdownlint.yaml'
+
+ - name: Run typos
+ run: |
+ curl -sSL https://github.com/crate-ci/typos/releases/download/v1.23.6/typos-v1.23.6-x86_64-unknown-linux-musl.tar.gz -o typos.tar.gz
+ tar -xzf typos.tar.gz
+ chmod +x typos
+ ./typos docs
+
- name: Checkout Website
uses: actions/checkout@v4
with:
@@ -74,5 +86,3 @@ jobs:
- name: Build
working-directory: website
run: pnpm build
-
-# TODO: enable markdownlint & typos
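For contributors who want to reproduce these checks locally before pushing, the commands below mirror the two new workflow steps (a minimal sketch, assuming npm and curl are available on an x86_64 Linux host, since the workflow pins the musl build of typos v1.23.6):

```bash
# Install and run markdownlint with the repository's config, auto-fixing where possible
npm install -g markdownlint-cli
markdownlint 'docs/**/*.md' --fix --config 'docs/.markdownlint.yaml'

# Fetch the same pinned typos release the workflow uses and scan the docs tree
curl -sSL https://github.com/crate-ci/typos/releases/download/v1.23.6/typos-v1.23.6-x86_64-unknown-linux-musl.tar.gz -o typos.tar.gz
tar -xzf typos.tar.gz
chmod +x typos
./typos docs
```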
diff --git a/docs/.markdownlint.yaml b/docs/.markdownlint.yaml
new file mode 100644
index 00000000..c8432bf9
--- /dev/null
+++ b/docs/.markdownlint.yaml
@@ -0,0 +1,8 @@
+# Ignore MD013 because the document requires long lines to keep code examples intact
+MD013: false
+
+# Ignore MD033 because inline HTML is necessary in some cases, such as specific formatting needs
+MD033: false
+
+# Ignore MD025 because the document structure requires multiple top-level headings to reflect different chapters or sections
+MD025: false
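To see the effect of these overrides, one can lint a single document with and without the config; with it, MD013 (line length), MD033 (inline HTML), and MD025 (multiple top-level headings) findings are suppressed. A quick local check, assuming markdownlint-cli is installed:

```bash
# Default rules: long lines in docs/index.md would be reported as MD013 violations
markdownlint docs/index.md

# With the new config: MD013, MD033 and MD025 are disabled, so only other rules fire
markdownlint --config docs/.markdownlint.yaml docs/index.md
```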
diff --git a/docs/index.md b/docs/index.md
index e033a6f4..37150953 100644
--- a/docs/index.md
+++ b/docs/index.md
@@ -8,10 +8,13 @@ sidebar_position: 0
Welcome to the documentation for Apache GraphAr. Here, you can find
information about the GraphAr File Format, including specification and
libraries.
### [Overview](/docs/overview)
+
Overview of the Apache GraphAr project.
### [Specification](/docs/category/specification)
+
Documentation about the Apache GraphAr file format.
### [Libraries](/docs/category/libraries)
-Documentation about the libraries of Apache GraphAr.
+
+Documentation about the libraries of Apache GraphAr.
diff --git a/docs/libraries/cpp/examples/graphscope.md b/docs/libraries/cpp/examples/graphscope.md
index 8afcc96d..784f628e 100644
--- a/docs/libraries/cpp/examples/graphscope.md
+++ b/docs/libraries/cpp/examples/graphscope.md
@@ -30,7 +30,7 @@ The time performance of *ArrowFragmentBuilder* and *ArrowFragmentWriter*
in GraphScope is heavily dependent on the partitioning of the graph into
GraphAr format files, that is, the *vertex chunk size* and *edge chunk size*, which
are specified in the vertex information file and in the edge information
-file, respectively.
+file, respectively.
Generally speaking, fewer chunks are created if the file size is large.
On small graphs, this can be disadvantageous as it reduces the degree of
diff --git a/docs/libraries/cpp/examples/out-of-core.md b/docs/libraries/cpp/examples/out-of-core.md
index 08cf509d..7b97426b 100644
--- a/docs/libraries/cpp/examples/out-of-core.md
+++ b/docs/libraries/cpp/examples/out-of-core.md
@@ -89,7 +89,6 @@ neighbors. Please refer to
[cc_push_example.cc](https://github.com/apache/incubator-graphar/blob/main/cpp/examples/cc_push_example.cc)
for the complete code.
-
:::tip
In this example, two kinds of edges are used. The
diff --git a/docs/libraries/cpp/getting-started.md b/docs/libraries/cpp/getting-started.md
index 7265026d..cf93f75b 100644
--- a/docs/libraries/cpp/getting-started.md
+++ b/docs/libraries/cpp/getting-started.md
@@ -202,7 +202,7 @@ the above graph and outputs the end vertices for each edge.
```cpp
graph_info = ...
-auto expect = graphar::EdgesCollection::Make(graph_info, "person", "konws", "person", graphar::AdjListType::ordered_by_source);
+auto expect = graphar::EdgesCollection::Make(graph_info, "person", "knows", "person", graphar::AdjListType::ordered_by_source);
auto edges = expect.value();
for (auto it = edges->begin(); it != edges->end(); ++it) {
@@ -287,4 +287,4 @@ with URI schema, e.g., "s3://bucket-name/path/to/data" or "s3://\[access-key:sec
[Code example](https://github.com/apache/incubator-graphar/blob/main/cpp/test/test_info.cc#L777-L792)
demonstrates how to read data from S3.
-Note that once you use cloud storage, you need to call `graphar::InitalizeS3` to initialize S3 APIs before starting the work and call`graphar::FinalizeS3()` to shut down the APIs after the work finish.
+Note that once you use cloud storage, you need to call `graphar::InitializeS3` to initialize S3 APIs before starting the work and call`graphar::FinalizeS3()` to shut down the APIs after the work finish.
diff --git a/docs/libraries/java/how_to_develop_java.md b/docs/libraries/java/how_to_develop_java.md
index 4a4aedcd..e279c91e 100644
--- a/docs/libraries/java/how_to_develop_java.md
+++ b/docs/libraries/java/how_to_develop_java.md
@@ -10,7 +10,7 @@ GraphAr Java library based on GraphAr C++ library and an efficient FFI
for Java and C++ called
[FastFFI](https://github.com/alibaba/fastFFI).
-### Source Code Level
+### Source Code Level
- Interface
- Class
@@ -80,8 +80,8 @@ Please refer to
## How To Test
```bash
-$ export GAR_TEST_DATA=$PWD/../../testing/
-$ mvn clean test
+export GAR_TEST_DATA=$PWD/../../testing/
+mvn clean test
```
This will build GraphAr C++ library internally for Java. If you already
@@ -96,11 +96,11 @@ To ensure CI for checking code style will pass, please ensure check
below is success:
```bash
-$ mvn spotless:check
+mvn spotless:check
```
If there are violations, running command below to automatically format:
```bash
-$ mvn spotless:apply
+mvn spotless:apply
```
diff --git a/docs/libraries/java/java.md b/docs/libraries/java/java.md
index 352a36a6..fdc90718 100644
--- a/docs/libraries/java/java.md
+++ b/docs/libraries/java/java.md
@@ -11,19 +11,19 @@ Based on an efficient FFI for Java and C++ called
library allows users to write Java for generating, loading and
transforming GraphAr format files. It consists of several components:
-- **Information Classes**: As same with in the C++ library, the
+- **Information Classes**: As same with in the C++ library, the
information classes are implemented to construct and access the meta
information about the **graphs**, **vertices** and **edges** in
GraphAr.
-- **Writers**: The GraphAr Java writer provides a set of interfaces
+- **Writers**: The GraphAr Java writer provides a set of interfaces
that can be used to write Apache Arrow VectorSchemaRoot into GraphAr format
files. Every time it takes a VectorSchemaRoot as the logical table
for a type of vertices or edges, then convert it to ArrowTable, and
then dumps it to standard GraphAr format files (CSV, ORC or Parquet files) under
the specific directory path.
-- **Readers**: The GraphAr Java reader provides a set of interfaces
+- **Readers**: The GraphAr Java reader provides a set of interfaces
that can be used to read GraphAr format files. It reads a collection of vertices
or edges at a time and assembles the result into the ArrowTable.
Similar with the reader in the C++ library, it supports the users to
@@ -41,41 +41,41 @@ Firstly, install llvm-11. `LLVM11_HOME` should point to the home of
LLVM 11. In Ubuntu, it is at `/usr/lib/llvm-11`. Basically, the build
procedure the following binary:
-- `$LLVM11_HOME/bin/clang++`
-- `$LLVM11_HOME/bin/ld.lld`
-- `$LLVM11_HOME/lib/cmake/llvm`
+- `$LLVM11_HOME/bin/clang++`
+- `$LLVM11_HOME/bin/ld.lld`
+- `$LLVM11_HOME/lib/cmake/llvm`
Tips:
-- Use Ubuntu as example:
+- Use Ubuntu as example:
```bash
-$ sudo apt-get install llvm-11 clang-11 lld-11 libclang-11-dev libz-dev -y
-$ export LLVM11_HOME=/usr/lib/llvm-11
+sudo apt-get install llvm-11 clang-11 lld-11 libclang-11-dev libz-dev -y
+export LLVM11_HOME=/usr/lib/llvm-11
```
-- Or compile from source with this [script](https://github.com/alibaba/fastFFI/blob/main/docker/install-llvm11.sh):
+- Or compile from source with this [script](https://github.com/alibaba/fastFFI/blob/main/docker/install-llvm11.sh):
```bash
-$ export LLVM11_HOME=/usr/lib/llvm-11
-$ export LLVM_VAR=11.0.0
-$ sudo ./install-llvm11.sh
+export LLVM11_HOME=/usr/lib/llvm-11
+export LLVM_VAR=11.0.0
+sudo ./install-llvm11.sh
```
Make the graphar-java-library directory as the current working
directory:
```bash
-$ git clone https://github.com/apache/incubator-graphar.git
-$ cd incubator-graphar
-$ git submodule update --init
-$ cd maven-projects/java
+git clone https://github.com/apache/incubator-graphar.git
+cd incubator-graphar
+git submodule update --init
+cd maven-projects/java
```
Compile package:
```bash
-$ mvn clean install -DskipTests
+mvn clean install -DskipTests
```
This will build GraphAr C++ library internally for Java. If you already
installed GraphAr C++ library in your system,
@@ -83,7 +83,6 @@ you can append this option to skip: `-DbuildGarCPP=OFF`.
Then set GraphAr as a dependency in maven project:
-
```xml
<dependencies>
<dependency>
@@ -212,4 +211,4 @@ StdPair<Long, Long> range = reader.getRange().value();
See [test for readers](https://github.com/apache/incubator-graphar/blob/main/maven-projects/java/src/test/java/org/apache/graphar/readers)
-for the complete example.
\ No newline at end of file
+for the complete example.
diff --git a/docs/libraries/pyspark/how-to.md b/docs/libraries/pyspark/how-to.md
index aebeb325..4e7e3a64 100644
--- a/docs/libraries/pyspark/how-to.md
+++ b/docs/libraries/pyspark/how-to.md
@@ -30,7 +30,7 @@ spark = (
## GraphAr PySpark initialize
PySpark bindings are heavily relying on JVM-calls via ``py4j``. To
-initiate all the neccessary things for it just call
+initiate all the necessary things for it just call
``graphar_pyspark.initialize()``:
```python
@@ -53,15 +53,14 @@ from graphar_pyspark.enums import GarType, FileType
Main objects of GraphAr are the following:
-- GraphInfo
-- VertexInfo
-- EdgeInfo
+- GraphInfo
+- VertexInfo
+- EdgeInfo
You can check [Scala library documentation](../spark/spark.md)
for the more detailed information.
-
-## Creating objects in graphar_pyspark
+## Creating objects in graphar_pyspark
GraphAr PySpark package provide two main ways how to initiate
objects, like ``GraphInfo``:
@@ -71,7 +70,6 @@ objects, like ``GraphInfo``:
- ``from_scala(jvm_ref)`` when you create an object from the
corresponded JVM-object (``py4j.java_gateway.JavaObject``)
-
```python
help(Property.from_python)
@@ -95,7 +93,7 @@ print(type(python_property))
You can always get a reference to the corresponding JVM object. For
example, if you want to use it in your own code and need a direct link
-to the underlaying instance of Scala Class, you can just call
+to the underlying instance of Scala Class, you can just call
``to_scala()`` method:
```python
@@ -128,9 +126,9 @@ Each public property and method of the Scala API is provided in
python, but in a pythonic-naming convention. For example, in Scala,
``Property`` has the following fields:
-- name
-- data_type
-- is_primary
+- name
+- data_type
+- is_primary
For each of such a field in Scala API there is a getter and setter
methods. You can call them from the Python too:
@@ -142,7 +140,7 @@ python_property.get_name()
```
You can also modify fields, but be careful: when you modify field of
-instance of the Python class, you modify the underlaying Scala Object
+instance of the Python class, you modify the underlying Scala Object
at the same moment!
```python
@@ -168,7 +166,6 @@ modern_graph = GraphInfo.load_graph_info("../../testing/modern_graph/modern_grap
After that you can work with such an objects like regular python
objects:
-
```python
print(modern_graph_v_person.dump())
@@ -195,7 +192,7 @@ label: person
version: gar/v1
"
```
-
+
```python
print(modern_graph_v_person.contain_property("id") is True)
print(modern_graph_v_person.contain_property("bad_id?") is False)
@@ -203,6 +200,6 @@ print(modern_graph_v_person.contain_property("bad_id?") is False)
True
True
```
-
+
Please, refer to Scala API and examples of GraphAr Spark Scala
library to see detailed and business-case oriented examples!
diff --git a/docs/libraries/pyspark/pyspark.md b/docs/libraries/pyspark/pyspark.md
index 502eac39..cf0f8ddc 100644
--- a/docs/libraries/pyspark/pyspark.md
+++ b/docs/libraries/pyspark/pyspark.md
@@ -65,7 +65,6 @@ GraphAr PySpark uses poetry as a build system. Please refer to
to find the manual how to install this tool. Currently GraphAr PySpark
is build with Python 3.9 and PySpark 3.2
-
Make the graphar-pyspark-library directory as the current working
directory:
@@ -75,7 +74,6 @@ cd incubator-graphar/pyspark
Build package:
-
```bash
poetry build
```
@@ -87,7 +85,6 @@ generated in the directory *pyspark/dist/*.
You cannot install graphar-pyspark from PyPi for now.
-
## How to Use
### Initialization
@@ -97,7 +94,6 @@ Scala. You need to have *spark-x.x.x.jar* in your *spark-jars*.
Please refer to [GraphAr scala documentation](../spark/spark.md) to get
this JAR.
-
```python
// create a SparkSession from pyspark.sql import SparkSession
diff --git a/docs/libraries/spark/examples.md b/docs/libraries/spark/examples.md
index 144b5ca7..bd0d5d06 100644
--- a/docs/libraries/spark/examples.md
+++ b/docs/libraries/spark/examples.md
@@ -11,7 +11,6 @@ sidebar_position: 1
Examples of this co-working integration have been provided as showcases.
-
### Examples
### Transform GraphAr format files
@@ -24,7 +23,6 @@ the original data is first loaded into a Spark DataFrame using the GraphAr Spark
Then, the DataFrame is written into generated GraphAr format files through a GraphAr Spark Writer,
following the meta data defined in a new information file.
-
### Compute with GraphX
Another important use case of GraphAr is to use it as a data source for graph
@@ -33,7 +31,6 @@ a GraphX graph from reading GraphAr format files and executing a connected-compo
Also, executing queries with Spark SQL and running other graph analytic algorithms
can be implemented in a similar fashion.
-
### Import/Export graphs of Neo4j
[Neo4j](https://neo4j.com/product/neo4j-graph-database) graph database provides
@@ -210,4 +207,4 @@ See [GraphAr2Neo4j.scala][graphar2neo4j] for the complete example.
[transformer-example]: https://github.com/apache/incubator-graphar/blob/main/maven-projects/spark/graphar/src/test/scala/org/apache/graphar/TransformExample.scala
[compute-example]: https://github.com/apache/incubator-graphar/blob/main/maven-projects/spark/graphar/src/test/scala/org/apache/graphar/ComputeExample.scala
[neo4j2graphar]: https://github.com/apache/incubator-graphar/blob/main/maven-projects/spark/graphar/src/main/scala/org/apache/graphar/example/Neo4j2GraphAr.scala
-[graphar2neo4j]: https://github.com/apache/incubator-graphar/blob/main/maven-projects/spark/graphar/src/main/scala/org/apache/graphar/example/GraphAr2Neo4j.scala
\ No newline at end of file
+[graphar2neo4j]: https://github.com/apache/incubator-graphar/blob/main/maven-projects/spark/graphar/src/main/scala/org/apache/graphar/example/GraphAr2Neo4j.scala
diff --git a/docs/libraries/spark/spark.md b/docs/libraries/spark/spark.md
index eaf79b8b..76a5993b 100644
--- a/docs/libraries/spark/spark.md
+++ b/docs/libraries/spark/spark.md
@@ -25,7 +25,6 @@ The GraphAr Spark library can be used in a range of scenarios:
For more information on its usage, please refer to the [Examples](examples.md).
-
## Get GraphAr Spark Library
### Building from source
@@ -52,7 +51,6 @@ After compilation, a similar file *graphar-x.x.x-SNAPSHOT-shaded.jar* is generat
Please refer to the [building steps](https://github.com/apache/incubator-graphar/tree/main/spark) for more details.
-
## How to Use
### Information classes
@@ -75,7 +73,6 @@ val version = graph_info.getVersion
See [TestGraphInfo.scala][test-graph-info] for the complete example.
-
### IndexGenerator
The GraphAr file format assigns each vertex with a unique index inside the vertex type (which called internal vertex id) starting from 0 and increasing continuously for each type of vertex (i.e., with the same vertex label). However, the vertex/edge tables in Spark often lack this information, requiring special attention. For example, an edge table typically uses the primary key (e.g., "id", which is a string) to identify its source and destination vertices.
@@ -106,7 +103,6 @@ val edge_df_src_dst_index = IndexGenerator.generateDstIndexForEdgesFromMapping(e
See [TestIndexGenerator.scala][test-index-generator] for the complete example.
-
### Writer
The GraphAr Spark writer provides the necessary Spark interfaces to write DataFrames into GraphAr formatted files in a batch-import fashion. With the VertexWriter, users can specify a particular property group to be written into its corresponding chunks, or choose to write all property groups. For edge chunks, besides the meta data (edge info), the adjList type should also be specified. The adjList/properties can be written alone, or alternatively, all adjList, properties, and the offset [...]
@@ -145,7 +141,6 @@ writer.writeEdges()
See [TestWriter.scala][test-writer] for the complete example.
-
### Reader
The GraphAr Spark reader provides an extensive set of interfaces to read GraphAr format files. It reads a collection of vertices or edges at a time and assembles the result into the Spark DataFrame. Similar with the reader in C++ library, it supports the users to specify the data they need, e.g., a single property group.
@@ -181,7 +176,6 @@ val edge_df = reader.readEdges()
See [TestReader.scala][test-reader] for the complete example.
-
### Graph-level APIs
To improve the usability of the GraphAr Spark library, a set of APIs are provided to allow users to easily perform operations such as reading, writing, and transforming data at the graph level. These APIs are fairly easy to use, while the previous methods of using reader, writer and information classes are more flexibly and can be highly customized.
@@ -210,8 +204,9 @@ The Graph Transformer can be used for various purposes, including transforming G
:::note
There are certain limitations while using the Graph Transformer:
- - The vertices (or edges) of the source and destination graphs are aligned by labels, meaning each vertex/edge label included in the destination graph must have an equivalent in the source graph, in order for the related chunks to be loaded as the data source.
- - For each group of vertices/edges (i.e., each single label), each property included in the destination graph (defined in the relevant VertexInfo/EdgeInfo) must also be present in the source graph.
+
+- The vertices (or edges) of the source and destination graphs are aligned by labels, meaning each vertex/edge label included in the destination graph must have an equivalent in the source graph, in order for the related chunks to be loaded as the data source.
+- For each group of vertices/edges (i.e., each single label), each property included in the destination graph (defined in the relevant VertexInfo/EdgeInfo) must also be present in the source graph.
In addition, users can use the GraphAr Spark Reader/Writer to conduct data transformation more flexibly at the vertex/edge table level, as opposed to the graph level. This allows for a more granular approach to transforming data, as [TransformExample.scala][transform-example] shows.
@@ -241,4 +236,4 @@ The Spark library for GraphAr supports reading and writing data from/to cloud st
[compute-example]: https://github.com/apache/incubator-graphar/blob/main/maven-projects/spark/graphar/src/test/scala/org/apache/graphar/ComputeExample.scala
[transform-example]: https://github.com/apache/incubator-graphar/blob/main/maven-projects/spark/graphar/src/test/scala/org/apache/graphar/TransformExample.scala
[neo4j2graphar]: https://github.com/apache/incubator-graphar/blob/main/maven-projects/spark/graphar/src/main/scala/org/apache/graphar/example/Neo4j2GraphAr.scala
-[graphar2neo4j]: https://github.com/apache/incubator-graphar/blob/main/maven-projects/spark/graphar/src/main/scala/org/apache/graphar/example/GraphAr2Neo4j.scala
\ No newline at end of file
+[graphar2neo4j]: https://github.com/apache/incubator-graphar/blob/main/maven-projects/spark/graphar/src/main/scala/org/apache/graphar/example/GraphAr2Neo4j.scala
diff --git a/docs/overview/concepts.md b/docs/overview/concepts.md
index 2039e8dc..7a080cb6 100644
--- a/docs/overview/concepts.md
+++ b/docs/overview/concepts.md
@@ -11,12 +11,12 @@ Glossary of relevant concepts and terms.
group is the unit of storage and is stored in a separate directory.
- **Adjacency List**: The storage method to store the edges of certain vertex type. Which include:
- - *ordered by source vertex id*: the edges are ordered and aligned by the source vertex
- - *ordered by destination vertex id*: the edges are ordered and aligned by the destination vertex
- - *unordered by source vertex id*: the edges are unordered but aligned by the source vertex
- - *unordered by destination vertex id*: the edges are unordered but aligned by the destination vertex
+ - *ordered by source vertex id*: the edges are ordered and aligned by the source vertex
+ - *ordered by destination vertex id*: the edges are ordered and aligned by the destination vertex
+ - *unordered by source vertex id*: the edges are unordered but aligned by the source vertex
+ - *unordered by destination vertex id*: the edges are unordered but aligned by the destination vertex
-- **Compressed Sparse Row (CSR)**: The storage layout the edges of certain vertex type. Corresponding to the
+- **Compressed Sparse Row (CSR)**: The storage layout the edges of certain vertex type. Corresponding to the
ordered by source vertex id adjacency list, the edges are stored in a single array and the offsets of the
edges of each vertex are stored in a separate array.
@@ -29,7 +29,7 @@ Glossary of relevant concepts and terms.
no offsets are stored.
- **Vertex Chunk**: The storage unit of vertex. Each vertex chunk contains a fixed number of vertices and is stored
- in a separate file.
+ in a separate file.
- **Edge Chunk**: The storage unit of edge. Each edge chunk contains a fixed number of edges and is stored in a separate file.
diff --git a/docs/overview/motivation.md b/docs/overview/motivation.md
index a262c87b..5551aa89 100644
--- a/docs/overview/motivation.md
+++ b/docs/overview/motivation.md
@@ -4,11 +4,11 @@ title: Motivation
sidebar_position: 2
---
-Numerous graph systems,
-such as Neo4j, Nebula Graph, and Apache HugeGraph, have been developed in recent years.
-Each of these systems has its own graph data storage format, complicating the exchange of graph data between different systems.
+Numerous graph systems,
+such as Neo4j, Nebula Graph, and Apache HugeGraph, have been developed in recent years.
+Each of these systems has its own graph data storage format, complicating the exchange of graph data between different systems.
The need for a standard data file format for large-scale graph data storage and processing that can be used by diverse existing systems is evident, as it would reduce overhead when various systems work together.
Our aim is to fill this gap and contribute to the open-source community by providing a standard data file format for graph data storage and exchange, as well as for out-of-core querying.
This format, which we have named GraphAr, is engineered to be efficient, cross-language compatible, and to support out-of-core processing scenarios, such as those commonly found in data lakes.
-Furthermore, GraphAr's flexible design ensures that it can be easily extended to accommodate a broader array of graph data storage and exchange use cases in the future.
\ No newline at end of file
+Furthermore, GraphAr's flexible design ensures that it can be easily extended to accommodate a broader array of graph data storage and exchange use cases in the future.
diff --git a/docs/overview/overview.md b/docs/overview/overview.md
index 637278ba..3ed3ad72 100644
--- a/docs/overview/overview.md
+++ b/docs/overview/overview.md
@@ -13,4 +13,5 @@ It is intended to serve as the standard file format for importing/exporting and
Additionally, it can also serve as the direct data source for graph processing applications.
### [Motivation](/docs/overview/motivation)
+
### [Concepts](/docs/overview/concepts)
diff --git a/docs/specification/format.md b/docs/specification/format.md
index b456227f..3859b5f1 100644
--- a/docs/specification/format.md
+++ b/docs/specification/format.md
@@ -5,8 +5,8 @@ sidebar_position: 1
## Property Graph
-GraphAr is designed for representing and storing the property graphs. Graph (in discrete mathematics) is a structure made of vertices and edges.
-Property graph is then a type of graph model where the vertices/edges could carry a name (also called as type or label) and some properties.
+GraphAr is designed for representing and storing the property graphs. Graph (in discrete mathematics) is a structure made of vertices and edges.
+Property graph is then a type of graph model where the vertices/edges could carry a name (also called as type or label) and some properties.
Since carrying additional information than non-property graphs, the property graph is able to represent
connections among data scattered across diverse data databases and with different schemas.
Compared with the relational database schema, the property graph excels at showing data dependencies.
@@ -16,7 +16,7 @@ network routing, scientific computing and so on.
A property graph consists of vertices and edges, with each vertex contains a unique identifier and:
- A text label that describes the vertex type.
-- A collection of properties, with each property can be represented by a key-value pair.
+- A collection of properties, with each property can be represented by a key-value pair.
Each edge contains a unique identifier and:
@@ -33,7 +33,7 @@ The following is an example property graph containing two types of vertices ("pe
GraphAr support a set of built-in property data types that are common in real use cases and supported by most file types (CSV, ORC, Parquet), includes:
-- **Boolean**
+- **Boolean**
- **Int32**: Integer with 32 bits
- **Int64**: Integer with 64 bits
- **Float**: 32-bit floating point values
@@ -45,7 +45,7 @@ GraphAr support a set of built-in property data types that are common in real us
- **List**: A list of values of the same type
GraphAr also supports the user-defined data types, which can be used to represent complex data structures,
-such as the struct, map, and union types.
+such as the struct, map, and union types.
## Configurations
@@ -85,10 +85,9 @@ Adjacency list is a data structure used to represent the edges of a graph. Graph
- **unordered_by_source**: the internal id of the source vertex is used as the partition key to divide the edges into different sub-logical-tables, and the edges in each sub-logical-table are unordered, which can be seen as the COO format.
- **unordered_by_dest**: the internal id of the destination vertex is used as the partition key to divide the edges into different sub-logical-tables, and the edges in each sub-logical-table are unordered, which can also be seen as the COO format.
-
## Vertex Chunks in GraphAr
-### Logical table of vertices
+### Logical table of vertices
Each type of vertices (with the same label) constructs a logical vertex table, with each vertex assigned with a global index inside this type (called internal vertex id) starting from 0, corresponding to the row number of the vertex in the logical vertex table. An example layout for a logical table of vertices under the label "person" is provided for reference.
@@ -102,7 +101,7 @@ In the logical vertex table, some property can be marked as the primary key, suc
:::
-### Physical table of vertices
+### Physical table of vertices
The logical vertex table will be partitioned into multiple continuous vertex chunks for enhancing the reading/writing efficiency. To maintain the ability of random access, the size of vertex chunks for the same label is fixed. To support to access required properties avoiding reading all properties from the files, and to add properties for vertices without modifying the existing files, the columns of the logical table will be divided into several column groups.
@@ -116,7 +115,7 @@ For efficiently utilize the filter push-down of the payload file format like Par
:::
-## Edge Chunks in GraphAr
+## Edge Chunks in GraphAr
### Logical table of edges
@@ -187,11 +186,11 @@ See also [Gar Information Files](https://graphar.apache.org/docs/libraries/cpp/g
As previously mentioned, each logical vertex/edge table is divided into multiple physical tables stored in one of the following file formats:
- [Apache ORC](https://orc.apache.org/)
-- [Apache Parquet](https://parquet.apache.org/)
+- [Apache Parquet](https://parquet.apache.org/)
- CSV
- JSON
-Both of Apache ORC and Apache Parquet are column-oriented data storage formats. In practice of graph processing, it is common to only query a subset of columns of the properties. Thus, the column-oriented formats are more efficient, which eliminate the need to read columns that are not relevant. They are also used by a large number of data processing frameworks like [Apache Spark](https://spark.apache.org/), [Apache Hive](https://hive.apache.org/), [Apache Flink](https://flink.apache.org [...]
+Both of Apache ORC and Apache Parquet are column-oriented data storage formats. In practice of graph processing, it is common to only query a subset of columns of the properties. Thus, the column-oriented formats are more efficient, which eliminate the need to read columns that are not relevant. They are also used by a large number of data processing frameworks like [Apache Spark](https://spark.apache.org/), [Apache Hive](https://hive.apache.org/), [Apache Flink](https://flink.apache.org [...]
See also [GraphAr Data Files](https://graphar.apache.org/docs/libraries/cpp/getting-started#gar-data-files) for an example.
diff --git a/docs/specification/implementation-status.md b/docs/specification/implementation-status.md
index df19d54d..6e634714 100644
--- a/docs/specification/implementation-status.md
+++ b/docs/specification/implementation-status.md
@@ -60,7 +60,6 @@ Supported compression methods for the file formats:
:::
-
## Property
| Property feature | C++ | Java | Scala | Python |
@@ -68,7 +67,6 @@ Supported compression methods for the file formats:
| primary key | ✓ | ✓ | ✓ | ✓ |
| nullable | ✓ | | ✓ | ✓ |
-
Supported operations in Property:
| Property operation| C++ | Java | Scala | Python |
@@ -78,13 +76,12 @@ Supported operations in Property:
| is_primary_key | ✓ | ✓ (1) | ✓ | ✓ (2) |
| is_nullable | ✓ | | ✓ | ✓ (2) |
-
## Property Group
| Property Group (operation) | C++ |Java (1)| Scala | Python (2)|
|-------------------|-------|--------|-------|------------|
| create | ✓ | ✓ | ✓ | ✓ |
-| add property | ✓ | ✓ | ✓ | ✓ |
+| add property | ✓ | ✓ | ✓ | ✓ |
| remove property | | | | |
| get properties | ✓ | ✓ | ✓ | ✓ |
| check property | ✓ | ✓ | | |
@@ -92,7 +89,6 @@ Supported operations in Property:
| get path prefix | ✓ | ✓ | ✓ | ✓ |
| check validation | ✓ | | | |
-
## Adjacency List
| Adjacency List (type) | C++ | Java | Scala | Python |
@@ -111,7 +107,6 @@ Supported operations in Adjacency List:
| get path prefix | ✓ | | ✓ | ✓ |
| check validation | ✓ | | | |
-
## Vertex
Vertex features:
@@ -125,8 +120,8 @@ Vertex features:
:::note
-* *label* is the vertex label, which is a unique identifier for the vertex.
-* *tag* is the vertex tag, which is tag or category for the vertex.
+- *label* is the vertex label, which is a unique identifier for the vertex.
+- *tag* is the vertex tag, which is tag or category for the vertex.
:::
@@ -146,7 +141,6 @@ Supported operations in Vertex Info:
| serialize | ✓ | ✓ | ✓ | ✓ |
| deserialize | ✓ | ✓ | ✓ | ✓ |
-
## Edge
Edge features:
@@ -190,11 +184,10 @@ Supported operations in Edge Info:
:::
-
## Graph
| Graph | C++ | Java | Scala | Python |
-|-------------------|-------|-------|-------|------------|
+|-------------------|-------|-------|-------|------------|
| labeled vertex (with property) | ✓ | ✓ | ✓ | ✓ |
| labeled edge (with property) | ✓ | ✓ | ✓ | ✓ |
| extra info | ✓ | | | |
@@ -226,7 +219,6 @@ Supported operations in Graph Info:
:::
-
## Libraries Version Compatibility
| GraphAr C++ Version | C++ | CMake | Format Version |
diff --git a/licenserc.toml b/licenserc.toml
index 56db55c8..5353b95b 100644
--- a/licenserc.toml
+++ b/licenserc.toml
@@ -25,6 +25,7 @@ excludes = [
# Documents
"**/*.md",
"**/*.mdx",
+ "docs/.markdownlint.yaml",
# Meta files
"NOTICE",
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]