This is an automated email from the ASF dual-hosted git repository.

lixueclaire pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/incubator-graphar.git


The following commit(s) were added to refs/heads/main by this push:
     new b0d4f1bf feat(docs): how to process graph data with labels (#643)
b0d4f1bf is described below

commit b0d4f1bf46381015c17df6c0747dcc3282529431
Author: Elssky <[email protected]>
AuthorDate: Mon Oct 28 11:13:42 2024 +0800

    feat(docs): how to process graph data with labels (#643)
---
 docs/libraries/cpp/getting-started.md | 99 +++++++++++++++++++++++++++++++++++
 1 file changed, 99 insertions(+)

diff --git a/docs/libraries/cpp/getting-started.md b/docs/libraries/cpp/getting-started.md
index cf93f75b..c2c433fd 100644
--- a/docs/libraries/cpp/getting-started.md
+++ b/docs/libraries/cpp/getting-started.md
@@ -277,6 +277,105 @@ is used to write the results to new generated data chunks.
 Please refer to [more examples](examples/out-of-core.md) to learn
 about the other available case studies utilizing GraphAr.
 
+### Processing Graph Data with Labels
+
+Because GraphAr supports labeled property graph (LPG) data, users can attach labels to vertices and use the label filtering functions to retrieve the vertices that carry given labels.
+GraphAr's standard CSV format for labeled graph data is `id|:LABEL|property_1|property_2`. If a vertex has multiple labels, separate them with `;`. Here is an example:
+
+```csv
+id|:LABEL|name|url
+0|company;public|Kam_Air|http://dbpedia.org/resource/Kam_Air
+1|company|Balkh_Airlines|http://dbpedia.org/resource/Balkh_Airlines
+2|company|Khyber_Afghan_Airlines|http://dbpedia.org/resource/Khyber_Afghan_Airlines
+
+```
+
+Once the data is ready, you can read the file into an `arrow::Table` with the Arrow CSV reader, setting the delimiter to `|`.
+
+``` cpp
+    arrow::csv::ReadOptions read_options{}; 
+    arrow::csv::ParseOptions parse_options{}; 
+    arrow::csv::ConvertOptions convert_options{};
+
+    parse_options.delimiter = '|'; 
+
+    auto input = arrow::io::ReadableFile::Open(test_data_dir + "/ldbc/organisation_0_0.csv", arrow::default_memory_pool()).ValueOrDie();
+
+    auto reader = arrow::csv::TableReader::Make(
+        arrow::io::default_io_context(),
+        input,
+        read_options,
+        parse_options,
+        convert_options).ValueOrDie();
+
+    std::shared_ptr<arrow::Table> table;
+    table = reader->Read().ValueOrDie();
+```
+You can write the labeled table to disk in Parquet format and read it back into memory as follows.
+``` cpp
+  // write arrow table as parquet chunk
+  auto maybe_writer =
+      VertexPropertyWriter::Make(vertex_info, test_data_dir + "/ldbc/parquet/");
+  REQUIRE(!maybe_writer.has_error());
+  auto writer = maybe_writer.value();
+  REQUIRE(writer->WriteTable(table, 0).ok());
+  REQUIRE(writer->WriteVerticesNum(table->num_rows()).ok());
+
+  // read parquet chunk as arrow table
+  auto maybe_reader =
+      VertexPropertyArrowChunkReader::Make(graph_info, "organisation", labels);
+  assert(maybe_reader.status().ok());
+  auto reader = maybe_reader.value();
+  assert(reader->seek(0).ok());
+  assert(reader->GetLabelChunk().status().ok());
+  assert(reader->next_chunk().ok());
+```
+### Using Label Filtering Functions
+
+The `graphar::VerticesCollection::verticesWithLabel` and `graphar::VerticesCollection::verticesWithMultipleLabels` APIs take a graph and a vertex type and return the vertices of that type that carry a given label or all of several given labels, respectively. The examples below show how to use them.
+
+
+
+```cpp
+  graph_info = ...
+  auto vertex_info = graph_info->GetVertexInfo("organisation");
+  auto labels = vertex_info->GetLabels();
+
+  // query vertices with a specific label; `type` is the vertex type (assumed here to be "organisation")
+  auto maybe_filter_vertices_collection =
+      graphar::VerticesCollection::verticesWithLabel(std::string("company"), graph_info, type);
+  ASSERT(!maybe_filter_vertices_collection.has_error());
+  auto filter_vertices = maybe_filter_vertices_collection.value();
+  
+  // iterate vertices with label "company"
+  for (auto it = filter_vertices->begin(); it != filter_vertices->end();
+       ++it) {
+    // get all labels of the current vertex
+    auto label_result = it.label();
+    std::cout << "id: " << it.id() << " ";
+    if (!label_result.has_error()) {
+      for (auto label : label_result.value()) {
+        std::cout << label << " ";
+      }
+    }
+    // ...
+  }
+
+  // refine the previous result: keep only the vertices that also have label "public"
+  auto maybe_public_vertices_collection =
+      graphar::VerticesCollection::verticesWithLabel(std::string("public"),
+                                                     filter_vertices);
+
+  // query vertices that have all of the given labels
+  auto maybe_multi_label_vertices_collection =
+    graphar::VerticesCollection::verticesWithMultipleLabels({"company", "public"}, graph_info, type);
+  // ...
+
+ 
+```
+Note that running the first two queries in sequence yields the same result as the third query.
+
+
 ### Working with Cloud Storage (S3, OSS)
 
 GraphAr supports reading and writing data from and to cloud storage, including

