csulzy opened a new issue, #828:
URL: https://github.com/apache/incubator-graphar/issues/828

   ### Describe the enhancement requested
   
   Hello GraphAr community,
   
   I would like to propose the creation of a native Go SDK for GraphAr and am 
willing to kickstart the implementation and maintain it.
   
   Motivation
   
   Go (Golang) is a dominant language in the cloud-native landscape, widely 
used for building high-performance backend services, data pipelines, and 
infrastructure tooling. Today, a Go application that needs to read or write 
GraphAr data has to fall back on workarounds such as CGO bindings to the C++ 
library or inter-process communication with a Java/Spark service.
   
   A native Go SDK would significantly lower the barrier to adoption for the 
Go ecosystem by providing an idiomatic, efficient way to read and write 
GraphAr-formatted data without CGO or JVM dependencies. This would enable 
direct integration with Go-based graph databases, analysis tools, and data 
processing frameworks.
   
   Preliminary Design Proposal (Open for Feedback)
   
   To ensure consistency and maintainability, the Go SDK's design will be 
heavily inspired by the architecture of the existing C++ and 
Java/Spark/Python/Rust libraries. The core idea is to follow the same layered 
approach:
   
   info Package: Pure Go data structures that represent the GraphAr schema 
(GraphInfo, VertexInfo, EdgeInfo, PropertyGroup, etc.) and logic for 
parsing/serializing the .info.yml files. This will leverage the existing Proto 
definitions introduced in PR #573.
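
To make the info layer more concrete, here is a rough sketch of what the vertex schema types might look like in Go. All type names, field names, and validation rules below are assumptions for illustration only; the real types would presumably be generated from, or aligned with, the Proto definitions, and YAML parsing is left out to keep the sketch standard-library only.

```go
package main

import (
	"errors"
	"fmt"
)

// Property, PropertyGroup, and VertexInfo are hypothetical shapes for the
// vertex schema. Field names are assumptions, not a final API.
type Property struct {
	Name      string
	DataType  string
	IsPrimary bool
}

type PropertyGroup struct {
	Properties []Property
	FileType   string // payload format, e.g. "parquet"
	Prefix     string
}

type VertexInfo struct {
	Type           string
	ChunkSize      int64
	Prefix         string
	PropertyGroups []PropertyGroup
	Version        string
}

// Validate applies a few illustrative checks: a non-empty type, a positive
// chunk size, and non-empty, unique property names across all groups.
func (v VertexInfo) Validate() error {
	if v.Type == "" {
		return errors.New("vertex type must not be empty")
	}
	if v.ChunkSize <= 0 {
		return fmt.Errorf("chunk size must be positive, got %d", v.ChunkSize)
	}
	seen := make(map[string]bool)
	for _, g := range v.PropertyGroups {
		for _, p := range g.Properties {
			if p.Name == "" {
				return errors.New("property name must not be empty")
			}
			if seen[p.Name] {
				return fmt.Errorf("duplicate property %q", p.Name)
			}
			seen[p.Name] = true
		}
	}
	return nil
}

// samplePerson builds a made-up "person" vertex schema for the demo.
func samplePerson() VertexInfo {
	return VertexInfo{
		Type:      "person",
		ChunkSize: 100,
		Prefix:    "vertex/person/",
		PropertyGroups: []PropertyGroup{
			{
				Properties: []Property{{Name: "id", DataType: "int64", IsPrimary: true}},
				FileType:   "parquet",
			},
		},
	}
}

func main() {
	if err := samplePerson().Validate(); err != nil {
		fmt.Println("invalid schema:", err)
		return
	}
	fmt.Println("schema ok")
}
```

Keeping validation on the in-memory types, separate from parsing, would let the same checks run whether the schema came from YAML or was built programmatically by a writer.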
   
   storage Package: An abstraction layer for accessing the underlying storage 
(e.g., local filesystem, S3), making the SDK storage-agnostic.
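
One possible shape for this abstraction, sketched on top of the standard `io/fs` package. The `Storage` interface, its method set, and the `NewFSStorage` and `demoStorage` helpers are hypothetical names chosen for this example; an S3 backend would need its own implementation of the same interface.

```go
package main

import (
	"fmt"
	"io/fs"
	"testing/fstest"
)

// Storage is a hypothetical read-only view over a GraphAr directory tree.
type Storage interface {
	// ReadFile returns the full contents of the file at a slash-separated path.
	ReadFile(name string) ([]byte, error)
	// List returns the entry names directly under dir, sorted by name.
	List(dir string) ([]string, error)
}

// fsStorage adapts any fs.FS (local directory, embedded files, in-memory
// test data) to the Storage interface.
type fsStorage struct{ fsys fs.FS }

func NewFSStorage(fsys fs.FS) Storage { return fsStorage{fsys: fsys} }

func (s fsStorage) ReadFile(name string) ([]byte, error) {
	return fs.ReadFile(s.fsys, name)
}

func (s fsStorage) List(dir string) ([]string, error) {
	entries, err := fs.ReadDir(s.fsys, dir) // fs.ReadDir sorts by filename
	if err != nil {
		return nil, err
	}
	names := make([]string, 0, len(entries))
	for _, e := range entries {
		names = append(names, e.Name())
	}
	return names, nil
}

// demoStorage returns an in-memory backend with two made-up chunk files.
func demoStorage() Storage {
	return NewFSStorage(fstest.MapFS{
		"vertex/person/id/chunk0": {Data: []byte("a")},
		"vertex/person/id/chunk1": {Data: []byte("b")},
	})
}

func main() {
	names, err := demoStorage().List("vertex/person/id")
	if err != nil {
		fmt.Println("list failed:", err)
		return
	}
	fmt.Println(names)
}
```

With this shape, a local filesystem backend is just `NewFSStorage(os.DirFS(root))`, and tests can run against in-memory data without touching disk.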
   
   parquet Package: A dedicated module for handling Parquet file I/O, as it's 
the most common payload format. This will leverage a robust Go Parquet library 
(e.g., parquet-go or arrow/go).
   
   reader Package: Provides high-level APIs to read vertex/edge chunks, 
handling different AdjList types and navigating through property groups.
   
   writer Package: Provides high-level APIs to write vertex/edge data into the 
correct chunked and partitioned directory structure, including generating 
offset and metadata files.
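
Both the reader and the writer need to map a vertex's internal ID to a chunk file. A minimal sketch of that mapping follows, assuming fixed-size chunks and a `<prefix>/<property group>/chunk<N>` layout. The function names are hypothetical, and the exact directory naming should be checked against the GraphAr format specification.

```go
package main

import (
	"fmt"
	"path"
)

// LocateVertex maps an internal vertex ID to its chunk index and to the row
// offset inside that chunk, assuming every chunk holds chunkSize rows.
func LocateVertex(id, chunkSize int64) (chunk, offset int64, err error) {
	if id < 0 {
		return 0, 0, fmt.Errorf("vertex id must be non-negative, got %d", id)
	}
	if chunkSize <= 0 {
		return 0, 0, fmt.Errorf("chunk size must be positive, got %d", chunkSize)
	}
	return id / chunkSize, id % chunkSize, nil
}

// VertexChunkPath builds the chunk file path under the assumed layout
// <prefix>/<group>/chunk<N>. The group directory name is taken as given.
func VertexChunkPath(prefix, group string, chunk int64) string {
	return path.Join(prefix, group, fmt.Sprintf("chunk%d", chunk))
}

func main() {
	chunk, offset, err := LocateVertex(250, 100)
	if err != nil {
		fmt.Println("locate failed:", err)
		return
	}
	// prints: vertex/person/id/chunk2 50
	fmt.Println(VertexChunkPath("vertex/person", "id", chunk), offset)
}
```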
   
   Phased Development Plan
   
   I propose tackling this in a phased approach to deliver value incrementally 
and gather feedback:
   
   Phase 1: Core Schema & Reader (Read-Only)
   
   - [ ] Implement the info package for YAML parsing and validation (based on 
generated Protos).
   
   - [ ] Implement the storage layer with an initial local filesystem backend.
   
   - [ ] Implement the parquet reader logic.
   
   - [ ] Implement the reader package to read existing GraphAr vertex and edge 
data (all AdjList types).
   
   - [ ] Add comprehensive unit tests using sample data generated by the Spark 
library to ensure compatibility.
   
   Phase 2: Writer Implementation (Read-Write)
   
   - [ ] Implement the writer package to generate valid GraphAr directory 
structures and metadata.
   
   - [ ] Implement logic to write vertex and edge property chunks to Parquet 
files.
   
   - [ ] Implement logic for writing adjlist and offset chunks.
   
   - [ ] Add round-trip tests (write with Go SDK, read with Go/Spark SDK).
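
For the adjlist and offset item above, the core computation can be sketched as a CSR-style prefix sum over edges that are already sorted by source vertex. `BuildOffsets` is a hypothetical helper. It produces a single global offset array and leaves out how the result would be split into per-chunk offset files, which would have to follow the format specification.

```go
package main

import "fmt"

// BuildOffsets returns CSR-style offsets for edges sorted by source vertex:
// the edges of vertex v occupy positions [offsets[v], offsets[v+1]) in the
// adjacency list. srcs holds the source vertex ID of each edge, in order.
func BuildOffsets(srcs []int64, numVertices int64) ([]int64, error) {
	if numVertices < 0 {
		return nil, fmt.Errorf("vertex count must be non-negative, got %d", numVertices)
	}
	offsets := make([]int64, numVertices+1)
	var prev int64
	for i, s := range srcs {
		if s < 0 || s >= numVertices {
			return nil, fmt.Errorf("edge %d: source %d out of range", i, s)
		}
		if i > 0 && s < prev {
			return nil, fmt.Errorf("edge %d: sources are not sorted", i)
		}
		offsets[s+1]++ // count the out-degree of s one slot to the right
		prev = s
	}
	// A prefix sum turns per-vertex degrees into start offsets.
	for v := int64(0); v < numVertices; v++ {
		offsets[v+1] += offsets[v]
	}
	return offsets, nil
}

func main() {
	// Four edges with sources 0, 0, 1, 3 over four vertices.
	offsets, err := BuildOffsets([]int64{0, 0, 1, 3}, 4)
	if err != nil {
		fmt.Println("build failed:", err)
		return
	}
	// prints: [0 2 3 3 4]
	fmt.Println(offsets)
}
```

In the example, vertex 2 has no outgoing edges, so its range `[3, 3)` is empty. A round-trip test could check that a reader recovers the same ranges.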
   
   Phase 3: Advanced Features & Optimization
   
   - [ ] Add support for other storage backends (e.g., S3).
   
   - [ ] Performance profiling and optimization.
   
   - [ ] Add high-level graph traversal APIs (optional, based on community 
needs).
   
   Next Steps
   
   I am excited about the possibility of bringing GraphAr to the Go community.
   
   As I am new to this codebase, I would greatly appreciate any guidance from 
the mentors regarding the directory structure, CI integration, or any specific 
requirements I should be aware of.
   
   If the community gives the green light on this plan, I am ready to start 
working on Phase 1.
   
   Thanks!
   
   ### Component(s)
   
   Other


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

