SemyonSinchenko commented on issue #679:
URL: https://github.com/apache/incubator-graphar/issues/679#issuecomment-3317790469

   > I've been working on something similar to this. [Blog 
post](https://adsharma.github.io/beating-the-CAP-theorem-for-graphs/) with 
details.
   > 
   > Since I was not aware of `graph-ar`, I invented [my 
own](https://github.com/adsharma/graph-std). Proposed syntax:
   > 
   > ```
   > CREATE NODE TABLE Person(ID INT64, name STRING, PRIMARY KEY(ID))
   > WITH (storage = '/tmp/karate_random');
   > 
   > CREATE REL TABLE knows(FROM Person TO Person, weight DOUBLE)
   > WITH (storage = '/tmp/karate_random');
   > 
   > MATCH (p1:Person)-[r:knows]->(p2:Person) RETURN p1.ID, p2.ID, r.weight;
   > ```
   > 
   > I'm not sure this can be achieved in an extension. So I modified kuzu 
parser to support this syntax.
   > 
   > One of the differences vs the proposal is that I use kuzu table catalog to 
store metadata instead of yaml files. This is similar to Ducklake vs Iceberg.
   
   @adsharma Thanks for sharing! I see the use cases are slightly different. Your
   format is simpler (from an integration point of view) and well suited for
   in-memory graph processing, while GAR aims to provide a more scalable approach
   for out-of-core processing.
   
   Let's imagine we have a property graph with `movies` and `people` and the
   following edge groups:
   - people can `like` movies
   - people can `message` each other
   - people can `follow` each other
   - a movie can be part of a `series` (prequel, sequel, previous series, next
     series, etc.)
   
   GAR stores properties in separate groups and also splits everything into
   chunks. For example, to run something like
   `MATCH (:people {name: 'Alice'})-[:like]->(:movies)` with GAR:
   - we do not need to load anything into memory except `people`, `like`, and
     `movies` (far fewer edges in memory);
   - based on the known ID of the start `people` vertex and the vertex chunk
     size, we can locate the single chunk containing that vertex and load only
     that chunk;
   - based on the edge chunk size and the known source `people` ID, we can locate
     the chunks of `like` edges starting from that ID (in GAR, edges are
     pre-sorted before being split into chunks) and load only those chunks.

   This allows running queries on really huge graphs stored in Parquet files
   without loading the whole graph into memory; a rough sketch of the chunk
   lookup follows below.
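
   To make the chunk arithmetic concrete, here is a minimal sketch in Python
   using plain `pyarrow`. The chunk sizes, the file layout, and the
   `/data/graph` paths are assumptions made for illustration only; this is not
   the GraphAr library API or its exact on-disk layout.

   ```python
   import pyarrow.parquet as pq

   # Assumed chunk sizes; in GAR these come from the graph metadata.
   VERTEX_CHUNK_SIZE = 1024
   EDGE_CHUNK_SIZE = 4096

   def load_vertex_chunk(base_path: str, vertex_id: int):
       """Read only the vertex property chunk that contains `vertex_id`."""
       chunk_index = vertex_id // VERTEX_CHUNK_SIZE
       # Hypothetical layout: one Parquet file per property-group chunk.
       return pq.read_table(f"{base_path}/vertex/people/name/chunk{chunk_index}.parquet")

   def load_like_edge_chunk(base_path: str, src_vertex_id: int):
       """Read only the `like` edge chunks whose source vertices fall into the
       same vertex chunk as `src_vertex_id`; because edges are pre-sorted by
       source ID, they all live in one known part of the layout."""
       src_chunk_index = src_vertex_id // VERTEX_CHUNK_SIZE
       # Hypothetical layout: edge chunks grouped per source-vertex chunk.
       return pq.read_table(
           f"{base_path}/edge/people_like_movies/ordered_by_source/part{src_chunk_index}"
       )

   # Usage: resolve Alice's internal ID, then touch only two small pieces of data.
   alice_id = 12345  # assume the ID was already resolved from the name
   vertices = load_vertex_chunk("/data/graph", alice_id)
   edges = load_like_edge_chunk("/data/graph", alice_id)
   ```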
   
   With your format, as far as I understand, we would need to:
   - load at least the full nodes Parquet file to filter out only the `people`
     properties;
   - load at least all edges outgoing from the source `people` vertex to filter
     only the `like` ones.

   But that should not be a problem for in-memory processing, where the whole
   graph is loaded up front for fast queries anyway; a matching sketch of the
   full-scan approach is below.
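
   For comparison, a single-table layout boils down to a filtered full scan.
   Again just a sketch: the file names, the column names (`label`, `type`,
   `src`), and the vertex ID are made up for illustration.

   ```python
   import pyarrow.dataset as ds

   alice_id = 12345  # assume the internal ID was already resolved from the name

   # Scan the whole nodes file and keep only `people`; predicate pushdown helps,
   # but the reader still has to visit every row group.
   nodes = ds.dataset("/tmp/karate_random/nodes.parquet")
   people = nodes.to_table(filter=ds.field("label") == "people")

   # Same for edges: scan everything, then narrow down to `like` edges from Alice.
   edges = ds.dataset("/tmp/karate_random/edges.parquet")
   likes_from_alice = edges.to_table(
       filter=(ds.field("type") == "like") & (ds.field("src") == alice_id)
   )
   ```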

