SemyonSinchenko commented on issue #679:
URL: https://github.com/apache/incubator-graphar/issues/679#issuecomment-3317790469

> I've been working on something similar to this. [Blog post](https://adsharma.github.io/beating-the-CAP-theorem-for-graphs/) with details.
>
> Since I was not aware of `graph-ar`, I invented [my own](https://github.com/adsharma/graph-std). Proposed syntax:
>
> ```
> CREATE NODE TABLE Person(ID INT64, name STRING, PRIMARY KEY(ID))
> WITH (storage = '/tmp/karate_random');
>
> CREATE REL TABLE knows(FROM Person TO Person, weight DOUBLE)
> WITH (storage = '/tmp/karate_random');
>
> MATCH (p1:Person)-[r:knows]->(p2:Person) RETURN p1.ID, p2.ID, r.weight;
> ```
>
> I'm not sure this can be achieved in an extension. So I modified kuzu parser to support this syntax.
>
> One of the differences vs the proposal is that I use kuzu table catalog to store metadata instead of yaml files. This is similar to Ducklake vs Iceberg.

@adsharma Thanks for sharing! I see the use cases are slightly different. Your format is simpler from the integration point of view, and it is well suited for in-memory graph processing. GAR, on the other hand, aims to provide a more scalable approach for out-of-core processing.

Let's imagine we have a property graph with `movies` and `people` vertices and the following edge groups:
- people can `like` movies
- people can `message` each other
- people can `follow` each other
- a movie can be part of a `series` (prequel, sequel, previous series, next series, etc.)

GAR stores properties in separate groups and also splits all of them into chunks.

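As a rough illustration of why fixed-size chunks make point lookups cheap, here is a minimal sketch of mapping a vertex ID to the one chunk that holds it. It assumes dense, 0-based vertex IDs and that every chunk except the last holds exactly `chunk_size` vertices. The function names and the `vertex/<label>/<property_group>/chunk<i>` path layout are hypothetical illustrations, not the GraphAr API.

```python
def vertex_chunk_index(vertex_id: int, chunk_size: int) -> int:
    """Index of the chunk holding vertex_id (assumes dense 0-based IDs, fixed-size chunks)."""
    if vertex_id < 0 or chunk_size <= 0:
        raise ValueError("vertex_id must be >= 0 and chunk_size must be > 0")
    return vertex_id // chunk_size


def row_in_chunk(vertex_id: int, chunk_size: int) -> int:
    """Row position of vertex_id inside its chunk."""
    return vertex_id % chunk_size


def vertex_chunk_path(label: str, property_group: str, vertex_id: int, chunk_size: int) -> str:
    """Hypothetical relative path of the only file a reader opens for this vertex and
    property group; other property groups and other chunks are never read."""
    return f"vertex/{label}/{property_group}/chunk{vertex_chunk_index(vertex_id, chunk_size)}"


# Hypothetical example: chunk size 1024, the `name` property group of `people`.
print(vertex_chunk_path("people", "name", 5000, 1024), row_in_chunk(5000, 1024))
# prints: vertex/people/name/chunk4 904
```

Because the lookup is pure arithmetic, no index or scan is needed to locate the chunk; the cost of reading one vertex is bounded by the chunk size, not by the size of the graph.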
For example, if we want to run something like `MATCH (:people {name: 'Alice'})-[:like]->(:movies)` with GAR, we do not need to load the whole graph. Instead:
- we load into memory only `people`, `like`, and `movies` and nothing else (far fewer edges in memory)
- based on the known ID of the start `people` vertex and the vertex chunk size, we can find the single chunk that contains this vertex and load only that chunk
- based on the edge chunk size and the known source `people` ID, we can find the chunks of `like` edges that start from this ID (in GAR, edges are sorted before being split into chunks) and load only those chunks

This makes it possible to run queries on really huge graphs stored in Parquet files without loading the whole graph into memory.

With your format, as far as I understand, we would need to:
- load at least the full nodes Parquet file to filter out only the `people` properties
- load at least all the edges from the source `people` vertex to keep only the `like` ones

But that should not be a problem for in-memory processing, since we are loading a lot of data for fast queries anyway.
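The edge lookup in the steps above relies on edges being sorted by source before chunking. A minimal sketch of the idea, assuming a single globally sorted edge list plus a CSR-style offset array recording where each source vertex's edges begin; the function names are hypothetical, and the real GraphAr adjacency-list layout is more involved (see the GraphAr format specification), so this keeps only the core arithmetic:

```python
from typing import List, Tuple


def build_offsets(sorted_src: List[int], num_vertices: int) -> List[int]:
    """CSR-style offsets: edges of source v occupy positions offsets[v]..offsets[v+1]-1.

    Assumes sorted_src is the source-ID column of an edge list already sorted by source.
    """
    offsets = [0] * (num_vertices + 1)
    for s in sorted_src:
        offsets[s + 1] += 1  # count out-edges per source vertex
    for v in range(num_vertices):
        offsets[v + 1] += offsets[v]  # prefix sum turns counts into start positions
    return offsets


def edge_chunks_for_source(offsets: List[int], src: int, edge_chunk_size: int) -> Tuple[int, ...]:
    """Indices of the fixed-size edge chunks holding src's out-edges (empty if none)."""
    start, end = offsets[src], offsets[src + 1]
    if start == end:
        return ()
    first = start // edge_chunk_size
    last = (end - 1) // edge_chunk_size
    return tuple(range(first, last + 1))


# Hypothetical toy graph: 4 source vertices, 9 edges, edge chunks of 4 edges each.
offsets = build_offsets([0, 0, 1, 1, 1, 1, 1, 3, 3], num_vertices=4)
print(offsets)                                   # prints: [0, 2, 7, 7, 9]
print(edge_chunks_for_source(offsets, 1, 4))     # prints: (0, 1)
print(edge_chunks_for_source(offsets, 3, 4))     # prints: (1, 2)
```

Only the chunks returned for the source vertex need to be read; a vertex with no out-edges costs no edge I/O at all. Without the sort order, every edge chunk would have to be scanned to find the matching sources.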