adsharma commented on issue #679:
URL: https://github.com/apache/incubator-graphar/issues/679#issuecomment-3320914564

   Thank you for explaining! My solution is motivated by trying to serve 
Wikidata (90 million nodes, 800+ million edges) from Kuzu. The on-disk storage 
requirements were unacceptable due to denormalization. I'm looking to serve 
graphs 10x that size, so on-disk storage and selective loading are the main 
use case.
   
   I also want to compete with LLMs on graph compression and storage 
efficiency by offloading some of the knowledge stored in them into external 
storage.
   
   Parquet files as they stand now aren't sufficient, but they are a step in 
the right direction.
   
   I don't want to prescribe whether edges should be sorted by type or by 
graph structure. It depends on the use case, and I want to support both well.
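   To illustrate the two orderings, here is a minimal sketch that sorts one 
toy edge list both ways. The `(src, edge_type, dst)` tuple layout and the 
sample data are assumptions made up for illustration. This is not GraphAr's 
or Kuzu's actual on-disk layout.

```python
# Hypothetical edge list of (src, edge_type, dst) tuples; data is made up.
from itertools import groupby

edges = [
    (2, "knows", 3),
    (1, "likes", 4),
    (1, "knows", 2),
    (3, "likes", 1),
    (2, "likes", 4),
]

# Order 1: by edge type, then source and destination. All edges of one type
# end up contiguous, which suits per-type (strongly typed) tables.
by_type = sorted(edges, key=lambda e: (e[1], e[0], e[2]))

# Order 2: by graph structure (source, then destination). All outgoing edges
# of a vertex end up contiguous, which suits neighbor lookups.
by_structure = sorted(edges, key=lambda e: (e[0], e[2], e[1]))

def runs(rows, key):
    """Return the key of each contiguous run of rows, in order."""
    return [k for k, _ in groupby(rows, key=key)]

print(runs(by_type, key=lambda e: e[1]))       # ['knows', 'likes']
print(runs(by_structure, key=lambda e: e[0]))  # [1, 2, 3]
```

   Either order makes one kind of range scan contiguous at the cost of the 
other, which is why the choice depends on the workload.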
   
   The Kuzu folks decided to support strongly typed nodes and edges, but you 
can always store weakly typed graphs by merging them all into a single "node" 
table and a single "rel" table.
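   A minimal sketch of that merge follows, assuming typed tables are held as 
Python lists of dicts. The table names, the `label`/`type` columns, and the 
`merge_weakly_typed` helper are hypothetical. They are not Kuzu's internal 
format or API.

```python
# Hypothetical strongly typed tables: node ids are local to each table.
person = [{"id": 0, "name": "Alice"}, {"id": 1, "name": "Bob"}]
city = [{"id": 0, "name": "Paris"}]
lives_in = [{"src": 0, "dst": 0}]  # Person -> City
knows = [{"src": 0, "dst": 1}]     # Person -> Person

typed_nodes = {"Person": person, "City": city}
typed_rels = {
    "LIVES_IN": (lives_in, "Person", "City"),
    "KNOWS": (knows, "Person", "Person"),
}

def merge_weakly_typed(typed_nodes, typed_rels):
    """Merge per-type tables into one "node" table and one "rel" table."""
    node, global_id = [], {}
    for label, rows in typed_nodes.items():
        for row in rows:
            gid = len(node)  # assign a global id across all node types
            global_id[(label, row["id"])] = gid
            props = {k: v for k, v in row.items() if k != "id"}
            node.append({"id": gid, "label": label, **props})
    rel = []
    for rel_type, (rows, src_label, dst_label) in typed_rels.items():
        for row in rows:
            # Rewrite local endpoint ids into global ids; keep the type as a column.
            rel.append({
                "src": global_id[(src_label, row["src"])],
                "dst": global_id[(dst_label, row["dst"])],
                "type": rel_type,
            })
    return node, rel

node, rel = merge_weakly_typed(typed_nodes, typed_rels)
print(node)
print(rel)
```

   The type information survives as an ordinary column, so nothing is lost. 
Queries simply filter on `label` or `type` instead of selecting a table.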
   
   If the Parquet file is sorted, predicate pushdown becomes effective: the 
per-row-group min/max statistics let a reader skip row groups that cannot 
match. DuckDB and Spark both do this.
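   The effect of sorting can be simulated without any Parquet library. The 
sketch below models only the min/max pruning idea. The row-group size, the 
synthetic `src_ids` column, and the helper names are assumptions for 
illustration. It does not reproduce how DuckDB or Spark actually read files.

```python
# Simulated min/max (zone map) pruning over row groups of one column.
import random

ROW_GROUP_SIZE = 1000  # hypothetical row-group size, in rows

def row_group_stats(values, size=ROW_GROUP_SIZE):
    """Split a column into row groups and record each group's (min, max)."""
    groups = [values[i:i + size] for i in range(0, len(values), size)]
    return [(min(g), max(g)) for g in groups]

def groups_to_scan(stats, lo, hi):
    """Count row groups whose [min, max] range overlaps lo <= x <= hi."""
    return sum(1 for gmin, gmax in stats if gmax >= lo and gmin <= hi)

random.seed(0)
src_ids = [random.randrange(100_000) for _ in range(100_000)]

unsorted_stats = row_group_stats(src_ids)
sorted_stats = row_group_stats(sorted(src_ids))

# Predicate 5000 <= src_id <= 5999 selects about 1% of the key space.
print(groups_to_scan(unsorted_stats, 5000, 5999))  # (almost) all 100 groups
print(groups_to_scan(sorted_stats, 5000, 5999))    # only a few groups
```

   On unsorted data, nearly every row group's range covers the predicate, so 
nothing is skipped. On sorted data, the matching rows are contiguous and only 
the row groups that hold them need to be read.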
   
   DuckDB's native storage format is also supported as an additional, 
single-file option. Why? It has a few more [compression 
tricks](https://duckdb.org/2025/09/08/duckdb-on-the-framework-laptop-13.html#tpc-h-sf10000), 
and a single file is more convenient. In the TPC-H SF10k example, the Parquet 
files were 4 TB, while the DuckDB database was 2.7 TB (roughly a third smaller).
   
   Kuzu has an extension for reading from DuckDB, but I'm not sure whether it 
can handle TB-sized files or perform efficient predicate pushdown.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@graphar.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
