gsvgit opened a new issue, #13545:
URL: https://github.com/apache/datafusion/issues/13545

   ### Is your feature request related to a problem or challenge?
   
   The SQL standard was recently extended with property graph queries 
(SQL/PGQ): [ISO standard](https://www.iso.org/standard/79473.html), [theoretical 
foundations](https://arxiv.org/abs/2409.01102). I wonder whether DataFusion 
could be extended to support PGQ.  
   
   ### Describe the solution you'd like
   
   All parts of DataFusion would need to be extended. The least trivial part is 
the interconnection between traditional SQL and graph analysis (path-related 
evaluation). While it is possible to store a graph in columnar storage (e.g. 
Apache Arrow), that may be inefficient for path-related queries, even though it 
is quite efficient for analytical queries over vertex attributes. So dedicated 
path indexes may be required. Moreover, in some cases it may be a good idea to 
store the graph topology separately in a specialized format (e.g. a sparse 
adjacency matrix, similar to 
[FalkorDB](https://docs.falkordb.com/design/#the-theory-ideas-behind-falkordb)).
 
   On the other hand, even if the graph is stored in columnar storage, 
linear-algebra primitives can be useful for path querying ([DuckPGQ: 
Efficient Property Graph Queries in an analytical 
RDBMS](https://www.cidrdb.org/cidr2023/papers/p66-wolde.pdf)).
   
   So logical and physical plans should not only provide graph-specific 
operators but also support balancing between different data representations. 
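
   As a rough illustration of the sparse-adjacency-matrix idea above, here is a 
minimal sketch in plain Rust (standard library only). It builds a CSR 
(compressed sparse row) boolean adjacency matrix from two parallel edge columns 
and answers a reachability query by repeated boolean vector-matrix products. All 
names (`Csr`, `from_edge_columns`, `expand`, `reachable_from`) are hypothetical 
and are not part of DataFusion, Arrow, or GraphBLAS; a real implementation would 
presumably read the columns from Arrow arrays and delegate the products to a 
linear-algebra library.

```rust
/// Graph topology as a boolean sparse adjacency matrix in CSR form.
/// Hypothetical type for illustration only.
struct Csr {
    /// Row v's out-neighbors are targets[offsets[v]..offsets[v + 1]].
    offsets: Vec<usize>,
    targets: Vec<usize>,
}

impl Csr {
    /// Build the matrix from two parallel "columns" of an edge table.
    fn from_edge_columns(num_vertices: usize, src: &[usize], dst: &[usize]) -> Self {
        assert_eq!(src.len(), dst.len());
        // Count out-degrees, then turn the counts into prefix sums.
        let mut offsets = vec![0usize; num_vertices + 1];
        for &s in src {
            offsets[s + 1] += 1;
        }
        for v in 0..num_vertices {
            offsets[v + 1] += offsets[v];
        }
        // Scatter each destination into its source's row.
        let mut next = offsets.clone();
        let mut targets = vec![0usize; src.len()];
        for (&s, &d) in src.iter().zip(dst) {
            targets[next[s]] = d;
            next[s] += 1;
        }
        Csr { offsets, targets }
    }

    /// One frontier expansion: a vector-matrix product over the (OR, AND) semiring.
    fn expand(&self, frontier: &[bool]) -> Vec<bool> {
        let mut out = vec![false; frontier.len()];
        for (v, &active) in frontier.iter().enumerate() {
            if active {
                for &t in &self.targets[self.offsets[v]..self.offsets[v + 1]] {
                    out[t] = true;
                }
            }
        }
        out
    }

    /// Vertices reachable from `start` by a path of one or more edges, in ascending order.
    fn reachable_from(&self, start: usize) -> Vec<usize> {
        let n = self.offsets.len() - 1;
        let mut visited = vec![false; n];
        let mut frontier = vec![false; n];
        frontier[start] = true;
        loop {
            let candidates = self.expand(&frontier);
            let mut new_frontier = vec![false; n];
            let mut grew = false;
            for v in 0..n {
                // Keep only vertices not seen in an earlier step.
                if candidates[v] && !visited[v] {
                    visited[v] = true;
                    new_frontier[v] = true;
                    grew = true;
                }
            }
            if !grew {
                break;
            }
            frontier = new_frontier;
        }
        (0..n).filter(|&v| visited[v]).collect()
    }
}
```

   For example, with edge columns `src = [0, 1, 2, 3]` and `dst = [1, 2, 0, 4]` 
over five vertices, `reachable_from(0)` visits the cycle 0, 1, 2 and never 
touches vertices 3 and 4. The edge table itself can stay columnar; only the 
topology is re-indexed, which is the kind of representation trade-off a planner 
would have to reason about.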
   
   ### Describe alternatives you've considered
   
   _No response_
   
   ### Additional context
   
   Could something like [this 
project](https://github.com/code-sam/graphblas_sparse_linear_algebra) be used 
as the physical layer for linear algebra?
   
   [Possible theoretical 
foundations](https://www.irif.fr/~rogova/thesis_Alexandra_Rogova.pdf).
   
   This could be a first step toward supporting 
[GQL](https://www.iso.org/standard/76120.html).
   
   I'm interested in designing and developing such a system, but I'm aware that 
such an extension of DataFusion may amount to rebuilding the system. So I'd like 
to discuss this direction: should we extend DataFusion or create a new, 
independent system?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

