Thank you for taking the time to answer :) I don't have a fixed use case, but I am trying to build a POC and evaluate whether Apache Arrow could be adequate in the context of graphs. However, I found out very quickly that I won't be able to carry out all the necessary algorithm steps using only Apache Arrow, without resorting to other libraries.

On Fri, Jun 30, 2023, 07:36 Benson Muite <benson_mu...@emailplus.org> wrote:

> On 6/30/23 04:21, Bechir Ben Daadouch wrote:
> > Dear Apache Arrow Dev Community,
> >
> > My name is Bechir, I am currently working on a project that involves
> > implementing graph algorithms in Apache Arrow.
> >
> > The initial plan was to construct a node structure and a subsequent graph
> > that would encompass all the nodes. However, I quickly realized that due to
> > Apache Arrow's columnar format, this approach was not feasible.
> >
> > I tried a couple of things, including the implementation of the
> > shortest-path algorithm. However, I rapidly discovered that manipulating
> > arrow objects, particularly when applying graph algorithms, proved more
> > complex than anticipated and it became very clear that I would need to
> > resort to some data structures outside of what arrow offers (i.e.: Heapq
> > wouldn't be possible using arrow).
> >
> > I also gave a shot at doing it similar to a certain SQL method (see:
> > https://ibb.co/0rPGB42 ), but ran into some roadblocks there too and I
> > ended up having to resort to using Pandas for some transformations.
> >
> > My next course of action is to experiment with compressed sparse rows,
> > hoping to execute Matrix Multiplication using this method. But honestly,
> > with what I know right now, I remain skeptical about the feasibility
> > of it. However, before committing to this approach, I would greatly
> > appreciate your opinion based on your experience with Apache Arrow.
> >
> > Thank you very much for your time.
> >
> > Looking forward to potentially discussing this further.
> >
> > Many thanks,
> > Bechir
>
> Arrow may not be the best choice for most graph algorithms as they
> typically require random memory accesses that will be difficult to
> coalesce into forms that allow for vectorization. If your data will fit
> in memory of a single node, you might consider:
> https://github.com/DrTimothyAldenDavis/GraphBLAS
> https://pypi.org/project/python-graphblas/
> https://github.com/JuliaSparse/SuiteSparseGraphBLAS.jl
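
For illustration only, here is a minimal sketch of the split discussed in this thread: the graph is stored in a compressed-sparse-row layout (three flat arrays, which is the kind of data a columnar format can hold), while the priority queue for the shortest-path search comes from Python's `heapq`, outside of Arrow. The helper names `build_csr` and `dijkstra`, the tiny edge list, and the use of plain Python lists instead of Arrow arrays are all assumptions made for this example, not code from the project described above.

```python
import heapq


def build_csr(num_nodes, edges):
    """Build CSR arrays (offsets, targets, weights) from (src, dst, weight) triples.

    Hypothetical helper: plain lists stand in for flat columnar arrays.
    """
    # Count outgoing edges per source node.
    counts = [0] * num_nodes
    for src, _, _ in edges:
        counts[src] += 1

    # Prefix sum: offsets[u]..offsets[u + 1] is the slice of edges leaving u.
    offsets = [0] * (num_nodes + 1)
    for i in range(num_nodes):
        offsets[i + 1] = offsets[i] + counts[i]

    targets = [0] * len(edges)
    weights = [0.0] * len(edges)
    cursor = offsets[:-1].copy()  # next free slot for each source node
    for src, dst, w in edges:
        pos = cursor[src]
        targets[pos] = dst
        weights[pos] = w
        cursor[src] += 1
    return offsets, targets, weights


def dijkstra(offsets, targets, weights, source):
    """Single-source shortest paths over CSR arrays, using heapq as the queue."""
    dist = [float("inf")] * (len(offsets) - 1)
    dist[source] = 0.0
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale queue entry, a shorter path was already found
        # The neighbours of u are a contiguous slice of the flat arrays.
        for k in range(offsets[u], offsets[u + 1]):
            v = targets[k]
            nd = d + weights[k]
            if nd < dist[v]:
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


# Made-up example graph with 4 nodes and non-negative edge weights.
edges = [(0, 1, 4.0), (0, 2, 1.0), (2, 1, 2.0), (1, 3, 1.0), (2, 3, 5.0)]
offsets, targets, weights = build_csr(4, edges)
distances = dijkstra(offsets, targets, weights, source=0)
print(distances)
```

The storage side maps onto flat arrays without trouble; it is the traversal (the heap and the data-dependent lookups into `targets`) that has to live in ordinary Python, which matches the random-access concern raised in the reply quoted above.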