Thank you for taking the time to answer :)

I don't have a fix Use-Case, but I am trying yo build a POC and evaluate
whether Apache Arrow could be adequate in the context of graphs. But I
found out very quickly that I won't be able to do all the necessary
algorithm steps using only Apache Arrow without resorting to other
libraries.

On Fri, Jun 30, 2023, 07:36 Benson Muite <benson_mu...@emailplus.org> wrote:

> On 6/30/23 04:21, Bechir Ben Daadouch wrote:
> > Dear Apache Arrow Dev Community,
> >
> > My name is Bechir, I am currently working on a project that involves
> > implementing graph algorithms in Apache Arrow.
> >
> > The initial plan was to construct a node structure and a subsequent graph
> > that would encompass all the nodes. However, I quickly realized that due
> to
> > Apache Arrow's columnar format, this approach was not feasible.
> >
> > I tried a couple of things, including the implementation of the
> > shortest-path algorithm. However, I rapidly discovered that manipulating
> > arrow objects, particularly when applying graph algorithms, proved more
> > complex than anticipated and it became very clear that I would need to
> > resort to some data structures outside of what arrow offers (i.e.: Heapq
> > wouldn't be possible using arrow).
> >
> > I also gave a shot at doing it similar to a certain SQL method (see:
> > https://ibb.co/0rPGB42 ), but ran into some roadblocks there too and I
> > ended up having to resort to using Pandas for some transformations.
> >
> > My next course of action is to experiment with compressed sparse rows,
> > hoping to execute Matrix Multiplication using this method. But honestly,
> > with what I know right now, I remain skeptical about the feasibility
> > of it. However,
> > before committing to this approach, I would greatly appreciate your
> opinion
> > based on your experience with Apache Arrow.
> >
> > Thank you very much for your time.
> >
> > Looking forward to potentially discussing this further.
> >
> > Many thanks,
> > Bechir
> >
> Arrow may not be the best choice for most graph algorithms as they
> typically require random memory accesses that will be difficult to
> coalesce into forms that allow for vectorization. If your data will fit
> in memory of a single node, you might consider:
> https://github.com/DrTimothyAldenDavis/GraphBLAS
> https://pypi.org/project/python-graphblas/
> https://github.com/JuliaSparse/SuiteSparseGraphBLAS.jl
>

Reply via email to