Hi, @Adam, thank you :)
@Aldrin: First, I tried to implement the basic data structures needed for graph algorithms, namely a queue and a heapq (for breadth first search and dijkstra (weighted shortest path)). That's when I had some problems since Apache Arrow objects are immutable. This is how I implemented the queue: https://replit.com/@BechirBen2/ArrowQueue#arrow_array_queue.py I used the concat_arrays method in the enqueue method, basically creating a new array every time I add an object to the queue. This is how I implemented breadth first search: https://replit.com/@BechirBen2/Graph-Apache-Arrow#arrow_breadh_first_search.py As you can see I had to resort to converting the objects to python objects and using python Set. for the Heapq, I haven't tried since I'll probably be doing some back and forth between Arrow objects and Python types, I'd be better off using the Heapq module from Python directly. Since I can't do operations elementwise, I saw no use in using adjacency lists. Here is a screenshot on how I represented the data: https://ibb.co/pxyFpb6 I basically saved the nodes as Int64 and the neighboring nodes in a List. For the sql approach I did something similar to this: https://imgbb.com/0rPGB42 I basically use the pyarrow.compute module: 1- Filter the neighbors of the start Node: https://ibb.co/2gMDHnX 2- use pc.list_flatten and then check if the target node is in the neighboring nodes using pc.is_in(flat_neighbors, target_node) 3- If not, I convert the pyarrow table to pandas, then I explode the neighboring_nodes, and save the start_node in a new column called source_path: https://ibb.co/60Vj0RX and I continue like this until we reach the target or we reach the last node(s) without neighbors @Gavin: Thank you for taking the time to give a code example. I already did ^^ On Fri, Jun 30, 2023 at 8:55 PM Gavin Ray <ray.gavi...@gmail.com> wrote: > This isn't particularly efficient, but could you do something like this? > > https://replit.com/@GavinRay97/EnlightenedRichAdministration#main.py > > On Fri, Jun 30, 2023 at 1:10 PM Aldrin <octalene....@pm.me.invalid> wrote: > > > > But I found out very quickly that I won't be able to... using only > > Apache Arrow without resorting to other libraries. > > > > > I am aiming to assess the viability of Apache Arrow for graph > algorithms > > and data structures... > > > > > I also gave a shot at doing it similar to a certain SQL method... > > > > I'm curious about these portions of what you've said. > > > > Could you share what you have tried and what roadblocks you're hitting? > > Are you struggling with mutability? How are you representing your data? > You > > mention heapq, but it's not clear if you're using an adjacency matrix or > > adjacency lists or if you're using a more normalized relational format. > > > > Thanks! > > > > > > # ------------------------------ > > > > # Aldrin > > > > > > https://github.com/drin/ > > > > https://gitlab.com/octalene > > > > https://keybase.io/octalene >