Hi all. Anyone can provide me with some insights here ? I know it's quite an open question here, and it might take some effort of course. Would anyone be available/willing to do an actual code audit of the code that I have ? This would be compensated of course. Feel free to contact me to discuss.
Kind regards On Tue, Jun 15, 2021 at 8:45 PM Mathias Versichele < mathias.versich...@gmail.com> wrote: > Hi, I've been using graph-tool for the last year or so for calculating > shortest-path trees on large-scale road networks. We used to do this in a > postgres database with the pgrouting extension, but were continually > confronted with unacceptable startup costs. The switch to a python module > using graph-tool has considerably sped up our routing queries, but we are > suffering from this service using too much memory. I have the feeling I > might be using graph-tool in a wrong way, but before I dive into that, it > would be good to know what is the expected memory footprint for my use case. > > Take for example a road network with 30Mio edges and 31 Mio nodes (the > combined road network of Belgium, Netherland, France and Germany in OSM). > For this road network, I need to calculate shortest paths using different > edge weights (edge property map). What would be a very rough estimate how > much memory this would use ? For the network only + per edge-property-map. > In our setup, there would be one edge-property-map with edge weights per > country. We're currently seeing usage of over 50Gb easily, spiking even > higher when we're loading extra cost structures or networks. Is that > expected ? Or am I experiencing memory leaks somewhere ? > > How I'm using graph-tool right now: > > *1) loading network* > *nw = dataframe with edges info in the structure: startnode-id, > endnode-id, edge-id, country* > > G = gt.Graph(directed=True) > G.ep["edge_id"] = G.new_edge_property("int") > G.ep["country_id"] = G.new_edge_property("int16_t") > eprops = [G.ep["edge_id"], G.ep["country_id"]] > > n = G.add_edge_list(nw.to_numpy(), hashed=True, eprops=eprops) > G.vertex_properties["n"] = n > > *2) loading edge costs: I'm using GraphViews* > > *countries = list of country-codes* > edge_filter = np.in1d(G.ep["country_id"].a, [get_country_id(c) for c in > countries]) > GV = gt.GraphView(G, efilt=edge_filter) > > edges = GV.get_edges([GV.edge_index]) > sources = G.vertex_properties["n"].a[edges[:,0]] > targets = G.vertex_properties["n"].a[edges[:,1]] > idxs = edges[:,2] > > > *db_costs = pandas dataframe with structure: source, target, cost* > > sti = np.vstack((idxs,sources,targets)).T > sti_df = pd.DataFrame({'idx': sti[:, 0], 'source': sti[:, 1], 'target': > sti[:, 2]}) > m = pd.merge(sti_df, db_costs, on=['source', 'target'], how='left', > sort=False)[['idx', 'c']] > wgts_list = m.sort_values(by=['idx']).T.iloc[1].to_numpy() > wgts_list = np.where(wgts_list==np.nan, np.iinfo(np.int32).max, wgts_list) > > wgts = GV.new_edge_property("int32_t") > wgts.fa = wgts_list > wgts.fa = np.where(wgts.fa==-2147483648, np.iinfo(np.int32).max, wgts.fa) > GV.edge_properties[cs_ids_str] = wgts > > GV2 = gt.GraphView(GV, efilt=wgts.fa != np.inf) > > *3) I then use GV2 for calculating Dijkstra and such...* > > > I could of course work on an MWE of some sorts. But would be very nice to > get an estimate on mem footprint, and to see if I'm doing sth really silly > in the code above. > > Thx! > > > > > >
_______________________________________________ graph-tool mailing list -- graph-tool@skewed.de To unsubscribe send an email to graph-tool-le...@skewed.de