Thanks Tamas! I had no idea that pickling was less efficient than a string format -- I'd assumed the efficiency of storing in binary would trump the overhead of handling references. Will give that a shot.
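In case it's useful, here's a rough sketch of the json_as_graph() reverse conversion Tamas mentions below. It isn't tested at this scale, and it assumes the exact JSON layout from his graph_as_json() example (the "name"/"pt" vertex attributes and the "output.json" filename come from that example):

import json
from igraph import Graph

def json_as_graph(fp):
    # Rebuild an undirected graph from the dict written by graph_as_json()
    data = json.load(fp)
    vertices = data["vertices"]
    return Graph(
        n=len(vertices["name"]),
        edges=data["edges"],
        vertex_attrs={"name": vertices["name"], "pt": vertices["pt"]},
    )

with open("output.json") as fp:
    g = json_as_graph(fp)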
On Tue, Apr 18, 2017 at 1:07 PM Tamas Nepusz <[email protected]> wrote:

> Hi,
>
> This could be a bug in the pickle implementation (not in igraph, but in
> Python itself):
>
> https://stackoverflow.com/questions/31468117/python-3-can-pickle-handle-byte-objects-larger-than-4gb
> https://bugs.python.org/issue24658
>
> The workaround is to pickle the object into a string, and then write that
> string in chunks of less than 2^31 bytes into a file.
>
> However, note that pickling is not a terribly efficient format -- since it
> needs to support serializing an arbitrary set of Python objects that may
> link to each other and form cycles in any conceivable configuration, it
> has to do a lot of extra bookkeeping so that object cycles and objects
> embedded within themselves do not trip up the implementation. That's why
> the memory usage rockets up to 35 GB during pickling. If you only have a
> name and one additional attribute for each vertex, you could potentially
> gain some speed (and cut down on the memory usage) if you brew your own
> custom format -- for instance, you could get the edge list and the two
> vertex attributes, stuff them into a Python dict, and then save the dict
> in JSON format:
>
> def graph_as_json(graph):
>     return {
>         "vertices": {
>             "name": graph.vs["name"],
>             "pt": graph.vs["pt"]
>         },
>         "edges": graph.get_edgelist()
>     }
>
> with open("output.json", "w") as fp:
>     json.dump(graph_as_json(graph), fp)
>
> You could also use gzip.open() instead of open() to compress the saved
> data on the fly. You'll also need a json_as_graph() function to perform
> the conversion in the opposite direction.
>
> T.
>
> On Tue, Apr 18, 2017 at 9:25 PM, Nick Eubank <[email protected]> wrote:
>
>> Hello all,
>>
>> I'm trying to pickle a very large graph (23 million vertices, 152
>> million edges, two vertex attributes), but I keep getting an `OSError:
>> [Errno 22] Invalid argument` error. I think that error is spurious,
>> though, since if I subsample the graph and save it with the exact same
>> code I have no problems. Here's the traceback:
>>
>> g.summary()
>> Out[8]: 'IGRAPH UN-- 23331862 152099394 -- \n+ attr: name (v), pt (v)'
>>
>> g.write_pickle(fname='graphs/with_inferred/vz_inferred3_sms{}_voz{}.pkl'.format(sms, voz))
>> Traceback (most recent call last):
>>
>>   File "<ipython-input-9-6b5409a79251>", line 1, in <module>
>>     g.write_pickle(fname='graphs/with_inferred/vz_inferred3_sms{}_voz{}.pkl'.format(sms, voz))
>>
>>   File "/Users/Nick/anaconda/lib/python3.5/site-packages/igraph/__init__.py", line 1778, in write_pickle
>>     result=pickle.dump(self, fname, version)
>>
>> OSError: [Errno 22] Invalid argument
>>
>> g=g.vs[range(3331862)].subgraph()
>>
>> g.write_pickle(fname='graphs/with_inferred/vz_inferred3_sms{}_voz{}.pkl'.format(sms, voz))
>>
>> [success]
>>
>> The graph takes up about 10 GB in memory, and the pickle command expands
>> Python's memory footprint to about 35 GB before the exception gets
>> thrown, but I'm on a machine with 80 GB of RAM, so that's not the
>> constraint.
>>
>> Any suggestions as to what might be going on / is there a workaround for
>> saving?
>>
>> Thanks!
>>
>> Nick
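For completeness, a minimal sketch of the chunked-write workaround Tamas describes above -- pickle to a single bytes object, then write it out in pieces smaller than 2^31 bytes (the helper name, the 2^30-byte chunk size, and the output filename here are only illustrative):

import pickle

def write_pickle_chunked(obj, fname, chunk_size=2**30):
    # Serialize everything in memory first, then write the result in chunks
    # small enough that no single write() call hits the 2^31-byte limit.
    data = pickle.dumps(obj, pickle.HIGHEST_PROTOCOL)
    with open(fname, "wb") as fp:
        for start in range(0, len(data), chunk_size):
            fp.write(data[start:start + chunk_size])

write_pickle_chunked(g, "graph.pkl")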
_______________________________________________
igraph-help mailing list
[email protected]
https://lists.nongnu.org/mailman/listinfo/igraph-help
