Thanks Tamas! I had no idea that pickling was less efficient than a string
format -- I'd assumed the efficiency of storing in binary trumped the
issues with references. Will give that a shot.

On Tue, Apr 18, 2017 at 1:07 PM Tamas Nepusz <[email protected]> wrote:

> Hi,
>
> This could be a bug in the pickle implementation (not in igraph, but in
> Python itself):
>
>
> https://stackoverflow.com/questions/31468117/python-3-can-pickle-handle-byte-objects-larger-than-4gb
> https://bugs.python.org/issue24658
>
> The workaround is to pickle the object into a string, and then write that
> string in chunks less than 2^31 bytes into a file.
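> The chunked write could be sketched like this (a minimal sketch; the
> function name and the 1 GiB chunk size are my own choices, not igraph API):
>
> ```python
> import pickle
>
> def write_pickle_chunked(obj, path, chunk_size=2**30):
>     # Serialize to a single bytes object first, then write it out in
>     # chunks safely below the 2**31-byte limit that triggers the bug.
>     data = pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL)
>     with open(path, "wb") as fp:
>         for offset in range(0, len(data), chunk_size):
>             fp.write(data[offset:offset + chunk_size])
> ```
>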
>
> However, note that pickling is not a terribly efficient format -- since it
> needs to support serializing an arbitrary set of Python objects that may
> link to each other and form cycles in any conceivable configuration, it has
> to do a lot of extra bookkeeping so that object cycles and objects embedded
> within themselves do not trip up the implementation. That's why the memory
> usage rockets up to 35 GB during pickling. If you only have a name and an
> additional attribute for each vertex, you could potentially gain some speed
> (and cut down on the memory usage) by rolling your own format -- for
> instance, you could get the edge list and the two vertex attributes, stuff
> them into a Python dict, and then save the dict in JSON format:
>
> def graph_as_json(graph):
>     return {
>         "vertices": {
>             "name": graph.vs["name"],
>             "pt": graph.vs["pt"]
>         },
>         "edges": graph.get_edgelist()
>     }
>
> import json
>
> with open("output.json", "w") as fp:
>     json.dump(graph_as_json(graph), fp)
>
> You could also use gzip.open() instead of open() to compress the saved
> data on-the-fly. You'll also need a json_as_graph() function to perform the
> conversion in the opposite direction.
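> That counterpart could look roughly like this (a sketch that assumes the
> dict layout produced by graph_as_json() above):
>
> ```python
> import json
>
> from igraph import Graph
>
> def json_as_graph(fp):
>     data = json.load(fp)
>     # Pass n explicitly so isolated vertices are not lost when
>     # rebuilding the graph from the edge list alone.
>     graph = Graph(n=len(data["vertices"]["name"]),
>                   edges=data["edges"])
>     graph.vs["name"] = data["vertices"]["name"]
>     graph.vs["pt"] = data["vertices"]["pt"]
>     return graph
> ```
>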
>
>
> T.
>
> On Tue, Apr 18, 2017 at 9:25 PM, Nick Eubank <[email protected]> wrote:
>
>> Hello all,
>>
>> I'm trying to pickle a very large graph (23 million vertices, 152 million
>> edges, two vertex attributes), but I keep getting an `OSError: [Errno 22]
>> Invalid argument`. However, I think that error is spurious, because if I
>> subsample the graph and save it with the exact same code, I have no
>> problems. Here's the traceback:
>>
>>
>> g.summary()
>> Out[8]: 'IGRAPH UN-- 23331862 152099394 -- \n+ attr: name (v), pt (v)'
>>
>> g.write_pickle(fname='graphs/with_inferred/vz_inferred3_sms{}_voz{}.pkl'.format(sms,
>> voz))
>> Traceback (most recent call last):
>>
>>  File "<ipython-input-9-6b5409a79251>", line 1, in <module>
>>
>>  
>> g.write_pickle(fname='graphs/with_inferred/vz_inferred3_sms{}_voz{}.pkl'.format(sms,
>> voz))
>>
>>  File
>> "/Users/Nick/anaconda/lib/python3.5/site-packages/igraph/__init__.py", line
>> 1778, in write_pickle
>>    result=pickle.dump(self, fname, version)
>>
>> OSError: [Errno 22] Invalid argument
>>
>>
>> g=g.vs[range(3331862)].subgraph()
>>
>> g.write_pickle(fname='graphs/with_inferred/vz_inferred3_sms{}_voz{}.pkl'.format(sms,
>> voz))
>>
>>     [success]
>>
>> The graph takes up about 10 GB in memory, and the pickle command expands
>> Python's memory footprint to about 35 GB before the exception is thrown,
>> but I'm on a machine with 80 GB of RAM, so that's not the constraint.
>>
>> Any suggestions as to what might be going on, or is there a workaround
>> for saving?
>>
>> Thanks!
>>
>> Nick
>>
>> _______________________________________________
>> igraph-help mailing list
>> [email protected]
>> https://lists.nongnu.org/mailman/listinfo/igraph-help
>>
>>
>