For completeness, here is the MWE source. Note that I fixed the vertices
so that the output is reproducible. Selecting the vertices at random
leads to the same behavior.

Best,
François.



import multiprocessing
import graph_tool as gt
import graph_tool.topology as gtt
import hashlib
import sys

class MyProcess(multiprocessing.Process):
    """
    A process that computes shortest paths and shortest distances in a
    graph-tool graph.
    """
    def __init__(self, graph, test):
        super(MyProcess, self).__init__()
        self.graph = graph
        self.test = test

    def run(self):
        while True:
            # The operation is repeated so that the bug is crystal clear.
            source, target = self.test
            source = self.graph.vertex(source)
            target = self.graph.vertex(target)

            # We start the work.
            print('{} does shortest_distance from {} to {}'.format(
                self, source, target))

            gtt.shortest_distance(self.graph,
                                  source=source,
                                  weights=self.graph.ep['weight'],
                                  max_dist=1400,
                                  pred_map=True)

            # We end the work.
            print('{} done.'.format(self))


def hash_graphs(*args):
    """
    Provides an edge based graph digest that can be used to invalidate old
    cache.

    :type args: tuple of :class:`graph_tool.GraphView`
    :param args: the graphs to be hashed.

    :rtype: str
    :return: a hash digest of the input graph.
    """
    graph_hash = hashlib.md5()
    for graph in args:
        graph_hash.update(gt.edge_endpoint_property(
            graph, graph.vp['id'], "source").a.tobytes())
        graph_hash.update(gt.edge_endpoint_property(
            graph, graph.vp['id'], "target").a.tobytes())
    return graph_hash.hexdigest()


if __name__ == '__main__':

    # Unserialize the graph.
    graph = gt.load_graph('./mwe/graph.gt.gz')

    # Bug switch.
    if sys.argv[-1] == 'DO_HASH':
        graph_hash = hash_graphs(graph)

    # Repeatable inputs.
    tests = [(452946, 391015),
             (266188, 207342),
             (514127, 290838),
             (439705, 87897),
             (223098, 440593),
             (279880, 368550),
             (108357, 199593),
             (273888, 275937)]

    # Actual work.
    procs = [MyProcess(graph, test) for test in tests]

    for proc in procs:
        proc.start()

    for proc in procs:
        proc.join()

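A possible experiment, not confirmed anywhere in this thread: if the
graph-tool build uses OpenMP, hash_graphs may start OpenMP worker threads in
the parent, and that runtime state is then inherited by the children through
fork(). Under that assumption, capping OpenMP at one thread before graph_tool
is imported would be a cheap way to check whether the hang goes away. The
helper name limit_openmp_threads is hypothetical; OMP_NUM_THREADS is the
standard OpenMP environment variable.

```python
import os


def limit_openmp_threads(num_threads=1, env=os.environ):
    """
    Hypothetical helper: cap the OpenMP thread count through the standard
    OMP_NUM_THREADS environment variable.

    It has to run before anything loads the OpenMP runtime (for the MWE,
    before ``import graph_tool``), because the runtime reads the variable
    when it initializes.

    :rtype: str or None
    :return: the previous value of OMP_NUM_THREADS, if any.
    """
    if num_threads < 1:
        raise ValueError('num_threads must be >= 1')
    previous = env.get('OMP_NUM_THREADS')
    env['OMP_NUM_THREADS'] = str(num_threads)
    return previous


# Assumed usage, as the very first statements of the MWE:
#
#     limit_openmp_threads(1)
#     import graph_tool as gt
```

If the children still hang with a single OpenMP thread, the assumption above
is probably wrong and the cause lies elsewhere.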

On Thu, Nov 10, 2016 at 7:24 PM, François Kawala <[email protected]>
wrote:

> Hello,
>
> I observe a quite strange bug that involves Python's multiprocessing
> library. I try to use one (read-only) graph instance with several
> *multiprocessing.Process* instances. The graph is unserialized in the
> parent process. Each child receives a reference to the graph. Then each
> child makes simple repetitive calls to
> *graph_tool.topology.shortest_distance*. Everything works great: each
> child process runs as fast as it can. However, when the main process
> executes the *hash_graphs* function presented below, each child process
> hangs indefinitely. *hash_graphs* is executed before the children start.
>
> def hash_graphs(*args):
>     """
>     Provides an edge based graph digest that can be used to invalidate old
>     cache.
>
>     :type args: tuple of :class:`graph_tool.GraphView`
>     :param args: the graphs to be hashed.
>
>     :rtype: str
>     :return: a hash digest of the input graph.
>     """
>     graph_hash = hashlib.md5()
>     for graph in args:
>         graph_hash.update(gt.edge_endpoint_property(
>             graph, graph.vp['id'], "source").a.tobytes())
>         graph_hash.update(gt.edge_endpoint_property(
>             graph, graph.vp['id'], "target").a.tobytes())
>     return graph_hash.hexdigest()
>
> I packaged an MWE, available here:
> https://drive.google.com/file/d/0B5GhhBKHOKOxVnpfYTBwNDZxODA/view?usp=sharing
> To run it, simply do:
>
> tar xzf mwe.tar.gz
>
> # run the buggy version
> python3 -m mwe DO_HASH
>
> # run as expected
> python3 -m mwe
>
>
> The buggy output looks like:
>
> <MyProcess(MyProcess-1, started)> does shortest_distance from 452946 to 391015
> <MyProcess(MyProcess-2, started)> does shortest_distance from 266188 to 207342
> <MyProcess(MyProcess-3, started)> does shortest_distance from 514127 to 290838
> <MyProcess(MyProcess-4, started)> does shortest_distance from 439705 to 87897
> <MyProcess(MyProcess-5, started)> does shortest_distance from 223098 to 440593
> <MyProcess(MyProcess-6, started)> does shortest_distance from 279880 to 368550
> <MyProcess(MyProcess-7, started)> does shortest_distance from 108357 to 199593
> <MyProcess(MyProcess-8, started)> does shortest_distance from 273888 to 275937
>
>
> The expected output looks like:
>
>
> <MyProcess(MyProcess-1, started)> does shortest_distance from 452946 to 391015
> <MyProcess(MyProcess-2, started)> does shortest_distance from 266188 to 207342
> <MyProcess(MyProcess-3, started)> does shortest_distance from 514127 to 290838
> <MyProcess(MyProcess-5, started)> does shortest_distance from 223098 to 440593
> <MyProcess(MyProcess-6, started)> does shortest_distance from 279880 to 368550
> <MyProcess(MyProcess-1, started)> done.
> <MyProcess(MyProcess-1, started)> does shortest_distance from 452946 to 391015
> <MyProcess(MyProcess-2, started)> done.
> <MyProcess(MyProcess-2, started)> does shortest_distance from 266188 to 207342
> <MyProcess(MyProcess-4, started)> does shortest_distance from 439705 to 87897
> <MyProcess(MyProcess-7, started)> does shortest_distance from 108357 to 199593
> <MyProcess(MyProcess-3, started)> done.
> <MyProcess(MyProcess-1, started)> done.
> <MyProcess(MyProcess-3, started)> does shortest_distance from 514127 to 290838
> <MyProcess(MyProcess-1, started)> does shortest_distance from 452946 to 391015
> <MyProcess(MyProcess-8, started)> does shortest_distance from 273888 to 275937
> <MyProcess(MyProcess-2, started)> done.
> <MyProcess(MyProcess-2, started)> does shortest_distance from 266188 to 207342
> <MyProcess(MyProcess-3, started)> done.
> <MyProcess(MyProcess-3, started)> does shortest_distance from 514127 to 290838
> <MyProcess(MyProcess-1, started)> done.
> <MyProcess(MyProcess-1, started)> does shortest_distance from 452946 to 391015
> <MyProcess(MyProcess-6, started)> done.
> <MyProcess(MyProcess-6, started)> does shortest_distance from 279880 to 368550
> <MyProcess(MyProcess-4, started)> done.
> <MyProcess(MyProcess-4, started)> does shortest_distance from 439705 to 87897
> <MyProcess(MyProcess-8, started)> done.
> <MyProcess(MyProcess-8, started)> does shortest_distance from 273888 to 275937
> <MyProcess(MyProcess-1, started)> done.
> <MyProcess(MyProcess-1, started)> does shortest_distance from 452946 to 391015
> <MyProcess(MyProcess-2, started)> done.
> <MyProcess(MyProcess-2, started)> does shortest_distance from 266188 to 207342
> <MyProcess(MyProcess-3, started)> done.
> <MyProcess(MyProcess-3, started)> does shortest_distance from 514127 to 290838
> <MyProcess(MyProcess-5, started)> done.
> <MyProcess(MyProcess-5, started)> does shortest_distance from 223098 to 440593
> <MyProcess(MyProcess-1, started)> done.
> <MyProcess(MyProcess-1, started)> does shortest_distance from 452946 to 391015
> <MyProcess(MyProcess-8, started)> done.
> <MyProcess(MyProcess-8, started)> does shortest_distance from 273888 to 275937
> <MyProcess(MyProcess-7, started)> done.
> <MyProcess(MyProcess-7, started)> does shortest_distance from 108357 to 199593
> <MyProcess(MyProcess-3, started)> done.
> <MyProcess(MyProcess-3, started)> does shortest_distance from 514127 to 290838
> ...
>
>
> How can this behavior be explained?
>
> Best,
> François.
>
>


-- 
François Kawala
_______________________________________________
graph-tool mailing list
[email protected]
https://lists.skewed.de/mailman/listinfo/graph-tool
