Hello Yi Zhou,

You are right that if every "data point" vertex sends messages to every
other vertex, this algorithm is too expensive and won't scale.
However, I don't think we need to consider all data points as potential
exemplars for every vertex.

Actually, you can view the input graph as capturing the similarity
relationship between data points, i.e. if two vertices are connected, then
they serve as potential exemplars for each other. This way, we only need to
send messages across the existing graph edges, instead of creating a fully
connected graph. This would be equivalent to setting the similarity between
vertices that are not neighbors to a very small value (effectively negative
infinity).
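
To make this concrete, here is a minimal sketch in plain Java of what the
responsibility-sending step could look like over a sparse neighborhood.
This is not the actual Gelly vertex-centric API; all names
(sendResponsibilities, neighborSimilarities, ...) are made up for
illustration:

import java.util.HashMap;
import java.util.Map;

// Sketch of one messaging superstep: vertex i sends its responsibility
// r(i,k) only to its graph neighbors k (its candidate exemplars), so the
// number of messages per superstep equals the number of edges, not N*N.
public class SparseResponsibilities {

    // r(i,k) = s(i,k) - max over k' != k of ( a(i,k') + s(i,k') ),
    // where both maps hold entries only for actual neighbors of i;
    // non-neighbors behave exactly as if s(i,k) were negative infinity
    static Map<Long, Double> sendResponsibilities(
            Map<Long, Double> neighborSimilarities,  // k -> s(i,k), edges only
            Map<Long, Double> availabilities) {      // k -> a(i,k), edges only

        // one pass for the largest and second-largest a + s, so the
        // "max over the other neighbors" is O(1) per outgoing message
        double best = Double.NEGATIVE_INFINITY;
        double secondBest = Double.NEGATIVE_INFINITY;
        long bestK = -1L;
        for (Map.Entry<Long, Double> e : neighborSimilarities.entrySet()) {
            double v = availabilities.getOrDefault(e.getKey(), 0.0) + e.getValue();
            if (v > best) {
                secondBest = best;
                best = v;
                bestK = e.getKey();
            } else if (v > secondBest) {
                secondBest = v;
            }
        }

        Map<Long, Double> messages = new HashMap<>();
        for (Map.Entry<Long, Double> e : neighborSimilarities.entrySet()) {
            double maxCompeting = (e.getKey() == bestK) ? secondBest : best;
            messages.put(e.getKey(), e.getValue() - maxCompeting);
        }
        return messages;
    }
}

With this, each superstep sends exactly one message per edge, and the
per-vertex work is linear in the vertex degree.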

You can also take a look at the Affinity Propagation FAQ [1], which
explains the following:
*"...if you know beforehand that each data point has the potential to be
assigned to only a small fraction of the other data points, then the answer
is "yes". Running affinity propagation on such data is equivalent to
setting the similarity between two data points that shouldn't be linked
together to negative infinity. However, there is a version of affinity
propagation that only exchanges messages between pairs of points when the
similarity is not negative infinity. This algorithm's time and memory
requirements scale linearly with the number of similarities, which would be
NxN if a full set of pairwise similarities is input, but much, much less if
the set of similarities is sparse."*
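
To make the FAQ's equivalence claim concrete, here is a tiny
self-contained check (the numbers are made up): with the dense
responsibility update and s(i,k) set to negative infinity for a
non-neighbor k, that k can neither win the max term nor end up with a
finite r(i,k), so it behaves exactly as if it were absent from the graph.

// Dense update with s(i,k) = -infinity for the missing pair; the values
// below are invented purely for illustration.
public class DenseSparseEquivalence {
    public static void main(String[] args) {
        // similarities from vertex i to candidates k = 0..3;
        // k = 2 is not a neighbor of i
        double[] s = {-1.0, -3.0, Double.NEGATIVE_INFINITY, -2.0};
        double[] a = new double[s.length]; // availabilities start at 0

        for (int k = 0; k < s.length; k++) {
            double maxCompeting = Double.NEGATIVE_INFINITY;
            for (int kp = 0; kp < s.length; kp++) {
                if (kp == k) continue;
                maxCompeting = Math.max(maxCompeting, a[kp] + s[kp]);
            }
            System.out.printf("r(i,%d) = %s%n", k, s[k] - maxCompeting);
        }
        // prints finite values for the neighbors and -Infinity for k = 2:
        // the dense run can never pick a non-neighbor as an exemplar
    }
}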

Of course, we will have to clearly document this behavior and warn users
about the complexity, in case they want to consider all data points as
potential exemplars.

Let me know what your thoughts are on this!

Cheers,
-Vasia.

[1]: http://www.psi.toronto.edu/affinitypropagation/faq.html

On 24 March 2015 at 11:37, Yi ZHOU <zhouyi0...@hotmail.com> wrote:

> Hello everyone,
>
> I am working on an affinity propagation implementation for Gelly (FLINK-1707
> <https://issues.apache.org/jira/browse/FLINK-1707>). The algorithm passes
> messages between every pair of vertices (NOT just every pair of connected
> vertices) in each iteration, with computational complexity O(N² * Iter) and
> memory complexity O(N²). So I believe the algorithm will suffer from large
> communication complexity, no matter how we distribute the graph across
> different machines. The simple fact is that the algorithm passes two kinds
> of messages on a complete graph. I saw some similar discussion for SPARK
> <https://issues.apache.org/jira/browse/SPARK-5832>.
>
> I found an adaptive implementation on Hadoop; it runs affinity propagation
> on subgraphs, then merges these clusters into larger ones:
> http://www.ijeee.net/uploadfile/2014/0807/20140807114023665.pdf
> It is not equivalent to the original algorithm. So, does anyone know of
> another distributed version, or have any suggestions?
>
>
> ZHOU Yi
>
