Re: [R] Clustering algorithms don't find obvious clusters

Dave Roberts Sat, 12 Jun 2010 13:58:04 -0700

Henrik,

Given your initial matrix, daisy(M) should tell you which authors are similar/dissimilar to which other authors in terms of which authors they cite. In this case authors 1 and 3 are most similar because they both cite authors 2 and 4. Authors 2 and 3 are most different because they both make 6 citations but cite none of the same authors (sqrt(6^2+5^2+1^2)=7.87). 1 and 2 are next most different because 1 only makes 5 citations but shares none with 2 (sqrt(6^2+4^2+1^2)=7.28), etc.
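
For what it's worth, these numbers can be checked with a few lines of base R. This is only a sketch: it uses dist() rather than daisy(), on the assumption that daisy() with its defaults on an all-numeric matrix gives the same Euclidean distances, and it labels authors 1-4 as A-D, following Henrik's post.

```r
# Citation matrix from the original post: rows cite, columns are cited.
M <- matrix(c(0, 4, 0, 1,
              6, 0, 0, 0,
              0, 1, 0, 5,
              0, 0, 4, 0),
            nrow = 4, byrow = TRUE,
            dimnames = list(LETTERS[1:4], LETTERS[1:4]))

# Euclidean distance between rows, i.e. between "who do I cite" profiles.
d <- as.matrix(dist(M))
round(d, 2)

# Spot checks against the hand calculations.
d["B", "C"]   # sqrt(6^2 + 1^2 + 5^2) = sqrt(62)
d["A", "B"]   # sqrt(6^2 + 4^2 + 1^2) = sqrt(53)
d["A", "C"]   # sqrt(3^2 + 4^2) = 5, the smallest off-diagonal entry
```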

If you want to know which authors are similar in terms of who has cited them, simply transpose the matrix:


daisy(t(M))
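
As a rough illustration of the transposed case (again with base dist() standing in for daisy(), which is an assumption about daisy()'s defaults), the rows of t(M) are the "cited by" profiles:

```r
# Citation matrix from the original post: rows cite, columns are cited.
M <- matrix(c(0, 4, 0, 1,
              6, 0, 0, 0,
              0, 1, 0, 5,
              0, 0, 4, 0),
            nrow = 4, byrow = TRUE,
            dimnames = list(LETTERS[1:4], LETTERS[1:4]))

# After transposing, each row describes who cites that author.
d_cited <- as.matrix(dist(t(M)))
round(d_cited, 2)

# B and D come out closest: both are cited by A and C, and by nobody else.
d_cited["B", "D"]
```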

I'm guessing none of this is actually what you are looking for, however, and Etienne's graph-theoretic approach may be more what you have in mind.
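
To be clear, the following is not Etienne's method, just one simple way to let the citation links themselves drive the clustering. It assumes that citations in either direction count as closeness, so the matrix is symmetrised as M + t(M) and turned into a dissimilarity by subtracting from the largest count. Both choices are illustrative assumptions, and other conversions are possible.

```r
# Citation matrix from the original post: rows cite, columns are cited.
M <- matrix(c(0, 4, 0, 1,
              6, 0, 0, 0,
              0, 1, 0, 5,
              0, 0, 4, 0),
            nrow = 4, byrow = TRUE,
            dimnames = list(LETTERS[1:4], LETTERS[1:4]))

# Assumption: direction is ignored, so A->B and B->A citations are pooled.
S <- M + t(M)

# Assumption: more mutual citations means smaller dissimilarity.
D <- max(S) - S
diag(D) <- 0

# Average-linkage hierarchical clustering on the dissimilarities.
hc <- hclust(as.dist(D), method = "average")
groups <- cutree(hc, k = 2)
groups
```

On the four-author example this puts A and B in one group and C and D in the other. The same as.dist(D) object could also be handed to pam() or agnes(), which accept dissimilarities directly.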


Dave
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
David W. Roberts                                     office 406-994-4548
Department of Ecology                         email drobe...@montana.edu
Montana State University
Bozeman, MT 59717-3460


Henrik Aldberg wrote:

Dave,

I used daisy with the default settings (daisy(M) where M is the matrix).


Henrik

On 11 June 2010 21:57, Dave Roberts <dvr...@ecology.msu.montana.edu> wrote:


    Henrik,

       The clustering algorithms you refer to (and almost all others)
    expect the matrix to be symmetric.  They do not seek a
    graph-theoretic solution, but rather proximity in geometric or
    topological space.

       How did you convert your matrix to a dissimilarity?

    Dave Roberts

    Henrik Aldberg wrote:

        I have a directed graph which is represented as a matrix of the form

        0 4 0 1
        6 0 0 0
        0 1 0 5
        0 0 4 0


        Each row corresponds to an author (A, B, C, D) and the values say how
        many times this author has cited the other authors. Hence the first row
        says that author A has cited author B four times and author D one time.
        Thus the matrix represents two groups of authors, (A,B) and (C,D), who
        cite each other. But there is also a weak link between the groups. In
        reality this matrix is much bigger and very sparse, but it still
        consists of distinct groups of authors.


        My problem is that when I cluster the matrix using pam, clara or agnes,
        the algorithms do not find the obvious clusters. I have tried to turn
        it into a dissimilarity matrix before clustering, but that did not
        help either.


        The layout of the clustering is not that important to me; my primary
        interest is to get the right nodes into the right clusters.



        Sincerely


        Henrik


        ______________________________________________
        R-help@r-project.org mailing list
        https://stat.ethz.ch/mailman/listinfo/r-help
        PLEASE do read the posting guide
        http://www.R-project.org/posting-guide.html
        and provide commented, minimal, self-contained, reproducible code.

