Shannon,

To add to Jake's voice on this,

dense + large == not scalable, if only because of the n^2 entries.  Scalable
implies O(n) time and O(1) memory throughout, on every machine involved in
the computation.  Even O(n log n) time doesn't work, because it usually
implies O(n) memory.
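
To make the n^2 point concrete (illustrative numbers only, nothing from
Shannon's actual data set): at n = 10^6 items, a dense matrix of doubles
costs

  n^2 = 10^12 entries x 8 bytes  ~=  8 TB

while a sparse version keeping ~100 nonzeros per row costs

  100 x n = 10^8 entries x 8 bytes  ~=  800 MB

The first doesn't fit on any single machine; the second fits in RAM.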

For non-parallel algorithms, this implies that on-line (streaming)
algorithms are pretty much all that works.  For parallel programs, we
usually assume something like at most O(log n) machines in the cluster.
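
To illustrate what I mean by on-line, here is a toy sketch (mine, not
anything in Mahout) of Welford's streaming mean/variance: one pass, O(n)
time, O(1) memory no matter how long the stream gets.

  public class OnlineMoments {
    private long n = 0;
    private double mean = 0;
    private double m2 = 0;  // running sum of squared deviations

    public void observe(double x) {
      n++;
      double delta = x - mean;
      mean += delta / n;         // incremental mean update
      m2 += delta * (x - mean);  // incremental second-moment update
    }

    public double mean() { return mean; }
    public double variance() { return n > 1 ? m2 / (n - 1) : 0; }
  }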

On Mon, Jul 19, 2010 at 11:25 PM, Jake Mannix <jake.man...@gmail.com> wrote:

> On Tue, Jul 13, 2010 at 1:38 AM, Shannon Quinn <squ...@gatech.edu> wrote:
>
> > Hi all,
> >
> > I have a question that's a bit more rooted in theory. In my clustering
> > algorithm, I depend on the Markovian properties of the graph
> > representation of the data, using the eigenvectors/eigenvalues to
> > determine where the cluster boundaries are. The Markov transition
> > matrix is dense, as it represents the full distribution of a random
> > walk through the entire data set.
> >
>
> Hey Shannon,
>
>  I really wish I'd had the time to read more on the full theory of this
> algorithm - is it really the case that you are always finding
> eigenvectors of a fully *dense* matrix?  The DRM and Lanczos impls in
> Mahout may really not be terribly suited to this use case, if so...
>
>  Can the algorithm work if you "sparsify" your transition matrix, so
> that it only keeps the relatively significant entries, setting the
> smaller values to zero?
>
>  -jake
>
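
Re Jake's sparsification question, for what it's worth, here is one naive
way it could look: a hypothetical top-k-per-row rule of my own invention,
in plain Java rather than Mahout's vector classes, so nothing here comes
from Shannon's algorithm.  Keeping only the k largest entries in each row
and renormalizing leaves every row summing to 1, so the result is still a
stochastic matrix.

  // Keep the k largest probabilities in a transition-matrix row
  // (1 <= k <= row.length), zero the rest, and renormalize so the
  // row still sums to 1.
  static double[] sparsifyRow(double[] row, int k) {
    double[] sorted = row.clone();
    java.util.Arrays.sort(sorted);                 // ascending order
    double threshold = sorted[sorted.length - k];  // k-th largest value
    double[] out = new double[row.length];
    double sum = 0;
    for (int i = 0; i < row.length; i++) {
      if (row[i] >= threshold) {  // ties may keep slightly more than k
        out[i] = row[i];
        sum += row[i];
      }
    }
    for (int i = 0; i < out.length; i++) {
      out[i] /= sum;  // renormalize; sum > 0 for a valid stochastic row
    }
    return out;
  }

Whether the eigenvectors you care about survive that kind of truncation
is, of course, exactly Jake's question.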
