Also, it is likely that the combiner has little effect here. It means that you are essentially using a vector to serialize single elements.
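To make that concrete, here is a rough sketch (assuming Mahout's VectorWritable and RandomAccessSparseVector; the vector size is printed rather than asserted, since the on-wire format varies) of what wrapping a single element in a vector costs versus writing the raw primitives:

    import java.io.{ByteArrayOutputStream, DataOutputStream}
    import org.apache.mahout.math.{RandomAccessSparseVector, VectorWritable}

    // One element wrapped in a sparse vector, serialized via VectorWritable.
    val v = new RandomAccessSparseVector(1000000)
    v.setQuick(42, 3.14)
    val vecBytes = new ByteArrayOutputStream()
    new VectorWritable(v).write(new DataOutputStream(vecBytes))
    println(s"vector-wrapped element: ${vecBytes.size} bytes")

    // The same information as raw primitives: (target row, index, value).
    val rawBytes = new ByteArrayOutputStream()
    val out = new DataOutputStream(rawBytes)
    out.writeInt(7)        // target row
    out.writeInt(42)       // column index
    out.writeDouble(3.14)  // value
    println(s"elementwise message: ${rawBytes.size} bytes")  // 4 + 4 + 8 = 16

If the combiner only ever gets to merge a handful of elements per target vertex, the per-vector overhead never amortizes.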
Sent from my iPhone

On Jul 8, 2013, at 23:13, Dmitriy Lyubimov <dlie...@gmail.com> wrote:

> yes, that's my working hypothesis. Serializing and combining
> RandomAccessSparseVectors is slower than elementwise messages.
>
>
> On Mon, Jul 8, 2013 at 11:00 PM, Ted Dunning <ted.dunn...@gmail.com> wrote:
>
>> It is common for double serialization to creep into these systems as well.
>> My guess, however, is that the primitive serialization is just much faster
>> than the vector serialization.
>>
>> Sent from my iPhone
>>
>> On Jul 8, 2013, at 22:55, Dmitriy Lyubimov <dlie...@gmail.com> wrote:
>>
>>> yes, but it is just a test and I am trying to extrapolate the results I
>>> see to bigger volumes, sort of -- to get some taste of the programming
>>> model's performance.
>>>
>>> I do get CPU-bound behavior and I hit the Spark cache 100% of the time,
>>> so in theory, since I am not having spills and I am not doing sorts, it
>>> should be fairly fast.
>>>
>>> I have two algorithms. One just sends elementwise messages to the vertex
>>> representing the row each element should end up in. The other uses the
>>> same set of initial messages but also uses Bagel combiners which, the way
>>> I understand it, combine elements into partial vectors before shipping
>>> them off to the remote vertex partition. The reasoning, apparently, is
>>> that since elements are combined, there is less IO. Well, perhaps not so
>>> much in this case, since we are not really doing any sort of information
>>> aggregation. On a single-node Spark setup I of course don't have actual
>>> IO, so it should approach the speed of an in-core copy-by-serialization.
>>>
>>> What I am seeing is that elementwise messages run almost two times
>>> faster, in CPU-bound behavior, than the version with combiners. It would
>>> seem the culprit is that VectorWritable serialization, and then
>>> deserialization of the vectorized fragments, is considerably slower than
>>> serialization of elementwise messages containing only primitive types
>>> (target row, index, value), even though the latter amount to a
>>> significantly larger number of objects as well as more data.
>>>
>>> Still, I am trying to convince myself that even using combiners should
>>> be ok compared to shuffle-and-sort overhead. But I think in reality it
>>> still looks a bit slower than I expected. Well, I guess I should not be
>>> lazy and benchmark it against the Mahout MR-based transpose as well as
>>> Spark's version of RDD shuffle-and-sort.
>>>
>>> Anyway, map-only tasks on Spark distributed matrices are lightning fast,
>>> but Bagel serialize/deserialize scatter/gather seems to be much slower
>>> than just map-only processing. Perhaps I am doing it wrong somehow.
>>>
>>>
>>> On Mon, Jul 8, 2013 at 10:22 PM, Ted Dunning <ted.dunn...@gmail.com> wrote:
>>>
>>>> A transpose of that small a matrix should happen in memory.
>>>>
>>>> Sent from my iPhone
>>>>
>>>> On Jul 8, 2013, at 17:26, Dmitriy Lyubimov <dlie...@gmail.com> wrote:
>>>>
>>>>> Does anybody know how good (or bad) our performance on matrix
>>>>> transpose is? How long would it take to transpose a matrix with 10M
>>>>> non-zeros with Mahout (if I wanted to set up a fully distributed but
>>>>> single-node MR cluster)?
>>>>>
>>>>> Trying to figure out whether the numbers I see with the Bagel-based
>>>>> Mahout matrix transposition are any good.
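For reference, a minimal sketch of the elementwise transpose described in the thread, written against plain Spark pair RDDs rather than Bagel vertices and messages (the function name and the (row, col, value) input layout are assumptions, not the code from the thread):

    import org.apache.mahout.math.{RandomAccessSparseVector, Vector}
    import org.apache.spark.rdd.RDD

    // Each nonzero (row, col, value) becomes a message keyed by its target
    // row in the transpose (i.e. the original column); the receiving side
    // assembles the transposed rows into sparse vectors.
    // Note: Mahout vectors are not Java-serializable out of the box, so in
    // practice a Kryo registrator or a VectorWritable wrapper is needed.
    def transposeElementwise(nonZeros: RDD[(Int, Int, Double)],
                             nrow: Int): RDD[(Int, Vector)] =
      nonZeros
        .map { case (row, col, value) => (col, (row, value)) }
        .groupByKey()
        .map { case (newRow, cells) =>
          val v: Vector = new RandomAccessSparseVector(nrow)
          cells.foreach { case (idx, value) => v.setQuick(idx, value) }
          (newRow, v)
        }

This keeps the shuffled records as primitive pairs for as long as possible, which is exactly the property that appears to make the elementwise variant faster than combining into partial vectors before the shuffle.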