Also, it is likely that the combiner has little effect.  This means that you 
are essentially using a vector to serialize single elements.

Sent from my iPhone

On Jul 8, 2013, at 23:13, Dmitriy Lyubimov <dlie...@gmail.com> wrote:

> Yes, that's my working hypothesis: serializing and combining
> RandomAccessSparseVectors is slower than elementwise messages.
> 
> 
> On Mon, Jul 8, 2013 at 11:00 PM, Ted Dunning <ted.dunn...@gmail.com> wrote:
> 
>> It is common for double serialization to creep into these systems as well.
>> My guess, however, is that the primitive serialization is just much faster
>> than the vector serialization.
>> 
>> Sent from my iPhone
>> 
>> On Jul 8, 2013, at 22:55, Dmitriy Lyubimov <dlie...@gmail.com> wrote:
>> 
>>> Yes, but it is just a test, and I am trying to extrapolate the results I
>>> see to bigger volumes, sort of, to get some taste of the programming
>>> model's performance.
>>> 
>>> I do get CPU-bound behavior, and I hit the Spark cache 100% of the time.
>>> So in theory, since I am not having spills and I am not doing sorts, it
>>> should be fairly fast.
>>> 
>>> I have two algorithms. One just sends elementwise messages to the vertex
>>> representing the row each element should end up in. The other uses the
>>> same set of initial messages but also uses Bagel combiners, which, the
>>> way I understand it, combine elements into partial vectors before
>>> shipping them off to the remote vertex partition. The reasoning,
>>> apparently, is that since elements are combined, there is less I/O.
>>> Well, perhaps not so much in this case, since we are not really doing
>>> any sort of information aggregation. On a single-node Spark setup I of
>>> course don't have actual I/O, so it should approach the speed of an
>>> in-core copy-by-serialization. For concreteness, a rough sketch of both
>>> variants follows below.
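>>> 
>>> Roughly what I mean (a sketch only, not the actual code; ElementMsg and
>>> VectorCombiner are made-up names, and I am assuming Bagel's Combiner
>>> trait as of Spark 0.7 plus Mahout's RandomAccessSparseVector):
>>> 
>>>   import org.apache.mahout.math.{RandomAccessSparseVector, Vector}
>>>   import spark.bagel.Combiner
>>> 
>>>   // elementwise message: primitives only (target row, index, value)
>>>   case class ElementMsg(targetRow: Int, index: Int, value: Double)
>>> 
>>>   // combiner variant: merge elementwise messages into a partial row
>>>   // vector before it is shipped to the remote vertex partition
>>>   class VectorCombiner(cardinality: Int)
>>>       extends Combiner[ElementMsg, Vector] with Serializable {
>>> 
>>>     def createCombiner(msg: ElementMsg): Vector = {
>>>       val v = new RandomAccessSparseVector(cardinality)
>>>       v.setQuick(msg.index, msg.value)
>>>       v
>>>     }
>>> 
>>>     def mergeMsg(combiner: Vector, msg: ElementMsg): Vector = {
>>>       combiner.setQuick(msg.index, msg.value)
>>>       combiner
>>>     }
>>> 
>>>     // partial vectors for the same row carry disjoint indices here,
>>>     // so a plain element-wise sum merges them correctly
>>>     def mergeCombiners(a: Vector, b: Vector): Vector = a.plus(b)
>>>   }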
>>> 
>>> What I am seeing is that elementwise messages work almost two times
>>> faster in CPU-bound behavior than the version with combiners. It would
>>> seem the culprit is that VectorWritable serialization and then
>>> deserialization of the vectorized fragments is considerably slower than
>>> serialization of the elementwise messages containing only primitive
>>> types (target row, index, value), even though the latter amount to a
>>> significantly larger number of objects as well as data.
>>> 
>>> Still, I am trying to convince myself that even using combiners should
>>> be OK compared to shuffle-and-sort overhead. But I think in reality it
>>> still looks a bit slower than I expected. Well, I guess I should not be
>>> lazy and should benchmark it against the Mahout MR-based transpose as
>>> well as Spark's own RDD shuffle-and-sort, along the lines of the sketch
>>> below.
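>>> 
>>> The shuffle-and-sort baseline would be something like this (again just
>>> a sketch with made-up names; drm stands for an RDD[(Int, Vector)] of
>>> row vectors, and numRows is the row count of the input, i.e. the
>>> cardinality of each transposed row):
>>> 
>>>   import spark.RDD
>>>   import spark.SparkContext._
>>>   import org.apache.mahout.math.{RandomAccessSparseVector, Vector}
>>>   import scala.collection.JavaConversions._
>>> 
>>>   def transpose(drm: RDD[(Int, Vector)], numRows: Int)
>>>       : RDD[(Int, Vector)] =
>>>     drm.flatMap { case (row, vec) =>
>>>       // emit (column, (row, value)) for every non-zero element
>>>       vec.iterateNonZero().map(e => (e.index, (row, e.get)))
>>>     }.groupByKey().map { case (col, elems) =>
>>>       // assemble one transposed row (a former column) as a sparse
>>>       // vector of cardinality numRows
>>>       val v = new RandomAccessSparseVector(numRows)
>>>       elems.foreach { case (r, x) => v.setQuick(r, x) }
>>>       (col, v)
>>>     }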
>>> 
>>> Anyway, map-only tasks on Spark distributed matrices are lightning fast,
>>> but the Bagel serialize/deserialize scatter/gather seems to be much
>>> slower than just map-only processing. Perhaps I am doing it wrong
>>> somehow.
>>> 
>>> 
>>> On Mon, Jul 8, 2013 at 10:22 PM, Ted Dunning <ted.dunn...@gmail.com>
>>> wrote:
>>> 
>>>> Transpose of that small a matrix should happen in memory.
>>>> 
>>>> Sent from my iPhone
>>>> 
>>>> On Jul 8, 2013, at 17:26, Dmitriy Lyubimov <dlie...@gmail.com> wrote:
>>>> 
>>>>> Does anybody know how good (or bad) our performance on matrix
>>>>> transpose is? How long would it take to transpose a matrix with 10M
>>>>> non-zeros with Mahout (if I wanted to set up a fully distributed but
>>>>> single-node MR cluster)?
>>>>> 
>>>>> I am trying to figure out whether the numbers I see with the
>>>>> Bagel-based Mahout matrix transposition are any good.
>> 
