And... they failed. I broke some DistributedRowMatrix tests.
On Mon, Apr 22, 2013 at 8:14 PM, Dan Filimon <[email protected]> wrote:

> The tests are still running. I want to make sure they all pass before
> another round of benchmarks. :)
>
>
> On Mon, Apr 22, 2013 at 8:00 PM, Robin Anil <[email protected]> wrote:
>
>> Can you update/create a spreadsheet of where you are right now v/s trunk
>> On Apr 22, 2013 11:51 AM, "Dan Filimon" <[email protected]> wrote:
>>
>>> In fact, the issue I was referring to turns out to be because the very
>>> fast case was in fact wrong.
>>> When merging two sparse vectors I wasn't updating the number of mappings
>>> in the result.
>>>
>>> Performance is now better for the more "tuned" vectors.
>>> I have noticed some random regressions with dense vectors ... this is
>>> pretty odd. :/
>>>
>>> Anyway, can you give me some insight into:
>>> - what exactly the numbers in the spreadsheet mean?
>>> - what the "Cluster" score is for some benchmarks? There don't seem to
>>> be explicit calls to any cluster vectors.
>>>
>>> Thanks!
>>>
>>> On Mon, Apr 22, 2013 at 5:24 PM, Robin Anil <[email protected]> wrote:
>>>
>>>> Yes, every time you replace a primitive call you are at the mercy of the
>>>> JIT to inline the method. Choose primitives wherever possible to reduce
>>>> variability.
>>>> On Apr 22, 2013 7:15 AM, "Dan Filimon" <[email protected]> wrote:
>>>>
>>>>> Thanks!
>>>>>
>>>>> So, I'm running more benchmarks and it's a mixed bag. There are
>>>>> regressions and gains, but what surprises me the most is that after
>>>>> replacing every "primitive" call with calls to assign/aggregate, the
>>>>> clustering behaves much worse.
>>>>>
>>>>> As in, dozens (literally) of times worse. I'm surprised it's so bad,
>>>>> yet it doesn't show in the benchmarks.
>>>>> Any ideas why this might be, or what I should look into?
>>>>>
>>>>> On Sat, Apr 20, 2013 at 9:14 PM, Robin Anil <[email protected]> wrote:
>>>>>
>>>>>> https://docs.google.com/spreadsheet/ccc?key=0AhewTD_ZgznddGFQbWJCQTZXSnFULUYzdURfWDRJQlE#gid=2
>>>>>>
>>>>>> Here you go. There are some regressions and some improvements. One of
>>>>>> the major reasons, I think, is replacing inline math with foo.apply().
>>>>>> The JVM might not have optimized it yet. You might be better off just
>>>>>> adding an AggregateBenchmark and working on it for your functions
>>>>>> before replacing entire AbstractVector methods.
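
To make the sparse-merge bug Dan mentions above concrete: a simplified, hypothetical sketch (not the actual Mahout code; the SparseMapping/SparseMerge names and fields are illustrative only) of merging two sorted index->value mappings, where the easy-to-miss step is setting the result's mapping count at the end.

// Illustrative sketch only; loosely modeled on a sparse vector's backing
// structure, not taken from Mahout.
final class SparseMapping {
  final int[] indices;
  final double[] values;
  int numMappings; // number of (index, value) pairs actually in use

  SparseMapping(int capacity) {
    indices = new int[capacity];
    values = new double[capacity];
    numMappings = 0;
  }
}

final class SparseMerge {
  // Merges two mappings whose indices are sorted ascending, summing values
  // for indices present in both.
  static SparseMapping merge(SparseMapping a, SparseMapping b) {
    SparseMapping result = new SparseMapping(a.numMappings + b.numMappings);
    int i = 0, j = 0, k = 0;
    while (i < a.numMappings && j < b.numMappings) {
      if (a.indices[i] == b.indices[j]) {
        result.indices[k] = a.indices[i];
        result.values[k++] = a.values[i++] + b.values[j++];
      } else if (a.indices[i] < b.indices[j]) {
        result.indices[k] = a.indices[i];
        result.values[k++] = a.values[i++];
      } else {
        result.indices[k] = b.indices[j];
        result.values[k++] = b.values[j++];
      }
    }
    while (i < a.numMappings) {
      result.indices[k] = a.indices[i];
      result.values[k++] = a.values[i++];
    }
    while (j < b.numMappings) {
      result.indices[k] = b.indices[j];
      result.values[k++] = b.values[j++];
    }
    // The kind of line that was missing: without it numMappings stays 0 and
    // the merged vector looks empty, which is both wrong and deceptively fast.
    result.numMappings = k;
    return result;
  }
}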

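And to illustrate Robin's point about inline primitive math versus foo.apply(): a minimal, self-contained sketch contrasting the two styles. The DoubleDoubleFunction interface and the aggregate(aggregator, combiner) shape loosely mirror the Mahout math API, but everything here is re-declared so the example compiles on its own; it is not the AbstractVector implementation.

// Sketch only, assuming non-empty, equal-length arrays.
interface DoubleDoubleFunction {
  double apply(double a, double b);
}

final class DotProduct {
  // Style 1: inline primitive math. The loop body is plain arithmetic, so
  // performance is predictable.
  static double dotPrimitive(double[] x, double[] y) {
    double sum = 0.0;
    for (int i = 0; i < x.length; i++) {
      sum += x[i] * y[i];
    }
    return sum;
  }

  // Style 2: generalized aggregate(aggregator, combiner). Each element now
  // goes through two virtual apply() calls; until (and unless) the JIT
  // inlines them, this can be noticeably slower -- the "at the mercy of the
  // JIT" point above.
  static double aggregate(double[] x, double[] y,
                          DoubleDoubleFunction aggregator,
                          DoubleDoubleFunction combiner) {
    double result = combiner.apply(x[0], y[0]);
    for (int i = 1; i < x.length; i++) {
      result = aggregator.apply(result, combiner.apply(x[i], y[i]));
    }
    return result;
  }

  static double dotViaAggregate(double[] x, double[] y) {
    DoubleDoubleFunction plus = new DoubleDoubleFunction() {
      public double apply(double a, double b) { return a + b; }
    };
    DoubleDoubleFunction mult = new DoubleDoubleFunction() {
      public double apply(double a, double b) { return a * b; }
    };
    return aggregate(x, y, plus, mult);
  }
}

Benchmarking something like dotViaAggregate in isolation, per function, is roughly what the AggregateBenchmark suggestion above amounts to: measure the apply()-based path for each function before swapping it into the hot vector methods.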