Re: Review Request: MAHOUT-1192 [2]: Speed up Vector Operations

Robin Anil Mon, 22 Apr 2013 11:02:58 -0700

PS, you can modify a constant in VectorBenchmarks.java to increase the time
per benchmark (currently set to 500 milliseconds), which increases the
confidence in each benchmark sample. But for now it's fine.
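A minimal sketch of what a time-budgeted benchmark loop can look like, assuming nothing about the real code: `TimedBenchmark`, `meanNanosPerCall` and `BENCHMARK_TIME_MS` are hypothetical names, not what VectorBenchmarks.java actually contains. The only point is that a larger budget averages over more calls, so the per-call mean is less sensitive to noise.

```java
import java.util.function.DoubleSupplier;

final class TimedBenchmark {
    // Hypothetical name: stands in for the time-per-benchmark constant in VectorBenchmarks.java.
    static final long BENCHMARK_TIME_MS = 500;

    // Written after every run so the JIT cannot discard the measured work as dead code.
    static volatile double blackhole;

    /** Calls op repeatedly (at least once) until budgetMs has elapsed; returns mean ns per call. */
    static double meanNanosPerCall(DoubleSupplier op, long budgetMs) {
        long budgetNanos = budgetMs * 1_000_000L;
        long start = System.nanoTime();
        long calls = 0;
        double sink = 0;
        do {
            sink += op.getAsDouble();
            calls++;
        } while (System.nanoTime() - start < budgetNanos);
        long elapsed = System.nanoTime() - start;
        blackhole = sink;
        return (double) elapsed / calls;
    }

    public static void main(String[] args) {
        double[] a = new double[1000];
        double[] b = new double[1000];
        for (int i = 0; i < a.length; i++) {
            a[i] = i;
            b[i] = 1.0 / (i + 1);
        }
        DoubleSupplier dot = () -> {
            double s = 0;
            for (int i = 0; i < a.length; i++) {
                s += a[i] * b[i];
            }
            return s;
        };
        // A longer budget means more calls per sample, so the reported mean is more stable.
        System.out.printf("dot: %.1f ns/call over %d ms%n",
            meanNanosPerCall(dot, BENCHMARK_TIME_MS), BENCHMARK_TIME_MS);
    }
}
```

For serious measurements a dedicated harness such as JMH handles warm-up and dead-code elimination more carefully than a hand-rolled loop like this.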


------
Robin Anil



On Mon, Apr 22, 2013 at 12:30 PM, Dan Filimon <[email protected]> wrote:

> And... they failed. I broke some DistributedRowMatrix tests.
>
>
> On Mon, Apr 22, 2013 at 8:14 PM, Dan Filimon <[email protected]> wrote:
>
> > The tests are still running. I want to make sure they all pass before
> > another round of benchmarks. :)
> >
> >
> > On Mon, Apr 22, 2013 at 8:00 PM, Robin Anil <[email protected]> wrote:
> >
> >> Can you update or create a spreadsheet comparing where you are right now vs. trunk?
> >> On Apr 22, 2013 11:51 AM, "Dan Filimon" <[email protected]> wrote:
> >>
> >>> In fact, the issue I was referring to turned out to be caused by the very
> >>> fast case being wrong.
> >>> When merging two sparse vectors, I wasn't updating the number of mappings
> >>> in the result.
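A minimal sketch of the kind of bug described above, assuming a hypothetical sorted sparse representation (parallel index and value arrays plus an explicit `size` field; these are not Mahout's actual classes). The merge walks both vectors in a single pass, and the final assignment to `size` is the easy-to-miss step: without it the result holds the merged entries but reports the wrong number of mappings, which could also make later operations on it look suspiciously fast.

```java
import java.util.Arrays;

final class SparseMerge {
    /** Hypothetical sorted sparse vector: only the first `size` slots of the arrays are live. */
    static final class SortedSparse {
        final int[] indices;
        final double[] values;
        int size; // number of mappings; must match how many slots were actually filled

        SortedSparse(int capacity) {
            indices = new int[capacity];
            values = new double[capacity];
        }
    }

    /** Builds a vector from indices that are already sorted in increasing order. */
    static SortedSparse of(int[] indices, double[] values) {
        SortedSparse v = new SortedSparse(indices.length);
        System.arraycopy(indices, 0, v.indices, 0, indices.length);
        System.arraycopy(values, 0, v.values, 0, values.length);
        v.size = indices.length;
        return v;
    }

    /** Element-wise sum of two sorted sparse vectors in one linear merge pass. */
    static SortedSparse plus(SortedSparse a, SortedSparse b) {
        SortedSparse out = new SortedSparse(a.size + b.size);
        int i = 0, j = 0, k = 0;
        while (i < a.size && j < b.size) {
            if (a.indices[i] < b.indices[j]) {
                out.indices[k] = a.indices[i];
                out.values[k++] = a.values[i++];
            } else if (a.indices[i] > b.indices[j]) {
                out.indices[k] = b.indices[j];
                out.values[k++] = b.values[j++];
            } else { // same index in both: add the values (explicit zeros are kept for simplicity)
                out.indices[k] = a.indices[i];
                out.values[k++] = a.values[i++] + b.values[j++];
            }
        }
        while (i < a.size) { // copy whatever is left of a
            out.indices[k] = a.indices[i];
            out.values[k++] = a.values[i++];
        }
        while (j < b.size) { // copy whatever is left of b
            out.indices[k] = b.indices[j];
            out.values[k++] = b.values[j++];
        }
        out.size = k; // the easy-to-miss step: record how many mappings the result holds
        return out;
    }

    public static void main(String[] args) {
        SortedSparse c = plus(of(new int[] {1, 3}, new double[] {1.0, 2.0}),
                              of(new int[] {2, 3}, new double[] {5.0, 1.0}));
        // prints "3 [1, 2, 3] [1.0, 5.0, 3.0]"
        System.out.println(c.size + " " + Arrays.toString(Arrays.copyOf(c.indices, c.size))
            + " " + Arrays.toString(Arrays.copyOf(c.values, c.size)));
    }
}
```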
> >>>
> >>> Performance is now better for the more "tuned" vectors.
> >>> I have noticed some random regressions with dense vectors ... this is
> >>> pretty odd. :/
> >>>
> >>> Anyway, can you give me some insight into:
> >>> - what exactly the numbers in the spreadsheet mean?
> >>> - what the "Cluster" score is for some benchmarks? There don't seem to
> >>> be any explicit calls that cluster vectors.
> >>>
> >>> Thanks!
> >>>
> >>>
> >>>
> >>> On Mon, Apr 22, 2013 at 5:24 PM, Robin Anil <[email protected]> wrote:
> >>>
> >>>> Yes, every time you replace a primitive call, you are at the mercy of the
> >>>> JIT to inline the method. Choose primitives wherever possible to reduce
> >>>> variability.
> >>>> On Apr 22, 2013 7:15 AM, "Dan Filimon" <[email protected]> wrote:
> >>>>
> >>>>> Thanks!
> >>>>>
> >>>>> So, I'm running more benchmarks and it's a mixed bag. There are
> >>>>> regressions and gains, but what surprises me most is that after
> >>>>> replacing every "primitive" call with calls to assign/aggregate,
> >>>>> clustering performs much worse.
> >>>>>
> >>>>> As in, literally dozens of times worse. I'm surprised it's that bad,
> >>>>> yet it doesn't show up in the benchmarks.
> >>>>> Any ideas why this might be, or what I should look into?
> >>>>>
> >>>>>
> >>>>> On Sat, Apr 20, 2013 at 9:14 PM, Robin Anil <[email protected]> wrote:
> >>>>>
> >>>>>>
> >>>>>>
> https://docs.google.com/spreadsheet/ccc?key=0AhewTD_ZgznddGFQbWJCQTZXSnFULUYzdURfWDRJQlE#gid=2
> >>>>>>
> >>>>>> Here you go. There are some regressions and some improvements. One of
> >>>>>> the major reasons, I think, is replacing inline math with foo.apply().
> >>>>>> The JVM might not have optimized it yet. You might be better off just
> >>>>>> adding an AggregateBenchmark and working on it for your functions before
> >>>>>> replacing entire AbstractVector methods.
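A minimal sketch of the difference between inline math and foo.apply(), assuming a hypothetical `BinaryOp` interface in place of the real function classes passed to assign/aggregate. Both methods compute the same dot product; the generic one makes two interface calls per element, which is only fast if the JIT inlines them.

```java
// Hypothetical stand-in for a binary function object like those passed to assign/aggregate.
interface BinaryOp {
    double apply(double x, double y);
}

final class AggregateSketch {
    static final BinaryOp PLUS = (x, y) -> x + y;
    static final BinaryOp TIMES = (x, y) -> x * y;

    /** Generic form: folds combiner(a[i], b[i]) with aggregator; two interface calls per element. */
    static double aggregate(double[] a, double[] b, BinaryOp aggregator, BinaryOp combiner) {
        if (a.length == 0 || a.length != b.length) {
            throw new IllegalArgumentException("need two non-empty arrays of equal length");
        }
        double acc = combiner.apply(a[0], b[0]);
        for (int i = 1; i < a.length; i++) {
            acc = aggregator.apply(acc, combiner.apply(a[i], b[i]));
        }
        return acc;
    }

    /** Inline form of the same dot product: plain arithmetic, no calls for the JIT to inline. */
    static double dotInline(double[] a, double[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    public static void main(String[] args) {
        double[] a = {1, 2, 3};
        double[] b = {4, 5, 6};
        // prints "32.0 32.0"
        System.out.println(aggregate(a, b, PLUS, TIMES) + " " + dotInline(a, b));
    }
}
```

Timing both methods side by side is the kind of comparison an AggregateBenchmark would make. As a general HotSpot point, a call site that has seen only one or two implementations is usually inlined, while one that sees many different function objects typically is not, so a benchmark exercising a single function may look fine where a larger workload does not. Whether that explains the clustering slowdown is a possibility, not something established here.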
> >>>>>>
> >>>>>
> >>>>>
> >>>
> >
>
