Hi Gilles,

Thanks for your response. I've run a small JMH benchmark of both the old
(direct summation) and the new (default 16KB tabulation) approaches; here
are the results:

Benchmark                                          Mode  Samples      Score      Error   Units
o.a.c.m.s.FactorialLogBench.newFactorialLog100    thrpt       20  75066.697 ± 7122.980  ops/ms
o.a.c.m.s.FactorialLogBench.newFactorialLog1000   thrpt       20  76905.243 ± 5965.782  ops/ms
o.a.c.m.s.FactorialLogBench.newFactorialLog10K    thrpt       20   5770.486 ±  285.069  ops/ms
o.a.c.m.s.FactorialLogBench.newFactorialLog100K   thrpt       20   6156.102 ±  278.842  ops/ms
o.a.c.m.s.FactorialLogBench.oldFactorialLog100    thrpt       20    585.576 ±   26.802  ops/ms
o.a.c.m.s.FactorialLogBench.oldFactorialLog1000   thrpt       20     29.883 ±    1.663  ops/ms
o.a.c.m.s.FactorialLogBench.oldFactorialLog10K    thrpt       20      2.744 ±    0.095  ops/ms
o.a.c.m.s.FactorialLogBench.oldFactorialLog100K   thrpt       20      0.264 ±    0.007  ops/ms

Here *100 computes the factorial log of a random number in the range
[0..99], *1000 of a number in [900..999], *10K of a number in
[9900..9999], and *100K of a number in [99900..99999]. The default
tabulated version covers the first two cases with a direct lookup, the
third with a use-the-closest-stored-value approach (about two logs, two
multiplications and two additions on average), and the fourth by
delegating to logGamma.
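
For reference, the harness is essentially of the following shape. This is an
illustrative sketch rather than the exact class I ran (names are made up);
only the two old* extremes are shown, and the new* methods are identical
except that they call the tabulated implementation from the patch:

import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.TimeUnit;

import org.apache.commons.math3.util.CombinatoricsUtils;
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.State;

@BenchmarkMode(Mode.Throughput)
@OutputTimeUnit(TimeUnit.MILLISECONDS)
@State(Scope.Thread)
public class FactorialLogBenchSketch {

    @Benchmark
    public double oldFactorialLog100() {
        // "old" path: current direct-summation factorialLog, n drawn from [0..99]
        return CombinatoricsUtils.factorialLog(ThreadLocalRandom.current().nextInt(100));
    }

    @Benchmark
    public double oldFactorialLog100K() {
        // same call, n drawn from [99900..99999]
        return CombinatoricsUtils.factorialLog(99900 + ThreadLocalRandom.current().nextInt(100));
    }
}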
As you can see, the tabulated (new) approach is about 130x faster on small
numbers and the difference grows drastically from there on, up to about
23000x on fairly big numbers.

Another interesting result is that logGamma doesn't seem to be slower than
the assisted lookup, which possibly indicates that we can drop the assisted
lookup altogether, reducing the memory footprint.
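
If we did drop it, the structure would reduce to something like the sketch
below (illustrative only, not the code from the patch; the
closest-stored-value middle tier is omitted): precompute log(k!) up to a
fixed cache size and delegate to logGamma beyond it, using the identity
log(n!) = logGamma(n + 1). A 16KB table of doubles would hold 2048 entries,
which is consistent with the first two ranges above being direct hits.

import org.apache.commons.math3.special.Gamma;

final class TabulatedFactorialLog {

    private final double[] logFactorials;

    TabulatedFactorialLog(int cacheSize) {
        // logFactorials[k] = log(k!); log(0!) = 0, each entry adds log(k).
        logFactorials = new double[cacheSize];
        for (int k = 1; k < cacheSize; k++) {
            logFactorials[k] = logFactorials[k - 1] + Math.log(k);
        }
    }

    double value(int n) {
        if (n < logFactorials.length) {
            return logFactorials[n];       // direct table lookup
        }
        return Gamma.logGamma(n + 1);      // large n: log(n!) = logGamma(n + 1)
    }
}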

Sincerely,
Aleksei Dievskii.

On Thu, Nov 19, 2015 at 11:18 AM, Gilles <gil...@harfang.homelinux.org>
wrote:

> Hello.
>
> On Thu, 19 Nov 2015 10:34:46 +0100, Alexey Dievsky wrote:
>
>> Hi,
>>
>> I recently submitted an issue to JIRA (
>> https://issues.apache.org/jira/browse/MATH-1293) and was advised to
>> discuss it here.
>>
>> In short, the current implementation of CombinatoricsUtils#factorialLog
>> employs direct summation and thus has linear complexity (O(n) additions
>> and O(n) logs). I suggest replacing it with a tabulated lookup. This
>> solution will need to allocate some additional memory, but the speed-up
>> for larger n would be tremendous. For values that are too large to be
>> looked up, the equivalent Gamma#logGamma is still much faster than the
>> existing algorithm.
>>
>> I attached a proposed patch to the issue. However, there are a lot of
>> details that need to be worked out, first and foremost whether such an
>> optimization is at all worth it.
>>
>
> According to your statement above (significantly improved performance),
> there is actually no doubt that the feature should be offered to CM
> users.[1][2]
> The main question is whether a single speed vs memory (vs possibly
> increased initialization time) trade-off is always acceptable.
>
> Hence at first sight, I'd prefer to provide a factory to create an
> object (that will compute the factorial) with a user-defined trade-off.
>
> Regards,
> Gilles
>
> [1] It would also be nice if you could provide a benchmark.
> [2] A real-life use-case would also be nice to enhance the CM userguide.
>
>> Thus I ask for your opinion. Please let me
>> know what you think.
>>
>> Sincerely,
>> Aleksei Dievskii.
>>
>
>
