OK. If I am reading this correctly, then each iteration is 1 second. On the
iteration shown below, 89383 calls were made during that 1 second. I should
think that would do fine to account for the “normal” case.
Iteration 1: n = 89383, mean = 97 ns/op, p{0.00, 0.50, 0.90, 0.95, 0.99, 0.999, 0.9999, 1.00} = 0, 0, 0, 1000, 1000, 1000, 1000, 20000 ns/op
Ralph
> On Jul 5, 2015, at 7:56 PM, Remko Popma <[email protected]> wrote:
>
> I think the benchmark output includes runtime duration.
>
> Sent from my iPhone
>
> On 2015/07/06, at 11:07, Ralph Goers <[email protected]> wrote:
>
>> Is there a way to know how long each benchmark ran? I want to make sure that
>> the benchmark is fairly representing non-cached results.
>>
>> Ralph
>>
>>> On Jul 5, 2015, at 5:57 PM, Ralph Goers <[email protected]> wrote:
>>>
>>> I created an implementation that uses FastDateFormat and an
>>> AtomicReference. I modified the benchmarks and the results are below.
>>>   - atomicFormat is the new version (sketched below).
>>>   - cachedFormat uses logic similar to what Remko proposed.
>>>   - fastFormat simply uses the FastDateFormat on every timestamp.
>>>   - synchronizedFormat uses logic similar to what is in the
>>>     DatePatternConverter now.
>>>   - threadLocalFormat retrieves a threadLocal SimpleDateFormat and formats
>>>     every timestamp with it.
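>>>
>>> (For reference, the atomicFormat variant is shaped roughly like the sketch
>>> below. Class and field names are illustrative only, not the attached patch;
>>> it assumes Commons Lang's thread-safe FastDateFormat.)
>>>
>>>     import java.util.concurrent.atomic.AtomicReference;
>>>
>>>     import org.apache.commons.lang3.time.FastDateFormat;
>>>
>>>     // Caches the most recently formatted millisecond in an immutable pair
>>>     // held by an AtomicReference, so concurrent callers can reuse it
>>>     // without locking. FastDateFormat itself is thread-safe.
>>>     public final class AtomicCachedFormatter {
>>>
>>>         private static final class Cached {
>>>             final long timestamp;
>>>             final String formatted;
>>>
>>>             Cached(final long timestamp, final String formatted) {
>>>                 this.timestamp = timestamp;
>>>                 this.formatted = formatted;
>>>             }
>>>         }
>>>
>>>         private final FastDateFormat fastFormat =
>>>                 FastDateFormat.getInstance("yyyy-MM-dd HH:mm:ss,SSS");
>>>         private final AtomicReference<Cached> cache =
>>>                 new AtomicReference<Cached>(new Cached(-1, null));
>>>
>>>         public String format(final long timestamp) {
>>>             final Cached cached = cache.get();
>>>             if (cached.timestamp == timestamp) {
>>>                 return cached.formatted; // cache hit: no formatting, no lock
>>>             }
>>>             final String result = fastFormat.format(timestamp);
>>>             // Best effort: if another thread won the race, keep its entry
>>>             // and just return our freshly formatted result.
>>>             cache.compareAndSet(cached, new Cached(timestamp, result));
>>>             return result;
>>>         }
>>>     }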
>>>
>>> 1 Thread
>>>
>>> Benchmark                                                   Mode  Samples     Score     Error  Units
>>> o.a.l.l.p.j.SimpleDateFormatBenchmark.atomicFormat        sample    67961    93.774 ±   3.928  ns/op
>>> o.a.l.l.p.j.SimpleDateFormatBenchmark.baseline            sample    89151    39.136 ±   2.352  ns/op
>>> o.a.l.l.p.j.SimpleDateFormatBenchmark.cachedFormat        sample    66065    74.033 ±   3.613  ns/op
>>> o.a.l.l.p.j.SimpleDateFormatBenchmark.fastFormat          sample    79434   316.701 ±   6.443  ns/op
>>> o.a.l.l.p.j.SimpleDateFormatBenchmark.synchronizedFormat  sample    81409   304.055 ±   5.835  ns/op
>>> o.a.l.l.p.j.SimpleDateFormatBenchmark.threadLocalFormat   sample    84137   298.179 ±   6.005  ns/op
>>>
>>> 4 Threads
>>>
>>> Benchmark                                                   Mode  Samples     Score     Error  Units
>>> o.a.l.l.p.j.SimpleDateFormatBenchmark.atomicFormat        sample   270722    82.465 ±   1.767  ns/op
>>> o.a.l.l.p.j.SimpleDateFormatBenchmark.baseline            sample   290686    46.817 ±   1.348  ns/op
>>> o.a.l.l.p.j.SimpleDateFormatBenchmark.cachedFormat        sample   229500    86.370 ±   2.102  ns/op
>>> o.a.l.l.p.j.SimpleDateFormatBenchmark.fastFormat          sample   228713   435.373 ±  20.030  ns/op
>>> o.a.l.l.p.j.SimpleDateFormatBenchmark.synchronizedFormat  sample   293448  2113.380 ± 138.919  ns/op
>>> o.a.l.l.p.j.SimpleDateFormatBenchmark.threadLocalFormat   sample   283040   507.937 ±  15.903  ns/op
>>>
>>> 8 Threads
>>>
>>> Benchmark                                                   Mode  Samples     Score     Error  Units
>>> o.a.l.l.p.j.SimpleDateFormatBenchmark.atomicFormat        sample   389028   120.208 ±  39.351  ns/op
>>> o.a.l.l.p.j.SimpleDateFormatBenchmark.baseline            sample   523979    62.972 ±   2.952  ns/op
>>> o.a.l.l.p.j.SimpleDateFormatBenchmark.cachedFormat        sample   606880   125.257 ±   7.320  ns/op
>>> o.a.l.l.p.j.SimpleDateFormatBenchmark.fastFormat          sample   470751   768.610 ±  24.760  ns/op
>>> o.a.l.l.p.j.SimpleDateFormatBenchmark.synchronizedFormat  sample   614805  4598.192 ± 107.378  ns/op
>>> o.a.l.l.p.j.SimpleDateFormatBenchmark.threadLocalFormat   sample   434334   884.489 ± 130.365  ns/op
>>>
>>> I have attached a new patch to the Jira issue.
>>>
>>> Ralph
>>>
>>>
>>>
>>>
>>>
>>>> On Jul 5, 2015, at 1:08 AM, Remko Popma <[email protected]> wrote:
>>>>
>>>> Benchmark done
>>>> <https://issues.apache.org/jira/browse/LOG4J2-812?focusedCommentId=14614174&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14614174>,
>>>> and ThreadLocal is a clear winner over a synchronized block.
>>>>
>>>> I've started implementing a ThreadLocal registry but I stopped when I
>>>> realized this won't solve the memory leak problem: ThreadLocal.remove()
>>>> only removes the value for the current thread. Values created for other
>>>> threads in the thread pool will not be removed...
>>>>
>>>> Instead, I propose we create a SoftThreadLocal class which wraps the value
>>>> (the SimpleDateFormat in this case) in a SoftReference before storing it
>>>> in the ThreadLocal. This allows the JVM to reclaim the memory when
>>>> necessary, which should solve the memory leak. It also means we no longer
>>>> need to explicitly clear() things when the LoggerContext is stopped.
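>>>>
>>>> A minimal sketch of such a SoftThreadLocal (illustrative only; it simply
>>>> re-creates the value when the SoftReference has been cleared):
>>>>
>>>>     import java.lang.ref.SoftReference;
>>>>
>>>>     // A ThreadLocal whose per-thread value is reachable only through a
>>>>     // SoftReference, so the GC may reclaim it under memory pressure.
>>>>     public abstract class SoftThreadLocal<T> {
>>>>
>>>>         private final ThreadLocal<SoftReference<T>> threadLocal =
>>>>                 new ThreadLocal<SoftReference<T>>();
>>>>
>>>>         // Mirrors ThreadLocal.initialValue(): creates the value on first
>>>>         // use, or again after the GC has cleared the previous one.
>>>>         protected abstract T initialValue();
>>>>
>>>>         public final T get() {
>>>>             final SoftReference<T> ref = threadLocal.get();
>>>>             T value = ref == null ? null : ref.get();
>>>>             if (value == null) {
>>>>                 value = initialValue();
>>>>                 threadLocal.set(new SoftReference<T>(value));
>>>>             }
>>>>             return value;
>>>>         }
>>>>
>>>>         public final void remove() {
>>>>             threadLocal.remove();
>>>>         }
>>>>     }
>>>>
>>>> A caller would subclass it and return, say, a new SimpleDateFormat from
>>>> initialValue().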
>>>>
>>>> This is also a reusable solution because we can use this class everywhere
>>>> we use ThreadLocal to cache objects for performance.
>>>>
>>>> Thoughts?
>>>>
>>>> On Sun, Jul 5, 2015 at 3:33 PM, Ralph Goers <[email protected]> wrote:
>>>> That’s right, you were going to do some performance measurements. Please
>>>> do and let us know what works best.
>>>>
>>>> Ralph
>>>>
>>>>> On Jul 4, 2015, at 10:30 PM, Remko Popma <[email protected]> wrote:
>>>>>
>>>>> Looks like we already have an outstanding Jira for this: LOG4J2-812
>>>>> <https://issues.apache.org/jira/browse/LOG4J2-812>.
>>>>> If there are no objections I can take care of this one.
>>>>>
>>>>> On Sun, Jul 5, 2015 at 9:55 AM, Remko Popma <[email protected]> wrote:
>>>>> One per logger context would make it easier to clear all ThreadLocals
>>>>> when a particular logger context is stopped.
>>>>>
>>>>> Sent from my iPhone
>>>>>
>>>>> On 2015/07/05, at 0:51, Gary Gregory <[email protected]> wrote:
>>>>>
>>>>>> Would there be one registry or one per logger context?
>>>>>>
>>>>>> Gary
>>>>>>
>>>>>>
>>>>>> -------- Original message --------
>>>>>> From: Remko Popma <[email protected]>
>>>>>> Date: 07/04/2015 05:41 (GMT-08:00)
>>>>>> To: Log4J Developers List <[email protected]>
>>>>>> Subject: Re: Log4j2 RollingFileAppender deadlock issue
>>>>>>
>>>>>> ThreadLocal is implemented as an internal Map in each Thread instance,
>>>>>> so there is constant lookup time regardless of the number of Threads and
>>>>>> the number of lookups. Contrast this with a lock, where performance
>>>>>> degrades sharply as more concurrent threads contend for it.
>>>>>>
>>>>>> (See also https://plumbr.eu/blog/java/how-is-threadlocal-implemented)
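>>>>>>
>>>>>> Concretely, a thread-local formatter boils down to something like the
>>>>>> sketch below (illustrative only): each thread lazily creates and then
>>>>>> reuses its own SimpleDateFormat, and every get() is a constant-time
>>>>>> lookup in that thread's own map.
>>>>>>
>>>>>>     import java.text.SimpleDateFormat;
>>>>>>     import java.util.Date;
>>>>>>
>>>>>>     // One SimpleDateFormat per thread; no synchronization is needed
>>>>>>     // because the format instance is never shared across threads.
>>>>>>     public final class ThreadLocalDateFormatter {
>>>>>>
>>>>>>         private static final ThreadLocal<SimpleDateFormat> FORMAT =
>>>>>>                 new ThreadLocal<SimpleDateFormat>() {
>>>>>>                     @Override
>>>>>>                     protected SimpleDateFormat initialValue() {
>>>>>>                         return new SimpleDateFormat("yyyy-MM-dd HH:mm:ss,SSS");
>>>>>>                     }
>>>>>>                 };
>>>>>>
>>>>>>         public static String format(final long timestamp) {
>>>>>>             return FORMAT.get().format(new Date(timestamp));
>>>>>>         }
>>>>>>     }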
>>>>>>
>>>>>> Sent from my iPhone
>>>>>>
>>>>>> On 2015/07/04, at 20:40, Jess Holle <[email protected]> wrote:
>>>>>>
>>>>>>> On 7/4/2015 2:51 AM, Gary Gregory wrote:
>>>>>>>> On Thu, Jul 2, 2015 at 6:18 AM, Remko Popma <[email protected]> wrote:
>>>>>>>> Yes, that could still work: We could use a ThreadLocal containing a
>>>>>>>> custom class which holds the lastTimestamp, cachedDateString as well
>>>>>>>> as a SimpleDateFormat instance.
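>>>>>>>>
>>>>>>>> Roughly like the sketch below (illustrative only): each thread keeps
>>>>>>>> its own holder and only re-formats when the millisecond actually
>>>>>>>> changes.
>>>>>>>>
>>>>>>>>     import java.text.SimpleDateFormat;
>>>>>>>>     import java.util.Date;
>>>>>>>>
>>>>>>>>     // Per-thread state: the last formatted timestamp, its cached
>>>>>>>>     // string, and a SimpleDateFormat owned by a single thread.
>>>>>>>>     final class CachedTime {
>>>>>>>>
>>>>>>>>         private final SimpleDateFormat dateFormat =
>>>>>>>>                 new SimpleDateFormat("yyyy-MM-dd HH:mm:ss,SSS");
>>>>>>>>         private long lastTimestamp = -1;
>>>>>>>>         private String cachedDateString;
>>>>>>>>
>>>>>>>>         String format(final long timestamp) {
>>>>>>>>             if (timestamp != lastTimestamp) { // only re-format on change
>>>>>>>>                 lastTimestamp = timestamp;
>>>>>>>>                 cachedDateString = dateFormat.format(new Date(timestamp));
>>>>>>>>             }
>>>>>>>>             return cachedDateString;
>>>>>>>>         }
>>>>>>>>     }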
>>>>>>>>
>>>>>>>> As Jess pointed out, we would also need a way to clear the ThreadLocal
>>>>>>>> when the LoggerContext is stopped (to prevent memory leaks in web
>>>>>>>> apps). This may be the third usage of ThreadLocals in log4j2 now, so
>>>>>>>> it may be worth creating a reusable mechanism for this.
>>>>>>>> One idea would be to have a ThreadLocal registry in the LoggerContext,
>>>>>>>> where the LoggerContext is responsible for cleaning up all registered
>>>>>>>> ThreadLocals in its stop() method.
>>>>>>>>
>>>>>>>> Thoughts?
>>>>>>>>
>>>>>>>> I'm wondering what the performance cost is of doing a ThreadLocal.get()
>>>>>>>> vs. synchronized(this) on each call to format().
>>>>>>>>
>>>>>>> Personally I'd be less concerned with optimizing maximum logger
>>>>>>> throughput on any given thread than:
>>>>>>>   - Ensuring that not logging takes minimal time
>>>>>>>   - Minimizing potential thread contention
>>>>>>> Logging at maximum efficiency is a priority, but comes after these
>>>>>>> others.
>>>>>>> --
>>>>>>> Jess Holle
>>>>>>>
>>>>>
>>>>
>>>>
>>>
>>