OK. If I am reading this correctly, then each iteration is 1 second. In the 
iteration shown below, 89383 calls were made during that 1 second. I should 
think that is enough to account for the “normal” case.

Iteration   1: n = 89383, mean = 97 ns/op, p{0.00, 0.50, 0.90, 0.95, 0.99, 0.999, 0.9999, 1.00} = 0, 0, 0, 1000, 1000, 1000, 1000, 20000 ns/op
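
For reference, a sample-mode JMH setup along these lines gives 1-second 
measurement iterations reported in ns/op. This is only an illustration of the 
kind of configuration involved, not necessarily the exact annotations in the 
benchmark class attached to the Jira issue:

import java.util.concurrent.TimeUnit;
import org.openjdk.jmh.annotations.*;

// Sample mode times individual calls and reports percentiles, which is what
// produces the p{...} line above.
@BenchmarkMode(Mode.SampleTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@Warmup(iterations = 5, time = 1, timeUnit = TimeUnit.SECONDS)
@Measurement(iterations = 5, time = 1, timeUnit = TimeUnit.SECONDS)
@State(Scope.Benchmark)
public class SimpleDateFormatBenchmark {

    // Placeholder so this sketch runs stand-alone; the real class contains
    // the atomicFormat, cachedFormat, fastFormat, etc. variants.
    @Benchmark
    public long baseline() {
        return System.currentTimeMillis();
    }
}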

Ralph



> On Jul 5, 2015, at 7:56 PM, Remko Popma <[email protected]> wrote:
> 
> I think the benchmark output includes runtime duration. 
> 
> Sent from my iPhone
> 
> On 2015/07/06, at 11:07, Ralph Goers <[email protected]> wrote:
> 
>> Is there a way to know how long each benchmark ran? I want to make sure 
>> that the benchmark fairly represents non-cached results.
>> 
>> Ralph
>> 
>>> On Jul 5, 2015, at 5:57 PM, Ralph Goers <[email protected]> wrote:
>>> 
>>> I created an implementation that uses FastDateFormat and an 
>>> AtomicReference.  I modified the benchmarks and the results are below.  
>>> atomicFormat is the new version. cachedFormat uses logic similar to what 
>>> Remko proposed. fastFormat simply uses the FastDateFormat on every 
>>> timestamp. synchronizedFormat uses logic similar to what is in the 
>>> DatePatternConverter now. threadLocalFormat retrieves a threadLocal 
>>> SimpleDateFormat and formats every timestamp with it.
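>>> 
>>> For the curious, the atomicFormat variant is roughly the sketch below: an 
>>> immutable holder for the last timestamp and its formatted string, swapped 
>>> atomically via an AtomicReference. This is simplified, the pattern is just 
>>> an example, and the real code is in the patch on the Jira issue.
>>> 
>>> import java.util.concurrent.atomic.AtomicReference;
>>> import org.apache.commons.lang3.time.FastDateFormat;
>>> 
>>> public class AtomicFormatSketch {
>>> 
>>>     // Immutable pair of (timestamp, formatted string).
>>>     private static final class CachedTime {
>>>         final long timestampMillis;
>>>         final String formatted;
>>> 
>>>         CachedTime(final long timestampMillis, final String formatted) {
>>>             this.timestampMillis = timestampMillis;
>>>             this.formatted = formatted;
>>>         }
>>>     }
>>> 
>>>     private final FastDateFormat fastDateFormat =
>>>             FastDateFormat.getInstance("yyyy-MM-dd HH:mm:ss,SSS");
>>>     private final AtomicReference<CachedTime> cache =
>>>             new AtomicReference<>(new CachedTime(-1, ""));
>>> 
>>>     public String format(final long timestampMillis) {
>>>         CachedTime cached = cache.get();
>>>         if (cached.timestampMillis != timestampMillis) {
>>>             final CachedTime fresh = new CachedTime(
>>>                     timestampMillis, fastDateFormat.format(timestampMillis));
>>>             // If another thread installed a newer value first, that is fine;
>>>             // we still return the string we just formatted.
>>>             cache.compareAndSet(cached, fresh);
>>>             cached = fresh;
>>>         }
>>>         return cached.formatted;
>>>     }
>>> }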
>>> 
>>> 1 Thread
>>> 
>>> Benchmark                                                  Mode  Samples    Score   Error  Units
>>> o.a.l.l.p.j.SimpleDateFormatBenchmark.atomicFormat        sample    67961   93.774 ± 3.928  ns/op
>>> o.a.l.l.p.j.SimpleDateFormatBenchmark.baseline            sample    89151   39.136 ± 2.352  ns/op
>>> o.a.l.l.p.j.SimpleDateFormatBenchmark.cachedFormat        sample    66065   74.033 ± 3.613  ns/op
>>> o.a.l.l.p.j.SimpleDateFormatBenchmark.fastFormat          sample    79434  316.701 ± 6.443  ns/op
>>> o.a.l.l.p.j.SimpleDateFormatBenchmark.synchronizedFormat  sample    81409  304.055 ± 5.835  ns/op
>>> o.a.l.l.p.j.SimpleDateFormatBenchmark.threadLocalFormat   sample    84137  298.179 ± 6.005  ns/op
>>> 
>>> 4 Threads
>>> 
>>> Benchmark                                                  Mode  Samples     Score     Error  Units
>>> o.a.l.l.p.j.SimpleDateFormatBenchmark.atomicFormat        sample   270722    82.465 ±   1.767  ns/op
>>> o.a.l.l.p.j.SimpleDateFormatBenchmark.baseline            sample   290686    46.817 ±   1.348  ns/op
>>> o.a.l.l.p.j.SimpleDateFormatBenchmark.cachedFormat        sample   229500    86.370 ±   2.102  ns/op
>>> o.a.l.l.p.j.SimpleDateFormatBenchmark.fastFormat          sample   228713   435.373 ±  20.030  ns/op
>>> o.a.l.l.p.j.SimpleDateFormatBenchmark.synchronizedFormat  sample   293448  2113.380 ± 138.919  ns/op
>>> o.a.l.l.p.j.SimpleDateFormatBenchmark.threadLocalFormat   sample   283040   507.937 ±  15.903  ns/op
>>> 
>>> 8 Threads
>>> 
>>> Benchmark                                                  Mode  Samples     Score     Error  Units
>>> o.a.l.l.p.j.SimpleDateFormatBenchmark.atomicFormat        sample   389028   120.208 ±  39.351  ns/op
>>> o.a.l.l.p.j.SimpleDateFormatBenchmark.baseline            sample   523979    62.972 ±   2.952  ns/op
>>> o.a.l.l.p.j.SimpleDateFormatBenchmark.cachedFormat        sample   606880   125.257 ±   7.320  ns/op
>>> o.a.l.l.p.j.SimpleDateFormatBenchmark.fastFormat          sample   470751   768.610 ±  24.760  ns/op
>>> o.a.l.l.p.j.SimpleDateFormatBenchmark.synchronizedFormat  sample   614805  4598.192 ± 107.378  ns/op
>>> o.a.l.l.p.j.SimpleDateFormatBenchmark.threadLocalFormat   sample   434334   884.489 ± 130.365  ns/op
>>> 
>>> I have attached a new patch to the Jira issue.
>>> 
>>> Ralph
>>> 
>>> 
>>> 
>>> 
>>> 
>>>> On Jul 5, 2015, at 1:08 AM, Remko Popma <[email protected]> wrote:
>>>> 
>>>> Benchmark done, and ThreadLocal is a clear winner over a synchronized block:
>>>> https://issues.apache.org/jira/browse/LOG4J2-812?focusedCommentId=14614174&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14614174
>>>> 
>>>> I've started implementing a ThreadLocal registry but I stopped when I 
>>>> realized this won't solve the memory leak problem: ThreadLocal.remove() 
>>>> only removes the value for the current thread. Values created for other 
>>>> threads in the thread pool will not be removed...
>>>> 
>>>> Instead, I propose we create a SoftThreadLocal class which wraps the value 
>>>> (the SimpleDateFormat in this case) in a SoftReference before storing it 
>>>> in the ThreadLocal. This allows the JVM to reclaim the memory when 
>>>> necessary, which should solve the memory leak. It also means we no longer 
>>>> need to explicitly clear() things when the LoggerContext is stopped.
>>>> 
>>>> This is also a reusable solution because we can use this class everywhere 
>>>> we use ThreadLocal to cache objects for performance.
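>>>> 
>>>> A rough sketch of what I have in mind (untested, and the names are just 
>>>> placeholders):
>>>> 
>>>> import java.lang.ref.SoftReference;
>>>> 
>>>> public abstract class SoftThreadLocal<T> {
>>>> 
>>>>     // The per-thread value is held only via a SoftReference, so the GC
>>>>     // can reclaim it under memory pressure.
>>>>     private final ThreadLocal<SoftReference<T>> holder =
>>>>             new ThreadLocal<SoftReference<T>>();
>>>> 
>>>>     /** Creates the per-thread value, like ThreadLocal.initialValue(). */
>>>>     protected abstract T initialValue();
>>>> 
>>>>     public T get() {
>>>>         final SoftReference<T> ref = holder.get();
>>>>         T value = (ref == null) ? null : ref.get();
>>>>         if (value == null) { // first use, or the GC cleared the reference
>>>>             value = initialValue();
>>>>             holder.set(new SoftReference<T>(value));
>>>>         }
>>>>         return value;
>>>>     }
>>>> }
>>>> 
>>>> Usage would then look something like this (pattern is just an example):
>>>> 
>>>> // assumes: import java.text.SimpleDateFormat;
>>>> private static final SoftThreadLocal<SimpleDateFormat> FORMAT =
>>>>         new SoftThreadLocal<SimpleDateFormat>() {
>>>>             @Override
>>>>             protected SimpleDateFormat initialValue() {
>>>>                 return new SimpleDateFormat("yyyy-MM-dd HH:mm:ss,SSS");
>>>>             }
>>>>         };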
>>>> 
>>>> Thoughts?
>>>> 
>>>> On Sun, Jul 5, 2015 at 3:33 PM, Ralph Goers <[email protected]> wrote:
>>>> That’s right, you were going to do some performance measurements. Please 
>>>> do and let us know what works best.
>>>> 
>>>> Ralph
>>>> 
>>>>> On Jul 4, 2015, at 10:30 PM, Remko Popma <[email protected]> wrote:
>>>>> 
>>>>> Looks like we already have an outstanding Jira for this: LOG4J2-812 
>>>>> <https://issues.apache.org/jira/browse/LOG4J2-812>.
>>>>> If there are no objections I can take care of this one.
>>>>> 
>>>>> On Sun, Jul 5, 2015 at 9:55 AM, Remko Popma <[email protected]> wrote:
>>>>> One per logger context would make it easier to clear all of the registered 
>>>>> ThreadLocals when a particular logger context is stopped. 
>>>>> 
>>>>> Sent from my iPhone
>>>>> 
>>>>> On 2015/07/05, at 0:51, Gary Gregory <[email protected]> wrote:
>>>>> 
>>>>>> Would there be one registry or one per logger context?
>>>>>> 
>>>>>> Gary 
>>>>>> 
>>>>>> 
>>>>>> -------- Original message --------
>>>>>> From: Remko Popma <[email protected]> 
>>>>>> Date: 07/04/2015 05:41 (GMT-08:00) 
>>>>>> To: Log4J Developers List <[email protected]> 
>>>>>> Subject: Re: Log4j2 RollingFileAppender deadlock issue 
>>>>>> 
>>>>>> ThreadLocal is implemented as an internal Map in each Thread instance, 
>>>>>> so lookups take constant time regardless of the number of Threads and 
>>>>>> the number of lookups. Contrast this with a lock, where performance 
>>>>>> degrades sharply as more threads contend for it. 
>>>>>> 
>>>>>> (See also https://plumbr.eu/blog/java/how-is-threadlocal-implemented)
>>>>>> 
>>>>>> Sent from my iPhone
>>>>>> 
>>>>>> On 2015/07/04, at 20:40, Jess Holle <[email protected]> wrote:
>>>>>> 
>>>>>>> On 7/4/2015 2:51 AM, Gary Gregory wrote:
>>>>>>>> On Thu, Jul 2, 2015 at 6:18 AM, Remko Popma <[email protected]> wrote:
>>>>>>>> Yes, that could still work: We could use a ThreadLocal containing a 
>>>>>>>> custom class which holds the lastTimestamp, cachedDateString as well 
>>>>>>>> as a SimpleDateFormat instance. 
>>>>>>>> 
>>>>>>>> As Jess pointed out, we would also need a way to clear the ThreadLocal 
>>>>>>>> when the LoggerContext is stopped (to prevent memory leaks in web 
>>>>>>>> apps). This may be the third usage of ThreadLocals in log4j2 now, so 
>>>>>>>> it may be worth creating a reusable mechanism for this. 
>>>>>>>> One idea would be to have a ThreadLocal registry in the LoggerContext, 
>>>>>>>> where the LoggerContext is responsible for cleaning up all registered 
>>>>>>>> ThreadLocals in its stop() method. 
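>>>>>>>> 
>>>>>>>> Something along these lines, as a very rough sketch (all names are 
>>>>>>>> placeholders):
>>>>>>>> 
>>>>>>>> import java.util.Collection;
>>>>>>>> import java.util.concurrent.CopyOnWriteArrayList;
>>>>>>>> 
>>>>>>>> public class ThreadLocalRegistry {
>>>>>>>> 
>>>>>>>>     private final Collection<ThreadLocal<?>> registered =
>>>>>>>>             new CopyOnWriteArrayList<ThreadLocal<?>>();
>>>>>>>> 
>>>>>>>>     /** Components register the ThreadLocals they use for caching. */
>>>>>>>>     public <T> ThreadLocal<T> register(final ThreadLocal<T> threadLocal) {
>>>>>>>>         registered.add(threadLocal);
>>>>>>>>         return threadLocal;
>>>>>>>>     }
>>>>>>>> 
>>>>>>>>     /** Called from LoggerContext.stop() to clean up registered ThreadLocals. */
>>>>>>>>     public void clear() {
>>>>>>>>         for (final ThreadLocal<?> threadLocal : registered) {
>>>>>>>>             threadLocal.remove();
>>>>>>>>         }
>>>>>>>>     }
>>>>>>>> }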
>>>>>>>> 
>>>>>>>> Thoughts?
>>>>>>>> 
>>>>>>>> I'm wondering what the performance cost is of doing a 
>>>>>>>> ThreadLocal.get() vs. synchronized(this) on each call to format().
>>>>>>>> 
>>>>>>> Personally I'd be less concerned with optimizing maximum logger 
>>>>>>> throughput on any given thread than:
>>>>>>>   1. Ensuring that not logging takes minimal time
>>>>>>>   2. Minimizing potential thread contention
>>>>>>> Logging at maximum efficiency is a priority, but comes after these 
>>>>>>> others.
>>>>>>> --
>>>>>>> Jess Holle
>>>>>>> 
>>>>> 
>>>> 
>>>> 
>>> 
>> 
