I think the benchmark output includes runtime duration. 

Sent from my iPhone

> On 2015/07/06, at 11:07, Ralph Goers <[email protected]> wrote:
> 
> Is there a way to know how long each benchmark ran? I want to make sure that 
> the benchmark is fairly representing non-cached results.
> 
> Ralph
> 
>> On Jul 5, 2015, at 5:57 PM, Ralph Goers <[email protected]> wrote:
>> 
>> I created an implementation that uses FastDateFormat and an AtomicReference. 
>>  I modified the benchmarks and the results are below.  atomicFormat is the 
>> new version. cachedFormat uses logic similar to what Remko proposed. 
>> fastFormat simply uses the FastDateFormat on every timestamp. 
>> synchronizedFormat uses logic similar to what is in the DatePatternConverter 
>> now. threadLocalFormat retrieves a threadLocal SimpleDateFormat and formats 
>> every timestamp with it.
>> 
>> 1 Thread
>> 
>> Benchmark                                                     Mode  Samples  
>>   Score   Error  Units
>> o.a.l.l.p.j.SimpleDateFormatBenchmark.atomicFormat          sample    67961  
>>  93.774 ± 3.928  ns/op
>> o.a.l.l.p.j.SimpleDateFormatBenchmark.baseline              sample    89151  
>>  39.136 ± 2.352  ns/op
>> o.a.l.l.p.j.SimpleDateFormatBenchmark.cachedFormat          sample    66065  
>>  74.033 ± 3.613  ns/op
>> o.a.l.l.p.j.SimpleDateFormatBenchmark.fastFormat            sample    79434  
>> 316.701 ± 6.443  ns/op
>> o.a.l.l.p.j.SimpleDateFormatBenchmark.synchronizedFormat    sample    81409  
>> 304.055 ± 5.835  ns/op
>> o.a.l.l.p.j.SimpleDateFormatBenchmark.threadLocalFormat     sample    84137  
>> 298.179 ± 6.005  ns/op
>> 
>> 4 Threads
>> 
>> Benchmark                                                     Mode  Samples  
>>    Score     Error  Units
>> o.a.l.l.p.j.SimpleDateFormatBenchmark.atomicFormat          sample   270722  
>>   82.465 ±   1.767  ns/op
>> o.a.l.l.p.j.SimpleDateFormatBenchmark.baseline              sample   290686  
>>   46.817 ±   1.348  ns/op
>> o.a.l.l.p.j.SimpleDateFormatBenchmark.cachedFormat          sample   229500  
>>   86.370 ±   2.102  ns/op
>> o.a.l.l.p.j.SimpleDateFormatBenchmark.fastFormat            sample   228713  
>>  435.373 ±  20.030  ns/op
>> o.a.l.l.p.j.SimpleDateFormatBenchmark.synchronizedFormat    sample   293448  
>> 2113.380 ± 138.919  ns/op
>> o.a.l.l.p.j.SimpleDateFormatBenchmark.threadLocalFormat     sample   283040  
>>  507.937 ±  15.903  ns/op
>> 
>> 8 Threads
>> 
>> Benchmark                                                     Mode  Samples  
>>    Score     Error  Units
>> o.a.l.l.p.j.SimpleDateFormatBenchmark.atomicFormat          sample   389028  
>>  120.208 ±  39.351  ns/op
>> o.a.l.l.p.j.SimpleDateFormatBenchmark.baseline              sample   523979  
>>   62.972 ±   2.952  ns/op
>> o.a.l.l.p.j.SimpleDateFormatBenchmark.cachedFormat          sample   606880  
>>  125.257 ±   7.320  ns/op
>> o.a.l.l.p.j.SimpleDateFormatBenchmark.fastFormat            sample   470751  
>>  768.610 ±  24.760  ns/op
>> o.a.l.l.p.j.SimpleDateFormatBenchmark.synchronizedFormat    sample   614805  
>> 4598.192 ± 107.378  ns/op
>> o.a.l.l.p.j.SimpleDateFormatBenchmark.threadLocalFormat     sample   434334  
>>  884.489 ± 130.365  ns/op
>> 
>> I have attached a new patch to the Jira issue.
>> 
>> Ralph
>> 
>> 
>> 
>> 
>> 
>>> On Jul 5, 2015, at 1:08 AM, Remko Popma <[email protected]> wrote:
>>> 
>>> Benchmark done, and ThreadLocal is a clear winner over synchronized block.
>>> 
>>> I've started implementing a ThreadLocal registry but I stopped when I 
>>> realized this won't solve the memory leak problem: ThreadLocal.remove() 
>>> only removes the value for the current thread. Values created for other 
>>> threads in the thread pool will not be removed...
>>> 
>>> Instead, I propose we create a SoftThreadLocal class which wraps the value 
>>> (the SimpleDateFormat in this case) in a SoftReference before storing it in 
>>> the ThreadLocal. This allows the JVM to reclaim the memory when necessary, 
>>> which should solve the memory leak. It also means we no longer need to 
>>> explicitly clear() things when the LoggerContext is stopped.
>>> 
>>> This is also a reusable solution because we can use this class everywhere 
>>> we use ThreadLocal to cache objects for performance.
>>> 
>>> Thoughts?
>>> 
>>>> On Sun, Jul 5, 2015 at 3:33 PM, Ralph Goers <[email protected]> 
>>>> wrote:
>>>> That’s right, you were going to do some performance measurements. Please 
>>>> do and let us know what works best.
>>>> 
>>>> Ralph
>>>> 
>>>>> On Jul 4, 2015, at 10:30 PM, Remko Popma <[email protected]> wrote:
>>>>> 
>>>>> Looks like we already have an outstanding Jira for this: LOG4J2-812.
>>>>> If there are no objections I can take care of this one.
>>>>> 
>>>>>> On Sun, Jul 5, 2015 at 9:55 AM, Remko Popma <[email protected]> 
>>>>>> wrote:
>>>>>> One per logger context would make it easier to clear all ThreadContexts 
>>>>>> when a particular logger context is stopped. 
>>>>>> 
>>>>>> Sent from my iPhone
>>>>>> 
>>>>>>> On 2015/07/05, at 0:51, Gary Gregory <[email protected]> wrote:
>>>>>>> 
>>>>>>> Would there be one registry or one per logger context?
>>>>>>> 
>>>>>>> Gary 
>>>>>>> 
>>>>>>> 
>>>>>>> -------- Original message --------
>>>>>>> From: Remko Popma <[email protected]> 
>>>>>>> Date: 07/04/2015 05:41 (GMT-08:00) 
>>>>>>> To: Log4J Developers List <[email protected]> 
>>>>>>> Subject: Re: Log4j2 RollingFileAppender deadlock issue 
>>>>>>> 
>>>>>>> ThreadLocal is implemented as an internal Map in each Thread instance, 
>>>>>>> so there is constant lookup time regardless of the number of Threads 
>>>>>>> and the number of lookups. Contrast this with a lock, where performance 
>>>>>>> will decrease exponentially with more concurrent threads. 
>>>>>>> 
>>>>>>> (See also https://plumbr.eu/blog/java/how-is-threadlocal-implemented )
>>>>>>> 
>>>>>>> Sent from my iPhone
>>>>>>> 
>>>>>>>> On 2015/07/04, at 20:40, Jess Holle <[email protected]> wrote:
>>>>>>>> 
>>>>>>>>> On 7/4/2015 2:51 AM, Gary Gregory wrote:
>>>>>>>>>> On Thu, Jul 2, 2015 at 6:18 AM, Remko Popma <[email protected]> 
>>>>>>>>>> wrote:
>>>>>>>>>> Yes, that could still work: We could use a ThreadLocal containing a 
>>>>>>>>>> custom class which holds the                   lastTimestamp, 
>>>>>>>>>> cachedDateString as well as a SimpleDateFormat instance. 
>>>>>>>>>> 
>>>>>>>>>> As Jess pointed out, we would also need a way to clear the 
>>>>>>>>>> ThreadLocal when the LoggerContext is stopped (to prevent memory 
>>>>>>>>>> leaks in web apps). This may be the third usage of ThreadLocals in 
>>>>>>>>>> log4j2 now, so it may be worth creating a reusable mechanism for 
>>>>>>>>>> this. 
>>>>>>>>>> One idea would be to have a ThreadLocal registry in the 
>>>>>>>>>> LoggerContext, where the LoggerContext is responsible for cleaning 
>>>>>>>>>> up all registered ThreadLocals in its stop() method. 
>>>>>>>>>> 
>>>>>>>>>> Thoughts?
>>>>>>>>> 
>>>>>>>>> I'm wondering what the performance cost are of doing a 
>>>>>>>>> ThreadLocal.get() vs. synchronized(this) on each call to format().
>>>>>>>> Personally I'd be less concerned with optimizing maximum logger 
>>>>>>>> throughput on any given thread than:
>>>>>>>> Ensuring that not logging takes minimal time
>>>>>>>> Minimizing potential thread contention
>>>>>>>> Logging at maximum efficiency is a priority, but comes after these 
>>>>>>>> others.
>>>>>>>> --
>>>>>>>> Jess Holle
> 

Reply via email to