Is there a way to know how long each benchmark ran? I want to make sure that 
the benchmark is fairly representing non-cached results.

Ralph

> On Jul 5, 2015, at 5:57 PM, Ralph Goers <[email protected]> wrote:
> 
> I created an implementation that uses FastDateFormat and an AtomicReference.  
> I modified the benchmarks and the results are below.  atomicFormat is the new 
> version. cachedFormat uses logic similar to what Remko proposed. fastFormat 
> simply uses the FastDateFormat on every timestamp. synchronizedFormat uses 
> logic similar to what is in the DatePatternConverter now. threadLocalFormat 
> retrieves a threadLocal SimpleDateFormat and formats every timestamp with it.
> 
> 1 Thread
> 
> Benchmark                                                     Mode  Samples   
>  Score   Error  Units
> o.a.l.l.p.j.SimpleDateFormatBenchmark.atomicFormat          sample    67961   
> 93.774 ± 3.928  ns/op
> o.a.l.l.p.j.SimpleDateFormatBenchmark.baseline              sample    89151   
> 39.136 ± 2.352  ns/op
> o.a.l.l.p.j.SimpleDateFormatBenchmark.cachedFormat          sample    66065   
> 74.033 ± 3.613  ns/op
> o.a.l.l.p.j.SimpleDateFormatBenchmark.fastFormat            sample    79434  
> 316.701 ± 6.443  ns/op
> o.a.l.l.p.j.SimpleDateFormatBenchmark.synchronizedFormat    sample    81409  
> 304.055 ± 5.835  ns/op
> o.a.l.l.p.j.SimpleDateFormatBenchmark.threadLocalFormat     sample    84137  
> 298.179 ± 6.005  ns/op
> 
> 4 Threads
> 
> Benchmark                                                     Mode  Samples   
>   Score     Error  Units
> o.a.l.l.p.j.SimpleDateFormatBenchmark.atomicFormat          sample   270722   
>  82.465 ±   1.767  ns/op
> o.a.l.l.p.j.SimpleDateFormatBenchmark.baseline              sample   290686   
>  46.817 ±   1.348  ns/op
> o.a.l.l.p.j.SimpleDateFormatBenchmark.cachedFormat          sample   229500   
>  86.370 ±   2.102  ns/op
> o.a.l.l.p.j.SimpleDateFormatBenchmark.fastFormat            sample   228713   
> 435.373 ±  20.030  ns/op
> o.a.l.l.p.j.SimpleDateFormatBenchmark.synchronizedFormat    sample   293448  
> 2113.380 ± 138.919  ns/op
> o.a.l.l.p.j.SimpleDateFormatBenchmark.threadLocalFormat     sample   283040   
> 507.937 ±  15.903  ns/op
> 
> 8 Threads
> 
> Benchmark                                                     Mode  Samples   
>   Score     Error  Units
> o.a.l.l.p.j.SimpleDateFormatBenchmark.atomicFormat          sample   389028   
> 120.208 ±  39.351  ns/op
> o.a.l.l.p.j.SimpleDateFormatBenchmark.baseline              sample   523979   
>  62.972 ±   2.952  ns/op
> o.a.l.l.p.j.SimpleDateFormatBenchmark.cachedFormat          sample   606880   
> 125.257 ±   7.320  ns/op
> o.a.l.l.p.j.SimpleDateFormatBenchmark.fastFormat            sample   470751   
> 768.610 ±  24.760  ns/op
> o.a.l.l.p.j.SimpleDateFormatBenchmark.synchronizedFormat    sample   614805  
> 4598.192 ± 107.378  ns/op
> o.a.l.l.p.j.SimpleDateFormatBenchmark.threadLocalFormat     sample   434334   
> 884.489 ± 130.365  ns/op
> 
> I have attached a new patch to the Jira issue.
> 
> Ralph
> 
> 
> 
> 
> 
>> On Jul 5, 2015, at 1:08 AM, Remko Popma <[email protected] 
>> <mailto:[email protected]>> wrote:
>> 
>> Benchmark done 
>> <https://issues.apache.org/jira/browse/LOG4J2-812?focusedCommentId=14614174&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14614174>,
>>  and ThreadLocal is a clear winner over synchronized block.
>> 
>> I've started implementing a ThreadLocal registry but I stopped when I 
>> realized this won't solve the memory leak problem: ThreadLocal.remove() only 
>> removes the value for the current thread. Values created for other threads 
>> in the thread pool will not be removed...
>> 
>> Instead, I propose we create a SoftThreadLocal class which wraps the value 
>> (the SimpleDateFormat in this case) in a SoftReference before storing it in 
>> the ThreadLocal. This allows the JVM to reclaim the memory when necessary, 
>> which should solve the memory leak. It also means we no longer need to 
>> explicitly clear() things when the LoggerContext is stopped.
>> 
>> This is also a reusable solution because we can use this class everywhere we 
>> use ThreadLocal to cache objects for performance.
>> 
>> Thoughts?
>> 
>> On Sun, Jul 5, 2015 at 3:33 PM, Ralph Goers <[email protected] 
>> <mailto:[email protected]>> wrote:
>> That’s right, you were going to do some performance measurements. Please do 
>> and let us know what works best.
>> 
>> Ralph
>> 
>>> On Jul 4, 2015, at 10:30 PM, Remko Popma <[email protected] 
>>> <mailto:[email protected]>> wrote:
>>> 
>>> Looks like we already have an outstanding Jira for this: LOG4J2-812 
>>> <https://issues.apache.org/jira/browse/LOG4J2-812>.
>>> If there are no objections I can take care of this one.
>>> 
>>> On Sun, Jul 5, 2015 at 9:55 AM, Remko Popma <[email protected] 
>>> <mailto:[email protected]>> wrote:
>>> One per logger context would make it easier to clear all ThreadContexts 
>>> when a particular logger context is stopped. 
>>> 
>>> Sent from my iPhone
>>> 
>>> On 2015/07/05, at 0:51, Gary Gregory <[email protected] 
>>> <mailto:[email protected]>> wrote:
>>> 
>>>> Would there be one registry or one per logger context?
>>>> 
>>>> Gary 
>>>> 
>>>> 
>>>> -------- Original message --------
>>>> From: Remko Popma <[email protected] <mailto:[email protected]>> 
>>>> Date: 07/04/2015 05:41 (GMT-08:00) 
>>>> To: Log4J Developers List <[email protected] 
>>>> <mailto:[email protected]>> 
>>>> Subject: Re: Log4j2 RollingFileAppender deadlock issue 
>>>> 
>>>> ThreadLocal is implemented as an internal Map in each Thread instance, so 
>>>> there is constant lookup time regardless of the number of Threads and the 
>>>> number of lookups. Contrast this with a lock, where performance will 
>>>> decrease exponentially with more concurrent threads. 
>>>> 
>>>> (See also https://plumbr.eu/blog/java/how-is-threadlocal-implemented 
>>>> <https://plumbr.eu/blog/java/how-is-threadlocal-implemented> )
>>>> 
>>>> Sent from my iPhone
>>>> 
>>>> On 2015/07/04, at 20:40, Jess Holle <[email protected] <mailto:[email protected]>> 
>>>> wrote:
>>>> 
>>>>> On 7/4/2015 2:51 AM, Gary Gregory wrote:
>>>>>> On Thu, Jul 2, 2015 at 6:18 AM, Remko Popma <[email protected] 
>>>>>> <mailto:[email protected]>> wrote:
>>>>>> Yes, that could still work: We could use a ThreadLocal containing a 
>>>>>> custom class which holds the lastTimestamp, cachedDateString as well as 
>>>>>> a SimpleDateFormat instance. 
>>>>>> 
>>>>>> As Jess pointed out, we would also need a way to clear the ThreadLocal 
>>>>>> when the LoggerContext is stopped (to prevent memory leaks in web apps). 
>>>>>> This may be the third usage of ThreadLocals in log4j2 now, so it may be 
>>>>>> worth creating a reusable mechanism for this. 
>>>>>> One idea would be to have a ThreadLocal registry in the LoggerContext, 
>>>>>> where the LoggerContext is responsible for cleaning up all registered 
>>>>>> ThreadLocals in its stop() method. 
>>>>>> 
>>>>>> Thoughts?
>>>>>> 
>>>>>> I'm wondering what the performance cost are of doing a ThreadLocal.get() 
>>>>>> vs. synchronized(this) on each call to format().
>>>>>> 
>>>>> Personally I'd be less concerned with optimizing maximum logger 
>>>>> throughput on any given thread than:
>>>>> Ensuring that not logging takes minimal time
>>>>> Minimizing potential thread contention
>>>>> Logging at maximum efficiency is a priority, but comes after these others.
>>>>> --
>>>>> Jess Holle
>>>>> 
>>> 
>> 
>> 
> 

Reply via email to