I created an implementation that uses FastDateFormat and an AtomicReference. I modified the benchmarks and the results are below. atomicFormat is the new version. cachedFormat uses logic similar to what Remko proposed. fastFormat simply uses the FastDateFormat on every timestamp. synchronizedFormat uses logic similar to what is in the DatePatternConverter now. threadLocalFormat retrieves a threadLocal SimpleDateFormat and formats every timestamp with it.
1 Thread Benchmark Mode Samples Score Error Units o.a.l.l.p.j.SimpleDateFormatBenchmark.atomicFormat sample 67961 93.774 ± 3.928 ns/op o.a.l.l.p.j.SimpleDateFormatBenchmark.baseline sample 89151 39.136 ± 2.352 ns/op o.a.l.l.p.j.SimpleDateFormatBenchmark.cachedFormat sample 66065 74.033 ± 3.613 ns/op o.a.l.l.p.j.SimpleDateFormatBenchmark.fastFormat sample 79434 316.701 ± 6.443 ns/op o.a.l.l.p.j.SimpleDateFormatBenchmark.synchronizedFormat sample 81409 304.055 ± 5.835 ns/op o.a.l.l.p.j.SimpleDateFormatBenchmark.threadLocalFormat sample 84137 298.179 ± 6.005 ns/op 4 Threads Benchmark Mode Samples Score Error Units o.a.l.l.p.j.SimpleDateFormatBenchmark.atomicFormat sample 270722 82.465 ± 1.767 ns/op o.a.l.l.p.j.SimpleDateFormatBenchmark.baseline sample 290686 46.817 ± 1.348 ns/op o.a.l.l.p.j.SimpleDateFormatBenchmark.cachedFormat sample 229500 86.370 ± 2.102 ns/op o.a.l.l.p.j.SimpleDateFormatBenchmark.fastFormat sample 228713 435.373 ± 20.030 ns/op o.a.l.l.p.j.SimpleDateFormatBenchmark.synchronizedFormat sample 293448 2113.380 ± 138.919 ns/op o.a.l.l.p.j.SimpleDateFormatBenchmark.threadLocalFormat sample 283040 507.937 ± 15.903 ns/op 8 Threads Benchmark Mode Samples Score Error Units o.a.l.l.p.j.SimpleDateFormatBenchmark.atomicFormat sample 389028 120.208 ± 39.351 ns/op o.a.l.l.p.j.SimpleDateFormatBenchmark.baseline sample 523979 62.972 ± 2.952 ns/op o.a.l.l.p.j.SimpleDateFormatBenchmark.cachedFormat sample 606880 125.257 ± 7.320 ns/op o.a.l.l.p.j.SimpleDateFormatBenchmark.fastFormat sample 470751 768.610 ± 24.760 ns/op o.a.l.l.p.j.SimpleDateFormatBenchmark.synchronizedFormat sample 614805 4598.192 ± 107.378 ns/op o.a.l.l.p.j.SimpleDateFormatBenchmark.threadLocalFormat sample 434334 884.489 ± 130.365 ns/op I have attached a new patch to the Jira issue. Ralph > On Jul 5, 2015, at 1:08 AM, Remko Popma <[email protected]> wrote: > > Benchmark done > <https://issues.apache.org/jira/browse/LOG4J2-812?focusedCommentId=14614174&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14614174>, > and ThreadLocal is a clear winner over synchronized block. > > I've started implementing a ThreadLocal registry but I stopped when I > realized this won't solve the memory leak problem: ThreadLocal.remove() only > removes the value for the current thread. Values created for other threads in > the thread pool will not be removed... > > Instead, I propose we create a SoftThreadLocal class which wraps the value > (the SimpleDateFormat in this case) in a SoftReference before storing it in > the ThreadLocal. This allows the JVM to reclaim the memory when necessary, > which should solve the memory leak. It also means we no longer need to > explicitly clear() things when the LoggerContext is stopped. > > This is also a reusable solution because we can use this class everywhere we > use ThreadLocal to cache objects for performance. > > Thoughts? > > On Sun, Jul 5, 2015 at 3:33 PM, Ralph Goers <[email protected] > <mailto:[email protected]>> wrote: > That’s right, you were going to do some performance measurements. Please do > and let us know what works best. > > Ralph > >> On Jul 4, 2015, at 10:30 PM, Remko Popma <[email protected] >> <mailto:[email protected]>> wrote: >> >> Looks like we already have an outstanding Jira for this: LOG4J2-812 >> <https://issues.apache.org/jira/browse/LOG4J2-812>. >> If there are no objections I can take care of this one. >> >> On Sun, Jul 5, 2015 at 9:55 AM, Remko Popma <[email protected] >> <mailto:[email protected]>> wrote: >> One per logger context would make it easier to clear all ThreadContexts when >> a particular logger context is stopped. >> >> Sent from my iPhone >> >> On 2015/07/05, at 0:51, Gary Gregory <[email protected] >> <mailto:[email protected]>> wrote: >> >>> Would there be one registry or one per logger context? >>> >>> Gary >>> >>> >>> -------- Original message -------- >>> From: Remko Popma <[email protected] <mailto:[email protected]>> >>> Date: 07/04/2015 05:41 (GMT-08:00) >>> To: Log4J Developers List <[email protected] >>> <mailto:[email protected]>> >>> Subject: Re: Log4j2 RollingFileAppender deadlock issue >>> >>> ThreadLocal is implemented as an internal Map in each Thread instance, so >>> there is constant lookup time regardless of the number of Threads and the >>> number of lookups. Contrast this with a lock, where performance will >>> decrease exponentially with more concurrent threads. >>> >>> (See also https://plumbr.eu/blog/java/how-is-threadlocal-implemented >>> <https://plumbr.eu/blog/java/how-is-threadlocal-implemented> ) >>> >>> Sent from my iPhone >>> >>> On 2015/07/04, at 20:40, Jess Holle <[email protected] <mailto:[email protected]>> >>> wrote: >>> >>>> On 7/4/2015 2:51 AM, Gary Gregory wrote: >>>>> On Thu, Jul 2, 2015 at 6:18 AM, Remko Popma <[email protected] >>>>> <mailto:[email protected]>> wrote: >>>>> Yes, that could still work: We could use a ThreadLocal containing a >>>>> custom class which holds the lastTimestamp, cachedDateString as well as a >>>>> SimpleDateFormat instance. >>>>> >>>>> As Jess pointed out, we would also need a way to clear the ThreadLocal >>>>> when the LoggerContext is stopped (to prevent memory leaks in web apps). >>>>> This may be the third usage of ThreadLocals in log4j2 now, so it may be >>>>> worth creating a reusable mechanism for this. >>>>> One idea would be to have a ThreadLocal registry in the LoggerContext, >>>>> where the LoggerContext is responsible for cleaning up all registered >>>>> ThreadLocals in its stop() method. >>>>> >>>>> Thoughts? >>>>> >>>>> I'm wondering what the performance cost are of doing a ThreadLocal.get() >>>>> vs. synchronized(this) on each call to format(). >>>>> >>>> Personally I'd be less concerned with optimizing maximum logger throughput >>>> on any given thread than: >>>> Ensuring that not logging takes minimal time >>>> Minimizing potential thread contention >>>> Logging at maximum efficiency is a priority, but comes after these others. >>>> -- >>>> Jess Holle >>>> >> > >
