It may be possible (though it is a bit of a hack) to use async loggers without reflection. You would compile your own version of the com.lmax.disruptor.util.Util class, replacing the implementation of its getUnsafe() method with a direct call to sun.misc.Unsafe.getUnsafe(). Obviously this is not pretty, and there is no guarantee that you won't run into a SecurityException or some other failure next... :-(
You may be better off using AsyncAppender if you want your blocking I/O to happen in another thread. This may still be beneficial even on a single-core machine.

Back to the difference in CPU usage between log4j-1.2 (20% CPU) and log4j-2.0 (40% CPU): are the test programs identical? I remember you mentioned below that you use MDC with 2.0 but not with 1.2. Can you give a sample of your log4j usage with 1.2 and with 2.0? (Maybe create a JIRA issue and attach the files to it?)
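
In case it helps with the AsyncAppender suggestion, here is a minimal sketch of what a log4j2.xml could look like. The appender names ("File", "Async"), the file path and the pattern are placeholders I made up, not taken from your setup, so adjust them to whatever you use today. The idea is just that the root logger points at the Async appender, which hands events to a background thread that writes to the wrapped File appender.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<Configuration status="warn">
  <Appenders>
    <!-- Hypothetical target appender; path and pattern are placeholders -->
    <File name="File" fileName="logs/app.log">
      <PatternLayout pattern="%d %p %c{1.} [%t] %m%n"/>
    </File>
    <!-- Async wraps the File appender so the blocking I/O runs on another thread -->
    <Async name="Async">
      <AppenderRef ref="File"/>
    </Async>
  </Appenders>
  <Loggers>
    <Root level="info">
      <!-- Loggers reference the Async wrapper, not the File appender directly -->
      <AppenderRef ref="Async"/>
    </Root>
  </Loggers>
</Configuration>
```

This route does not need the Disruptor jar, so it sidesteps the Unsafe/reflection problem entirely.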
