[oops, I originally sent this reply directly to Johannes. Copying the list again]

On 04/09/2008, at 4:19 PM, Johannes Gutleber wrote:

Dear Paul,

On Sep 4, 2008, at 03:31, Paul Smith wrote:

If anyone out there is using "Logging over the network" in any form (socket, JMS, multicast, syslog appenders, etc.), this topic is for you. I'm wondering whether people could comment on their experience setting up high-performance logging of application data over the network. Shipping an event over the wire obviously costs more than writing it to a local file, so when logging volume is high, network-based logging can have user-visible impact (unless one wraps it in AsyncAppenders).
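
(To make that wrapping concrete, here's a minimal sketch against the log4j 1.2 API; the collector host name is a placeholder, 4560 is just log4j's default socket port:)

    import org.apache.log4j.AsyncAppender;
    import org.apache.log4j.Logger;
    import org.apache.log4j.net.SocketAppender;

    public class AsyncNetworkLogging {
        public static void main(String[] args) {
            // Ships serialized events to a remote collector.
            SocketAppender socket = new SocketAppender("collector.example.org", 4560);

            // The wrapper makes the logging call pay only for an in-memory
            // enqueue; a background thread performs the actual network write.
            AsyncAppender async = new AsyncAppender();
            async.addAppender(socket);

            Logger.getRootLogger().addAppender(async);
            Logger.getLogger(AsyncNetworkLogging.class).info("hello over the wire");
        }
    }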

In fact, at CERN we have been working for about two years now with a system of 2,000 computers and 10,000 applications, in which we also deploy logging over the network, using a
combination of log4cplus and log4j with the XML data format.


Fantastic, awesome to hear about this sort of environment. Really appreciate you taking the time to respond in depth here.


The experience with the tools at hand is rather disappointing, in that the problem is not the log emitter but the task of log collection. For that purpose we use a Java/Tomcat-based log collector that can forward the collected logs to multiple outputs: SocketHubAppender, Oracle DB, JMS, etc. The collector was supposed to implement filtering and prioritizing capabilities, but those have practically all been dropped, since they just added to the performance bottleneck instead of avoiding it (we found that Java is simply not performant enough to do this job for a system of the size we have). People suggested a hierarchical layout of log collectors, but that would not resolve the bottlenecks.
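
(For anyone following along at home: the basic shape of such a collector, in log4j 1.2 terms, is roughly what SimpleSocketServer does. A minimal sketch below, not CERN's actual collector; ports are arbitrary, and each appender hung off the local root logger becomes one of the collector's outputs:)

    import java.net.ServerSocket;
    import java.net.Socket;
    import org.apache.log4j.LogManager;
    import org.apache.log4j.Logger;
    import org.apache.log4j.net.SocketHubAppender;
    import org.apache.log4j.net.SocketNode;

    public class MiniCollector {
        public static void main(String[] args) throws Exception {
            // Fan-out: anything attached to the root logger receives every
            // collected event (JMS/JDBC appenders can be added the same way).
            Logger.getRootLogger().addAppender(new SocketHubAppender(4445));

            // Fan-in: accept remote SocketAppender connections and replay
            // each received event through the local logger repository.
            ServerSocket server = new ServerSocket(4560);
            while (true) {
                Socket client = server.accept();
                new Thread(new SocketNode(client, LogManager.getLoggerRepository())).start();
            }
        }
    }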


Are you able to articulate where you thought the bottleneck within the Java process was? Which JRE were you using? Was it GC overhead? Was it log4j itself, or the custom-built collector, that was the problem?


To that log collector we attach JMS subscribers as well as Chainsaw log viewers for online viewing.

Both the log collector and the Chainsaw clients fail frequently due to Java performance problems and excessive memory usage.


Which version of Chainsaw did you use? Was it the one built into log4j, or the newer Chainsaw v2? The latter uses a cyclic buffer so as not to consume too much memory. I do agree, though, that even Chainsaw v2 is far from perfect.. :)
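
(For the curious: the bounding trick is nothing fancier than log4j's own CyclicBuffer helper. A minimal sketch of the idea below, not Chainsaw's actual internals, with the capacity picked arbitrarily:)

    import org.apache.log4j.helpers.CyclicBuffer;
    import org.apache.log4j.spi.LoggingEvent;

    public class BoundedEventTable {
        // Holds at most the 5000 most recent events; once full, each add()
        // silently evicts the oldest, so memory stays bounded under load.
        private final CyclicBuffer buffer = new CyclicBuffer(5000);

        public void eventArrived(LoggingEvent event) {
            buffer.add(event);
        }

        public void render() {
            for (int i = 0; i < buffer.length(); i++) {
                LoggingEvent e = buffer.get(i);
                System.out.println(e.getLoggerName() + " - " + e.getRenderedMessage());
            }
        }
    }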


So finally, after about 2.5 years, we stepped back to writing to local files and collecting/inspecting them when needed. We replaced on-line logging with an error propagation system based on an XML protocol and WS-Eventing publish/subscribe, all implemented in C++, with no Java anywhere. A two-month testing phase proved successful, and we have
now been deploying the system for a month without major problems.

Of course my mail reflects the situation of a very peculiar, large-scale distributed soft real-time system that is probably not so common in other fields. In addition, we wanted on-line log viewing and analysis, which is also not the common case in other domains.


My vision, ridiculous as this may sound, is that log4j should be able to support the new cloud environments. Imagine the Hadoop clusters that are out there, and making sense of wtf happened during the dissemination of the MapReduce portions, with thousands of host nodes logging and, somehow, an engineer able to 'see' centrally what happened to his job (even if it's not in real time).

With a Many->1 collector, the collector host is going to have to be one grunty box....

Pinpoint's current design uses ActiveMQ internally. The Receivers inside accept the remote event and place it on the local log4j bus (just as it was placed on the remote bus that fed the network appender). Internally, a local 'appender' routes the event to an internal JMS topic, with ActiveMQ temporarily buffering the events to a local persistent store. A single topic listener then tries to chew through the received events to index them (indexing is always going to be expensive, hence a producer/consumer pattern). I'm hoping this temporary JMS buffer will allow events to be received much faster than they can be consumed, which is exactly what JMS (and ActiveMQ) is designed to handle, although it'll be interesting to see how that translates into practice. The environment where I work that is driving me to develop Pinpoint has nowhere near the load that yours has, though.
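
(To sketch that buffering pattern in plain JMS terms: the below is illustrative, not Pinpoint's actual code. I've used a queue on an embedded vm:// broker for simplicity where Pinpoint uses an internal topic, and all the names are made up:)

    import javax.jms.Connection;
    import javax.jms.DeliveryMode;
    import javax.jms.Destination;
    import javax.jms.Message;
    import javax.jms.MessageConsumer;
    import javax.jms.MessageListener;
    import javax.jms.MessageProducer;
    import javax.jms.Session;
    import org.apache.activemq.ActiveMQConnectionFactory;

    public class BufferedIndexer {
        public static void main(String[] args) throws Exception {
            // Embedded broker; persistent=true lets a backlog spill to a
            // local store instead of accumulating in memory.
            Connection conn = new ActiveMQConnectionFactory(
                    "vm://localhost?broker.persistent=true").createConnection();
            conn.start();

            Session session = conn.createSession(false, Session.AUTO_ACKNOWLEDGE);
            Destination dest = session.createQueue("events.pending");

            // Producer side: the receiving 'appender' hands the event to JMS
            // and returns immediately, so ingestion never waits on indexing.
            MessageProducer producer = session.createProducer(dest);
            producer.setDeliveryMode(DeliveryMode.PERSISTENT);
            producer.send(session.createTextMessage("<log4j:event .../>"));

            // Consumer side: a single listener chews through the backlog at
            // whatever rate the (expensive) indexing step can sustain.
            Session consumerSession = conn.createSession(false, Session.AUTO_ACKNOWLEDGE);
            MessageConsumer consumer = consumerSession.createConsumer(dest);
            consumer.setMessageListener(new MessageListener() {
                public void onMessage(Message msg) {
                    // indexing would happen here, off the ingest path
                }
            });
        }
    }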

I'd love to hear some actual logging numbers, that is, how many events (log lines would do) each host produces on average and their combined total (even peak load for a given hour or something would be interesting). Your environment really is exactly what I had in mind to support with Pinpoint, and I'm very much focussed on making it as non-intrusive as possible.

cheers,

Paul Smith

