[oops, I originally sent this reply directly to Johannes. Copying the list again]

On 04/09/2008, at 4:19 PM, Johannes Gutleber wrote:

Dear Paul,

On Sep 4, 2008, at 03:31, Paul Smith wrote:

If anyone out there is using "Logging over the network" in any form (socket, JMS, multicast, syslog appenders, etc.), this topic is for you. I'm wondering whether people could comment on their experience setting up high-performance logging of application data over the network. Shipping an event over the wire obviously costs more than writing it to a local file, so when logging volume is high, network-based logging can have user-visible impact (unless one wraps it in AsyncAppenders).
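
(To make that wrapping concrete, here's a minimal sketch against the log4j 1.2 API; the collector host name is a placeholder, 4560 is just log4j's default socket port:)

    import org.apache.log4j.AsyncAppender;
    import org.apache.log4j.Logger;
    import org.apache.log4j.net.SocketAppender;

    public class AsyncNetworkLogging {
        public static void main(String[] args) {
            // Ships serialized events to a remote collector.
            SocketAppender socket = new SocketAppender("collector.example.org", 4560);

            // The wrapper makes the logging call pay only for an in-memory
            // enqueue; a background thread performs the actual network write.
            AsyncAppender async = new AsyncAppender();
            async.addAppender(socket);

            Logger.getRootLogger().addAppender(async);
            Logger.getLogger(AsyncNetworkLogging.class).info("hello over the wire");
        }
    }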

In fact, at CERN we have been working for about two years now with a system of 2,000 computers and 10,000 applications, in which we also deploy logging over the network, using a
combination of log4cplus and log4j with the XML data format.


Fantastic, awesome to hear about this sort of environment. Really appreciate you taking the time to respond in depth here.


The experience with the tools at hand is rather disappointing, in that the problem is not the log emitter but the task of log collection. For that purpose we use a Java/Tomcat-based log collector that can forward the collected logs to multiple outputs: SocketHubAppender, Oracle DB, JMS, etc. The collector was supposed to implement filtering and prioritizing capabilities, but those have practically all been dropped, since they just added to the performance bottleneck instead of avoiding it (we found that Java is simply not performant enough to do this job for a system of the size we have). People suggested a hierarchical layout of log collectors, but that would not resolve the bottlenecks.
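
(For anyone following along at home: the basic shape of such a collector, in log4j 1.2 terms, is roughly what SimpleSocketServer does. A minimal sketch below, not CERN's actual collector; ports are arbitrary, and each appender hung off the local root logger becomes one of the collector's outputs:)

    import java.net.ServerSocket;
    import java.net.Socket;
    import org.apache.log4j.LogManager;
    import org.apache.log4j.Logger;
    import org.apache.log4j.net.SocketHubAppender;
    import org.apache.log4j.net.SocketNode;

    public class MiniCollector {
        public static void main(String[] args) throws Exception {
            // Fan-out: anything attached to the root logger receives every
            // collected event (JMS/JDBC appenders can be added the same way).
            Logger.getRootLogger().addAppender(new SocketHubAppender(4445));

            // Fan-in: accept remote SocketAppender connections and replay
            // each received event through the local logger repository.
            ServerSocket server = new ServerSocket(4560);
            while (true) {
                Socket client = server.accept();
                new Thread(new SocketNode(client, LogManager.getLoggerRepository())).start();
            }
        }
    }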


Are you able to articulate where you thought the bottleneck within the Java process was? Which JRE were you using? Was it GC overhead? Was it log4j itself, or the custom-built collector, that was the problem?


To that log collector we attach JMS subscribers as well as Chainsaw log viewers for online viewing.

Both the log collector and the Chainsaw clients fail frequently due to Java performance problems and excessive memory usage.


Which version of Chainsaw did you use? Was it the one built into log4j, or the newer Chainsaw v2? The latter uses a cyclic buffer so as not to consume too much memory. I do agree, though, that even Chainsaw v2 is far from perfect.. :)
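
(For the curious: the bounding trick is nothing fancier than log4j's own CyclicBuffer helper. A minimal sketch of the idea below, not Chainsaw's actual internals, with the capacity picked arbitrarily:)

    import org.apache.log4j.helpers.CyclicBuffer;
    import org.apache.log4j.spi.LoggingEvent;

    public class BoundedEventTable {
        // Holds at most the 5000 most recent events; once full, each add()
        // silently evicts the oldest, so memory stays bounded under load.
        private final CyclicBuffer buffer = new CyclicBuffer(5000);

        public void eventArrived(LoggingEvent event) {
            buffer.add(event);
        }

        public void render() {
            for (int i = 0; i < buffer.length(); i++) {
                LoggingEvent e = buffer.get(i);
                System.out.println(e.getLoggerName() + " - " + e.getRenderedMessage());
            }
        }
    }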


So finally, after about 2.5 years, we stepped back to writing to local files and collecting/inspecting them when needed. We replaced on-line logging with an error propagation system based on an XML protocol and WS-Eventing publish/subscribe, all implemented in C++, with no Java anywhere. A two-month testing phase proved successful, and we have
now been deploying the system for a month without major problems.

Of course my mail reflects the situation of a very peculiar, large-scale distributed soft real-time system that is probably not so common in other fields. In addition, we wanted on-line log viewing and analysis, which is also not the common case in other domains.


My vision, ridiculous as this may sound, is that log4j should be able to support the new cloud environments. Imagine the Hadoop clusters that are out there, and making sense of wtf happened during the dissemination of the MapReduce portions, with thousands of host nodes logging and, somehow, an engineer able to 'see' centrally what happened to his job (even if it's not in real time).

With a Many->1 collector, the collector host is going to have to be one grunty box....

Pinpoint's current design uses ActiveMQ internally. The Receivers inside accept the remote event and place it on the local log4j bus (just as it was placed on the remote bus that fed the network appender). Internally, a local 'appender' routes the event to an internal JMS topic, with ActiveMQ temporarily buffering the events to a local persistent store. A single topic listener then tries to chew through the received events to index them (indexing is always going to be expensive, hence a producer/consumer pattern). I'm hoping this temporary JMS buffer will allow events to be received much faster than they can be consumed, which is exactly what JMS (and ActiveMQ) is designed to handle, although it'll be interesting to see how that translates into practice. The environment where I work that is driving me to develop Pinpoint has nowhere near the load that yours has, though.
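
(To sketch that buffering pattern in plain JMS terms: the below is illustrative, not Pinpoint's actual code. I've used a queue on an embedded vm:// broker for simplicity where Pinpoint uses an internal topic, and all the names are made up:)

    import javax.jms.Connection;
    import javax.jms.DeliveryMode;
    import javax.jms.Destination;
    import javax.jms.Message;
    import javax.jms.MessageConsumer;
    import javax.jms.MessageListener;
    import javax.jms.MessageProducer;
    import javax.jms.Session;
    import org.apache.activemq.ActiveMQConnectionFactory;

    public class BufferedIndexer {
        public static void main(String[] args) throws Exception {
            // Embedded broker; persistent=true lets a backlog spill to a
            // local store instead of accumulating in memory.
            Connection conn = new ActiveMQConnectionFactory(
                    "vm://localhost?broker.persistent=true").createConnection();
            conn.start();

            Session session = conn.createSession(false, Session.AUTO_ACKNOWLEDGE);
            Destination dest = session.createQueue("events.pending");

            // Producer side: the receiving 'appender' hands the event to JMS
            // and returns immediately, so ingestion never waits on indexing.
            MessageProducer producer = session.createProducer(dest);
            producer.setDeliveryMode(DeliveryMode.PERSISTENT);
            producer.send(session.createTextMessage("<log4j:event .../>"));

            // Consumer side: a single listener chews through the backlog at
            // whatever rate the (expensive) indexing step can sustain.
            Session consumerSession = conn.createSession(false, Session.AUTO_ACKNOWLEDGE);
            MessageConsumer consumer = consumerSession.createConsumer(dest);
            consumer.setMessageListener(new MessageListener() {
                public void onMessage(Message msg) {
                    // indexing would happen here, off the ingest path
                }
            });
        }
    }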

I'd love to hear some actual logging numbers, that is, how many events (log lines would do) each host produces on average and their combined total (even peak load for a given hour or something would be interesting). Your environment really is exactly what I had in mind to support with Pinpoint, and I'm very much focussed on making it as non-intrusive as possible.

cheers,

Paul Smith

