[Oops, I originally sent this reply directly to Johannes. Copying the
list again.]
On 04/09/2008, at 4:19 PM, Johannes Gutleber wrote:
Dear Paul,
On Sep 4, 2008, at 03:31, Paul Smith wrote:
If anyone out there is using "logging over the network" in any form
(socket, JMS, multicast, syslog appenders, etc.), this topic is for
you. I'm wondering whether people could comment on their experience
setting up high-performance logging of application data over the
network. The cost of shipping an event over the wire is obviously
higher than writing it to a local file, so there can be a
user-visible impact when the logging volume is high and
network-based logging is used (unless one wraps it in AsyncAppenders).
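For anyone who hasn't tried the wrapping trick, a minimal programmatic
sketch is below; the collector host name and port are made up, and a real
setup would normally do this via the XML configuration instead:

import org.apache.log4j.AsyncAppender;
import org.apache.log4j.Logger;
import org.apache.log4j.net.SocketAppender;

public class AsyncSocketLoggingSketch {
    public static void main(String[] args) {
        // Ships serialized LoggingEvents to a remote receiver (host/port are hypothetical).
        SocketAppender socket = new SocketAppender("log-collector.example.org", 4560);

        // The wrapper means the application thread only enqueues the event;
        // a background dispatcher thread does the actual network write.
        AsyncAppender async = new AsyncAppender();
        async.setBufferSize(512);   // events held before the buffer is considered full
        async.setBlocking(false);   // discard rather than stall the application when full
        async.addAppender(socket);

        Logger.getRootLogger().addAppender(async);
        Logger.getLogger(AsyncSocketLoggingSketch.class).info("hello over the wire");
    }
}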
In fact, at CERN we have been working for about two years now with a
system of 2,000 computers and 10,000 applications, in which we also
deploy logging over the network using a
combination of log4cplus and log4j with the XML data format.
Fantastic, awesome to hear about this sort of environment. I really
appreciate you taking the time to respond in depth here.
Our experience with the tools at hand has been rather disappointing,
in that the problem is not the log emitter but the task of log
collection. For that purpose we use a Java/Tomcat-based log
collector that
can forward the collected logs to multiple
outputs: SocketHub appender, Oracle DB, JMS, etc. The collector was
supposed to implement filtering and prioritizing capabilities, but
those have practically all been dropped, since they just added to the
performance bottleneck instead of avoiding it (we found that
Java is simply not performant enough to do this job
for a system of the size we have). People talked about a
hierarchical layout of log collectors, but that would not resolve
the bottlenecks.
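(As an aside for others on the list, the general shape of such a collector
in plain log4j is roughly the toy below: accept serialized events from
remote SocketAppenders and re-broadcast them to whatever is attached
downstream. This is only an illustration with made-up ports, nothing like
the real CERN collector.)

import java.net.ServerSocket;
import org.apache.log4j.LogManager;
import org.apache.log4j.Logger;
import org.apache.log4j.net.SocketHubAppender;
import org.apache.log4j.net.SocketNode;

public class TinyCollectorSketch {
    public static void main(String[] args) throws Exception {
        // Anything attached to the root logger sees every event the collector receives;
        // here a SocketHubAppender re-broadcasts to attached viewers such as Chainsaw.
        Logger.getRootLogger().addAppender(new SocketHubAppender(4445));

        // Upstream applications point their SocketAppenders at this port.
        ServerSocket server = new ServerSocket(4560);
        while (true) {
            new Thread(new SocketNode(server.accept(),
                    LogManager.getLoggerRepository())).start();
        }
    }
}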
Are you able to articulate where you thought the bottleneck within the
Java process was? What JRE were you using? Was it GC overhead? Was
it log4j itself, or the custom-built collector, that was the problem?
To that log collector we attach JMS subscribers as well as Chainsaw
log viewers for online viewing.
Both the log collector and the Chainsaw clients fail frequently due
to Java performance problems and excessive memory usage.
Which version of Chainsaw did you use? Was it the one built into
log4j, or the newer Chainsaw v2? The latter uses a cyclic buffer so
as not to consume too much memory. I do agree, though, that even
Chainsaw v2 is far from perfect... :)
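(For anyone curious, the cyclic-buffer idea is roughly the toy class below,
not Chainsaw's actual implementation: once the buffer fills, the oldest
event is overwritten, so memory stays bounded however long the viewer runs.)

public class CyclicEventBuffer {
    private final Object[] events;
    private int start = 0;   // index of the oldest retained event
    private int size = 0;

    public CyclicEventBuffer(int capacity) {
        events = new Object[capacity];
    }

    public synchronized void add(Object event) {
        if (size < events.length) {
            events[(start + size) % events.length] = event;
            size++;
        } else {
            events[start] = event;                  // overwrite the oldest slot
            start = (start + 1) % events.length;
        }
    }

    public synchronized Object get(int i) {         // i = 0 is the oldest retained event
        return events[(start + i) % events.length];
    }

    public synchronized int size() {
        return size;
    }
}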
So finally, after about 2.5 years, we stepped back to writing to local
files and collecting/inspecting them when needed. We replaced online
logging with an error propagation system
based on an XML protocol and WS-Eventing publish/subscribe, all
implemented in C++, with no Java anywhere. A two-month testing phase
has proven successful, and we have now been running the system in
production for a month without major problems.
Of course, my mail reflects the situation of a very particular,
large-scale distributed soft real-time system that is probably not so
common in other fields.
In addition, we wanted online log viewing and analysis, which
is also not the common case in other domains.
My vision, ridiculous as this may sound, is that log4j should be able
to support the new cloud environments. Imagine the
Hadoop clusters that are out there, and making sense of wtf happened
during the distribution of the MapReduce portions, with thousands of
host nodes logging and, somehow, an engineer able to 'see' what
happened on his job centrally (even if it's not in real time).
With a many-to-one collector, the collector host is going to have to
be one grunty box...
Pinpoint's current design uses ActiveMQ internally. The Receivers
inside accept the remote event and place it on the local log4j bus
(just as it was placed on the remote bus that fed the network
appender). Internally, a local 'appender' routes the event to an
internal JMS topic, with ActiveMQ temporarily buffering the events in
a local persistent store. A single topic listener then tries to
chew through the received events to index them (indexing is always
going to be expensive, hence a producer/consumer pattern). I'm hoping
this temporary JMS buffer will allow events to be received much
faster than they can be consumed, which is pretty much what JMS (and
ActiveMQ) is designed to handle, although it'll be interesting to see
how it translates into practice. The place where I work, which is
driving me to develop Pinpoint, has nowhere near the load of your
environment, though.
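To make the buffering idea a bit more concrete, here's a rough sketch of
the shape of it; the broker URI, topic name and client ID below are
illustrative only, not Pinpoint's actual configuration. An embedded
persistent ActiveMQ broker absorbs bursts, a single durable topic
subscriber feeds the indexer, and the producer side simply publishes and
returns.

import javax.jms.Connection;
import javax.jms.ConnectionFactory;
import javax.jms.Message;
import javax.jms.MessageListener;
import javax.jms.MessageProducer;
import javax.jms.Session;
import javax.jms.Topic;
import javax.jms.TopicSubscriber;
import org.apache.activemq.ActiveMQConnectionFactory;

public class BufferedIndexerSketch {
    public static void main(String[] args) throws Exception {
        // Embedded broker with a persistent store, so a slow consumer does not lose events.
        ConnectionFactory factory =
                new ActiveMQConnectionFactory("vm://localhost?broker.persistent=true");
        Connection connection = factory.createConnection();
        connection.setClientID("pinpoint-indexer");     // needed for a durable subscription
        connection.start();

        // Consumer side: one durable subscriber chews through the buffered events.
        Session consumerSession = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
        Topic topic = consumerSession.createTopic("logging.events");
        TopicSubscriber subscriber = consumerSession.createDurableSubscriber(topic, "indexer");
        subscriber.setMessageListener(new MessageListener() {
            public void onMessage(Message message) {
                // hand the event off to the (expensive) indexing step here
            }
        });

        // Producer side: the local 'appender' just publishes and returns quickly.
        Session producerSession = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
        MessageProducer producer =
                producerSession.createProducer(producerSession.createTopic("logging.events"));
        producer.send(producerSession.createTextMessage("<log4j:event .../>"));
    }
}

The point is that the producer path stays cheap even when the indexer
lags, because the broker's store absorbs the backlog instead of the JVM
heap.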
I'd love to hear some actual logging numbers, that is, how many events
(log lines would do) each host produces on average, and their
combined total (even a peak load for a given hour or something would
be interesting). Your environment really is exactly what I had in
mind to support with Pinpoint, and I'm very much focused on making it
as non-intrusive as possible.
cheers,
Paul Smith