-1 for any removal of log forwarding that does not include an
out-of-the-box replacement. I'm going to avoid saying any more on this
unless necessary.
Most of the other things you bring up seem like you're poking holes in how
log aggregation presently works. I don't think anyone would disagree
that this could be made better. Rather than focusing on that, why not
approach your issues by suggesting how you'd like to see it work and
weighing the pros and cons?
Christopher wrote:
Currently, the monitor has an HA-standby behavior as a side-effect of the
way it handles logging.
The way this works is that the other services read the monitor's lock in ZK
to get the current active monitor's host and port. They use this to set
some system properties, which are referenced in the default example
generic_logger.xml config file. They set these system properties and reset
the log4j system whenever they detect a change in the active monitor. This
has the effect of forcing all the logging to go to a particular socket open
on the monitor, wherever it is currently running.
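To make the mechanism described above concrete, here is a rough Java sketch of what each service does when the monitor's lock data changes. It is a simplification under stated assumptions, not Accumulo's actual code: ZooKeeper and log4j are stubbed out, the lock data is assumed to be a "host:port" string, `MonitorLogTarget` is a hypothetical class, and the property names `monitor.log.host` / `monitor.log.port` are placeholders for whatever generic_logger.xml really references.

```java
import java.util.Objects;

// Hypothetical sketch of how a service reacts when the active monitor's
// ZK lock data ("host:port") changes. ZooKeeper and log4j are stubbed out;
// the property names are illustrative placeholders, not Accumulo's names.
class MonitorLogTarget {
    static final String HOST_PROP = "monitor.log.host"; // hypothetical name
    static final String PORT_PROP = "monitor.log.port"; // hypothetical name

    private final Runnable reconfigureLogging; // stands in for resetting log4j
    private String current;                    // last "host:port" acted on

    MonitorLogTarget(Runnable reconfigureLogging) {
        this.reconfigureLogging = reconfigureLogging;
    }

    /**
     * Called with the data read from the active monitor's lock node.
     * Returns true if the system properties were updated and logging reset.
     */
    synchronized boolean onLockData(String hostPort) {
        if (hostPort == null || Objects.equals(hostPort, current)) {
            return false; // no active monitor, or the monitor has not moved
        }
        int idx = hostPort.lastIndexOf(':');
        if (idx <= 0 || idx == hostPort.length() - 1) {
            throw new IllegalArgumentException("expected host:port, got " + hostPort);
        }
        // The logging config file is assumed to reference these properties
        // (e.g. as the socket appender's remote host and port).
        System.setProperty(HOST_PROP, hostPort.substring(0, idx));
        System.setProperty(PORT_PROP, hostPort.substring(idx + 1));
        current = hostPort;
        reconfigureLogging.run(); // re-read config so the new values take effect
        return true;
    }
}
```

In a real service, the `Runnable` is where log4j would be reset and its configuration re-read, and `onLockData` would be driven by a ZooKeeper watch on its own thread, which is the extra-thread cost mentioned in point 5.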
If you'd like to change the implementation, great.
As a point of reference, the Master's ZK lock is likewise used for both
service discovery and distributed exclusion (a lock). I'm not sure why
you take issue with this.
The ZK lock is currently being used to restrict monitor functionality to a
single monitor instance, but this isn't really necessary. There isn't any
reason to prevent concurrent monitors. The real purpose of the ZK lock, as
I've described, is to hijack the lock mechanism because it doubles as a
service-advertisement feature.
The use of the ZK lock is to prevent users from accessing an instance of
the Monitor which is not actually receiving the forwarded logs from the
server. It would be terrible if half of an Ops team saw no errors on the
monitor they went to while the other half saw the real errors.
This is a bit convoluted, makes a lot of assumptions, and has a lot of
issues. It could also be impeding some possible avenues of
simplification under ACCUMULO-3005.
1. It locks us into using Log4j (probably a specific range of versions).
2. It sends logs across the network insecurely.
These seem like implementation details to me.
3. It assumes that you only want a single monitor service running.
Again, an incorrect assumption.
4. The code assumes a particular configuration, with particular variables
embedded in it.
Implementation detail
5. It needs extra threads to track changes.
Implementation detail
6. It adds code complexity.
??? I don't even know how to address this one.
7. It assumes the user wishes to use the monitor to watch logs, when other
tools are better suited for log aggregation, monitoring, and alerting.
Your argument assumes that *all* users have such a capability set up.
This feature does not preclude users from doing whatever they'd like.
This whole thing would be simpler if we just eliminated the monitor
log-watching feature entirely. However, we also have some options short of
that. For example, we could (1) use a different service-advertisement
mechanism that doesn't lock.
Are you talking about using a persistent ZNode to advertise the Monitor
location? What happens when there are multiple monitors?
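For what it's worth, here is one hypothetical way to picture the multiple-monitor case: each monitor registers an ephemeral sequential znode (so the entry disappears when that monitor's session dies) under a shared parent, with "host:port" as its data. The `monitor-` prefix and `MonitorDirectory` are made up for illustration, and a map stands in for ZooKeeper's `getChildren()`/`getData()` calls. A service then has to either pick one monitor, which is effectively the lock recipe again, or fan out to all of them.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: monitors advertise themselves as ephemeral sequential
// children such as "monitor-0000000007", with "host:port" as node data.
// The map passed in stands in for getChildren() + getData() results.
class MonitorDirectory {
    // ZK sequence suffixes are zero-padded to 10 digits, so with a common
    // prefix, lexical order of child names matches creation order.
    private final TreeMap<String, String> children;

    MonitorDirectory(Map<String, String> advertised) {
        this.children = new TreeMap<>(advertised);
    }

    /** Option A: send logs to one monitor, the oldest registration (lock-like). */
    String primary() {
        return children.isEmpty() ? null : children.firstEntry().getValue();
    }

    /** Option B: fan out, so every running monitor sees the same logs. */
    List<String> all() {
        return new ArrayList<>(children.values());
    }
}
```

Option A keeps the "every operator sees the same monitor" property but reintroduces a single target; option B avoids that at the cost of sending every forwarded message once per monitor.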
(2) We could stop doing this variable
injection and still leave the socket appender running, so that users
would have to configure the destination in their own configuration files
if they want to emit logs to that socket. (3) We could use an Accumulo RPC
to send logs, rather than the log4j API. Lots of options.
Instead of #3, aren't there other "standard" log collection APIs that
would make more sense than us rolling our own?