Re: [DISCUSS] Rethinking monitor log receiver

Billie Rinaldi Tue, 07 Feb 2017 09:07:08 -0800

On Tue, Feb 7, 2017 at 8:40 AM, Christopher <[email protected]> wrote:


> On Tue, Feb 7, 2017 at 11:02 AM Billie Rinaldi <[email protected]>
> wrote:
>
> > I am a fan of the log-forwarding feature. I wouldn't mind if there were
> an
> > option to turn it off for users that don't want it. I also wouldn't mind
> if
> > someone wanted to improve how the log-forwarding mechanism is
> implemented.
> > Option (2) doesn't work for me, though. I need the log-forwarding
> > destination(s) to be discovered automatically rather than requiring it to
> > be known in advance / configured manually by the user.
> >
> >
> Using DNS (CNAME) for the monitor destination might be a better solution
> for auto-routing to the current monitor, rather than the auto-discovery
> happening inside Accumulo. The DNS solution won't work for random ports...
> but I'm not sure what the use is for a monitor that isn't served to a
> well-known location.


Past uses of auto-discovering a random port have been for running on YARN
and for ITs. I don't think this is necessary for YARN/Slider anymore.


> A custom appender which does the logic (rather than
> the standard socket appender, custom properties, and tserver logic) would
> also work, if the services are properly advertised.
>

I think a custom appender would be fine.


>
> What are your thoughts on these?
>
>
> > Seems like improving log forwarding is a separate (but related) issue
> from
> > whether we want to add the feature of supporting multiple monitors. To
> > implement that feature, we'd have to decide whether we'd want to forward
> to
> > all monitors, or just forward to the first monitor, or whatever. If we do
> > implement the multiple monitors feature, I think we should add a
> "Monitors"
> > tab to the monitor, so that users will be able to monitor their monitors.
> > :) (Seriously.)
> >
> >
> I'm leaning toward thinking that we shouldn't try to get into the business
> of supporting the logic of how many, and which ones, to forward to.
>
> From this discussion, I'm thinking:
>
> 1. Make the service discovery specifically for the log4j service, and make
> it optional. Turning off log4j service will also prevent registration in
> ZK. This would also make log4j an optional dependency of the monitor. When
> the feature is disabled, /log in the monitor will document that it is
> disabled.
>
2. Create a custom log4j appender which does the logic of looking up the
> advertised monitors, and decides whether to send to one or them all.
> 3. Disable all these by default with the example configs, but provide blog
> post explaining how to enable.
>

I'd prefer enabled by default. :)


>
> Does that sound reasonable?
>
>
> > On Mon, Feb 6, 2017 at 2:04 PM, Christopher <[email protected]> wrote:
> >
> > > On Mon, Feb 6, 2017 at 12:15 PM Josh Elser <[email protected]>
> wrote:
> > >
> > > -1 for removal of the log-forwarding which does not include an
> > > out-of-the-box replacement. I'm going to avoid saying any more on this
> > > unless necessary.
> > >
> > >
> > > If we were to opt for removal (perhaps by agreeing that it were out of
> > > scope), I'd personally prefer it be phased out gradually, as users
> gained
> > > more experience integrating with superior tooling for distributed log
> > > aggregation. I think too many people rely on it right now to remove it
> > > abruptly.
> > >
> > >
> > > Most of the other things you bring up seem like you poking holes in how
> > > the log aggregation works presently. I don't think anyone would
> disagree
> > > that this could be made better. Instead of focusing on this, why not
> > > approach your issues by instead suggesting how you'd like to see it
> work
> > > and weigh the pros/cons?
> > >
> > >
> > > Yes, this discussion is about both whether, and how, we might be able
> to
> > > make it better. I haven't proposed a solution, because I don't have one
> > at
> > > this time. Hence, my desire to [DISCUSS] it.
> > >
> > >
> > > Christopher wrote:
> > > > Currently, the monitor has an HA-standby behavior as a side-effect of
> > the
> > > > way it handles logging.
> > > >
> > > > The way this works is that the other services read the monitor's lock
> > in
> > > ZK
> > > > to get the current active monitor's host and port. They use this to
> set
> > > > some system properties, which are referenced in the default example
> > > > generic_logger.xml config file. They set these system properties and
> > > reset
> > > > the log4j system whenever they detect a change in the active monitor.
> > > This
> > > > has the effect of forcing all the logging to go to a particular
> socket
> > > open
> > > > on the monitor, wherever it is currently running.
> > >
> > > If you'd like to change the implementation, great.
> > >
> > >
> > > Right. So, this is a discuss thread about how we might want to change
> > it. I
> > > don't have a good sense about what avenues are going to cause
> headaches.
> > >
> > >
> > > As a point of reference, the Master's ZK lock is also used for service
> > > discovery as well as distributed exclusion (a lock). I'm not sure why
> > > you find an issue with this.
> > >
> > >
> > > I don't find an issue with it. I recognize it does both. I stated
> that. I
> > > also recognize that it must do both for certain services. My point was
> > that
> > > only one of those aspects (service discovery), rather than both, was
> > needed
> > > for the monitor service. The other aspect (distributed exclusion) is an
> > > unnecessary side-effect which cripples redundant monitors because of
> this
> > > (optional) log receiving feature.
> > >
> > >
> > > > The ZK lock is currently being used to restrict monitor functionality
> > to
> > > a
> > > > single monitor instance. But, this isn't really necessary. There
> isn't
> > > any
> > > > reason to restrict concurrent monitors. The real purpose of the ZK
> > lock,
> > > as
> > > > I've described, is to hijack the ZK lock mechanism because it's also
> a
> > > > service-advertisement feature.
> > >
> > > The use of the ZK lock is to prevent users from accessing an instance
> of
> > > the Monitor which is not actually receiving the forwarded logs from the
> > > server. It would be terrible if half of an Ops team saw no errors on
> the
> > > monitor they went to while the other half saw the real errors.
> > >
> > >
> > > Yes. One solution would be to limit the distributed exclusion to just
> the
> > > log feature (along with any relevant messages), and leave the rest of
> the
> > > redundant monitor functional.
> > >
> > >
> > > > This is a bit convoluted and makes a lot of assumptions, and has a
> lot
> > of
> > > > issues. It is also could be impeding some possible avenues of
> > > > simplification under ACCUMULO-3005.
> > > >
> > > > 1. It locks us in to using Log4J (probably a specific range of
> > versions).
> > > > 2. It sends logs across the network insecurely.
> > >
> > > These seem like implementation details to me.
> > >
> > >
> > > Yes, but it's an inherent problem with the asynchronous log appender,
> > which
> > > speaks directly to its suitability for running a custom log receiver.
> > >
> > >
> > >
> > > > 3. It assumes that you only want a single monitor service running.
> > >
> > > Again, an incorrect assumption.
> > >
> > >
> > > It prevents you from running more than one concurrently. The incorrect
> > > assumption that you want this is occurring on the part of the current
> > > implementation, based on its fundamental design.
> > >
> > >
> > > > 4. Code assumes particular configuration with particular variable's
> > > > embedded in them.
> > >
> > > Implementation detail
> > >
> > >
> > > Well, yes... this thread is a discuss thread about changing
> > implementation
> > > in some way... so....
> > >
> > >
> > > > 5. Extra threads needed to track changes
> > >
> > >
> > >
> > > Implementation detail
> > >
> > >
> > > Yep.
> > >
> > >
> > >
> > > > 6. It adds code complexity.
> > >
> > > ??? I don't even know how to address this one.
> > >
> > >
> > > Responding to every point is optional. ;)
> > > But, seriously, my point here is that we often seem to throw the
> kitchen
> > > sink into our code, introducing unnecessary complexity, rather than
> > > encourage integration. This complexity makes integration harder, and
> the
> > > code less maintainable. While I don't yet know what my preferred
> solution
> > > is... I strongly suspect it is one that is both workable and less
> > complex.
> > >
> > >
> > > > 7. It assumes the user wishes to use the monitor to watch logs, when
> > > other
> > > > tools are better suited for log aggregation, monitoring, and
> alerting.
> > >
> > > Your argument assumes that *all* users have such a capability set up.
> > > This feature does not preclude users from doing whatever they'd like.
> > >
> > >
> > > It's not an argument. I'm not championing a position. That point was
> > simply
> > > an observation of the current default behavior, and the recognition
> that
> > > dedicated tools for this exist. I'm not assuming *any* users have any
> > > alternate capabilities set up, never mind "*all*". Whether or not users
> > > have such capabilities set up says nothing about whether we'd prefer to
> > > roll our own or help users integrate our product into alternatives. We
> > need
> > > not know about their capabilities in order to adopt a preferred
> > philosophy
> > > on this matter.
> > >
> > > And, it actually does preclude users from doing whatever they like. It
> > > imposes at least the following limitations on users: It's not possible
> > > (currently) for users to run the monitor without also listing on an
> > extra,
> > > insecure, port, awaiting receipt of arbitrary log data. It's not
> possible
> > > for users to run two monitors concurrently and get functionality out of
> > > both. It's not possible for users to avoid log system resets, which
> could
> > > be expensive (performance) and could cause log messages to be dropped
> > while
> > > it's reloading. It's not possible for users to be able to predict when
> > the
> > > logging configuration will be reloaded (and changes take effect) on the
> > > basis of some new instance of the monitor being started somewhere.
> > >
> > >
> > > > This whole thing would be simpler if we just eliminated the monitor
> > > > log-watching feature entirely. However, we also have some options
> short
> > > of
> > > > that. For example, we could (1) use a different service-advertisement
> > > > mechanism that doesn't lock.
> > >
> > > Are you talking about using a persistent ZNode to advertise the Monitor
> > > location? What happens when there would be multiple monitors?
> > >
> > >
> > > Not necessarily a persistent ZNode. We can advertise as many monitors
> as
> > we
> > > like, just as we currently do with tservers. This would make monitors
> > > available for log receipt, but the decision about who and how many to
> > send
> > > to could be left to the user, perhaps with a custom log appender which
> is
> > > configurable.
> > >
> > >
> > > (2). We could stop doing this variable
> > > > injection thing, and still leave the socket appender running, so that
> > > users
> > > > will have to configure their destination in their own configuration
> > > files,
> > > > if they want to emit logs to that socket. (3) We could use an
> Accumulo
> > > RPC
> > > > to send logs, rather than the log4j API. Lots of options.
> > >
> > > Instead of #3, aren't there other "standard" log collection APIs that
> > > would make more sense than us rolling our own?
> > >
> > >
> > >
> > > Certainly. This is not an exhaustive list of options, nor is it a
> > proposal
> > > to go with any one of them in particular. It's just a discussion. I'm
> not
> > > necessarily a fan of this option. It just occurred to me because I
> > thought
> > > we could provide the feature at the same level of security as the rest
> of
> > > our RPCs (for instance, when using TLS or KRB).
> > >
> > > --
> > > Christopher
> > >
> >
> --
> Christopher
>

Re: [DISCUSS] Rethinking monitor log receiver

Reply via email to