On Tue, Feb 7, 2017 at 8:40 AM, Christopher <[email protected]> wrote:
> On Tue, Feb 7, 2017 at 11:02 AM Billie Rinaldi <[email protected]> > wrote: > > > I am a fan of the log-forwarding feature. I wouldn't mind if there were > an > > option to turn it off for users that don't want it. I also wouldn't mind > if > > someone wanted to improve how the log-forwarding mechanism is > implemented. > > Option (2) doesn't work for me, though. I need the log-forwarding > > destination(s) to be discovered automatically rather than requiring it to > > be known in advance / configured manually by the user. > > > > > Using DNS (CNAME) for the monitor destination might be a better solution > for auto-routing to the current monitor, rather than the auto-discovery > happening inside Accumulo. The DNS solution won't work for random ports... > but I'm not sure what the use is for a monitor that isn't served to a > well-known location. Past uses of auto-discovering a random port have been for running on YARN and for ITs. I don't think this is necessary for YARN/Slider anymore. > A custom appender which does the logic (rather than > the standard socket appender, custom properties, and tserver logic) would > also work, if the services are properly advertised. > I think a custom appender would be fine. > > What are your thoughts on these? > > > > Seems like improving log forwarding is a separate (but related) issue > from > > whether we want to add the feature of supporting multiple monitors. To > > implement that feature, we'd have to decide whether we'd want to forward > to > > all monitors, or just forward to the first monitor, or whatever. If we do > > implement the multiple monitors feature, I think we should add a > "Monitors" > > tab to the monitor, so that users will be able to monitor their monitors. > > :) (Seriously.) > > > > > I'm leaning toward thinking that we shouldn't try to get into the business > of supporting the logic of how many, and which ones, to forward to. > > From this discussion, I'm thinking: > > 1. Make the service discovery specifically for the log4j service, and make > it optional. Turning off log4j service will also prevent registration in > ZK. This would also make log4j an optional dependency of the monitor. When > the feature is disabled, /log in the monitor will document that it is > disabled. > 2. Create a custom log4j appender which does the logic of looking up the > advertised monitors, and decides whether to send to one or them all. > 3. Disable all these by default with the example configs, but provide blog > post explaining how to enable. > I'd prefer enabled by default. :) > > Does that sound reasonable? > > > > On Mon, Feb 6, 2017 at 2:04 PM, Christopher <[email protected]> wrote: > > > > > On Mon, Feb 6, 2017 at 12:15 PM Josh Elser <[email protected]> > wrote: > > > > > > -1 for removal of the log-forwarding which does not include an > > > out-of-the-box replacement. I'm going to avoid saying any more on this > > > unless necessary. > > > > > > > > > If we were to opt for removal (perhaps by agreeing that it were out of > > > scope), I'd personally prefer it be phased out gradually, as users > gained > > > more experience integrating with superior tooling for distributed log > > > aggregation. I think too many people rely on it right now to remove it > > > abruptly. > > > > > > > > > Most of the other things you bring up seem like you poking holes in how > > > the log aggregation works presently. I don't think anyone would > disagree > > > that this could be made better. Instead of focusing on this, why not > > > approach your issues by instead suggesting how you'd like to see it > work > > > and weigh the pros/cons? > > > > > > > > > Yes, this discussion is about both whether, and how, we might be able > to > > > make it better. I haven't proposed a solution, because I don't have one > > at > > > this time. Hence, my desire to [DISCUSS] it. > > > > > > > > > Christopher wrote: > > > > Currently, the monitor has an HA-standby behavior as a side-effect of > > the > > > > way it handles logging. > > > > > > > > The way this works is that the other services read the monitor's lock > > in > > > ZK > > > > to get the current active monitor's host and port. They use this to > set > > > > some system properties, which are referenced in the default example > > > > generic_logger.xml config file. They set these system properties and > > > reset > > > > the log4j system whenever they detect a change in the active monitor. > > > This > > > > has the effect of forcing all the logging to go to a particular > socket > > > open > > > > on the monitor, wherever it is currently running. > > > > > > If you'd like to change the implementation, great. > > > > > > > > > Right. So, this is a discuss thread about how we might want to change > > it. I > > > don't have a good sense about what avenues are going to cause > headaches. > > > > > > > > > As a point of reference, the Master's ZK lock is also used for service > > > discovery as well as distributed exclusion (a lock). I'm not sure why > > > you find an issue with this. > > > > > > > > > I don't find an issue with it. I recognize it does both. I stated > that. I > > > also recognize that it must do both for certain services. My point was > > that > > > only one of those aspects (service discovery), rather than both, was > > needed > > > for the monitor service. The other aspect (distributed exclusion) is an > > > unnecessary side-effect which cripples redundant monitors because of > this > > > (optional) log receiving feature. > > > > > > > > > > The ZK lock is currently being used to restrict monitor functionality > > to > > > a > > > > single monitor instance. But, this isn't really necessary. There > isn't > > > any > > > > reason to restrict concurrent monitors. The real purpose of the ZK > > lock, > > > as > > > > I've described, is to hijack the ZK lock mechanism because it's also > a > > > > service-advertisement feature. > > > > > > The use of the ZK lock is to prevent users from accessing an instance > of > > > the Monitor which is not actually receiving the forwarded logs from the > > > server. It would be terrible if half of an Ops team saw no errors on > the > > > monitor they went to while the other half saw the real errors. > > > > > > > > > Yes. One solution would be to limit the distributed exclusion to just > the > > > log feature (along with any relevant messages), and leave the rest of > the > > > redundant monitor functional. > > > > > > > > > > This is a bit convoluted and makes a lot of assumptions, and has a > lot > > of > > > > issues. It is also could be impeding some possible avenues of > > > > simplification under ACCUMULO-3005. > > > > > > > > 1. It locks us in to using Log4J (probably a specific range of > > versions). > > > > 2. It sends logs across the network insecurely. > > > > > > These seem like implementation details to me. > > > > > > > > > Yes, but it's an inherent problem with the asynchronous log appender, > > which > > > speaks directly to its suitability for running a custom log receiver. > > > > > > > > > > > > > 3. It assumes that you only want a single monitor service running. > > > > > > Again, an incorrect assumption. > > > > > > > > > It prevents you from running more than one concurrently. The incorrect > > > assumption that you want this is occurring on the part of the current > > > implementation, based on its fundamental design. > > > > > > > > > > 4. Code assumes particular configuration with particular variable's > > > > embedded in them. > > > > > > Implementation detail > > > > > > > > > Well, yes... this thread is a discuss thread about changing > > implementation > > > in some way... so.... > > > > > > > > > > 5. Extra threads needed to track changes > > > > > > > > > > > > Implementation detail > > > > > > > > > Yep. > > > > > > > > > > > > > 6. It adds code complexity. > > > > > > ??? I don't even know how to address this one. > > > > > > > > > Responding to every point is optional. ;) > > > But, seriously, my point here is that we often seem to throw the > kitchen > > > sink into our code, introducing unnecessary complexity, rather than > > > encourage integration. This complexity makes integration harder, and > the > > > code less maintainable. While I don't yet know what my preferred > solution > > > is... I strongly suspect it is one that is both workable and less > > complex. > > > > > > > > > > 7. It assumes the user wishes to use the monitor to watch logs, when > > > other > > > > tools are better suited for log aggregation, monitoring, and > alerting. > > > > > > Your argument assumes that *all* users have such a capability set up. > > > This feature does not preclude users from doing whatever they'd like. > > > > > > > > > It's not an argument. I'm not championing a position. That point was > > simply > > > an observation of the current default behavior, and the recognition > that > > > dedicated tools for this exist. I'm not assuming *any* users have any > > > alternate capabilities set up, never mind "*all*". Whether or not users > > > have such capabilities set up says nothing about whether we'd prefer to > > > roll our own or help users integrate our product into alternatives. We > > need > > > not know about their capabilities in order to adopt a preferred > > philosophy > > > on this matter. > > > > > > And, it actually does preclude users from doing whatever they like. It > > > imposes at least the following limitations on users: It's not possible > > > (currently) for users to run the monitor without also listing on an > > extra, > > > insecure, port, awaiting receipt of arbitrary log data. It's not > possible > > > for users to run two monitors concurrently and get functionality out of > > > both. It's not possible for users to avoid log system resets, which > could > > > be expensive (performance) and could cause log messages to be dropped > > while > > > it's reloading. It's not possible for users to be able to predict when > > the > > > logging configuration will be reloaded (and changes take effect) on the > > > basis of some new instance of the monitor being started somewhere. > > > > > > > > > > This whole thing would be simpler if we just eliminated the monitor > > > > log-watching feature entirely. However, we also have some options > short > > > of > > > > that. For example, we could (1) use a different service-advertisement > > > > mechanism that doesn't lock. > > > > > > Are you talking about using a persistent ZNode to advertise the Monitor > > > location? What happens when there would be multiple monitors? > > > > > > > > > Not necessarily a persistent ZNode. We can advertise as many monitors > as > > we > > > like, just as we currently do with tservers. This would make monitors > > > available for log receipt, but the decision about who and how many to > > send > > > to could be left to the user, perhaps with a custom log appender which > is > > > configurable. > > > > > > > > > (2). We could stop doing this variable > > > > injection thing, and still leave the socket appender running, so that > > > users > > > > will have to configure their destination in their own configuration > > > files, > > > > if they want to emit logs to that socket. (3) We could use an > Accumulo > > > RPC > > > > to send logs, rather than the log4j API. Lots of options. > > > > > > Instead of #3, aren't there other "standard" log collection APIs that > > > would make more sense than us rolling our own? > > > > > > > > > > > > Certainly. This is not an exhaustive list of options, nor is it a > > proposal > > > to go with any one of them in particular. It's just a discussion. I'm > not > > > necessarily a fan of this option. It just occurred to me because I > > thought > > > we could provide the feature at the same level of security as the rest > of > > > our RPCs (for instance, when using TLS or KRB). > > > > > > -- > > > Christopher > > > > > > -- > Christopher >
