[ https://issues.apache.org/jira/browse/OOZIE-1561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13783504#comment-13783504 ]
Robert Kanter commented on OOZIE-1561:
--------------------------------------

It currently does have a message if it can't get the logs from a server, and lists that server or servers. However, this only occurs if that problematic server is still in the service discovery on ZK; this only happens for a short period of time before it's removed. After that, Oozie wouldn't know if it's missing anything.

I suppose we could change this behavior, but that will add some additional complexity to the service discovery. Once a server registers with ZK, it would be there forever, so we'd have to add a command for an admin to remove a stale server entry, or for a server to rename itself, etc. That seems like it would be more brittle; thoughts?

Regardless, it might be a good idea to see if we can have the service stay in ZooKeeper for longer than it does now before being removed (there's probably a Curator/ZK setting for this); this would allow the "logs may be missing" message to stay around longer.

We could also update the existing "server" CLI command to not only print the list of servers but to also ping each server so it can give a status (e.g. "UP", "DOWN").

Thoughts on these two ideas?

> When using Oozie HA, the logs should also be HA
> -----------------------------------------------
>
>                 Key: OOZIE-1561
>                 URL: https://issues.apache.org/jira/browse/OOZIE-1561
>             Project: Oozie
>          Issue Type: Improvement
>          Components: HA
>    Affects Versions: trunk
>            Reporter: Robert Kanter
>            Assignee: Robert Kanter
>            Priority: Critical
>
> Currently, if an Oozie server goes down, the logs from that server become unavailable until the server comes back up. In the meantime, the user may or may not be aware that log messages could be missing when Oozie streams logs to the user.
> We should come up with a way to make the logs HA.
> Some ideas:
> # When rolling the logs, copy them into HDFS; Oozie servers can then read the log files directly from HDFS instead of each other
> #- The downside to this is that there will be a window where logs could still be missing as they only show up in HDFS after rolling over (default = 1hr) and Oozie servers would still have to contact each other for the last hour of logs
> #- The upside is that it minimizes the amount of logs that could be missing and would be fairly straightforward to implement
> # Log directly to HDFS
> #- The downside is that this may be complicated or tricky to get working properly
> #-- This also introduces a strict dependency on HDFS
> #- The upside is that this would completely solve the issue and Oozie servers would simply get all logs directly from HDFS
> # Log to ZooKeeper or a database
> #- I think the log files will be too big to do this
> I've assigned this to myself, but if someone wants to tackle this, feel free to reassign it. I think idea 2 is the most practical, but I'm also open to other ideas on how to do this.

--
This message was sent by Atlassian JIRA (v6.1#6144)
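
For illustration only, the "ping each server and report a status" idea from the comment could be sketched roughly as below. This is not Oozie's CLI or API: the names `http_probe` and `server_statuses`, the server names, and the URLs are all hypothetical, and the sketch assumes that "up" simply means the server answers an HTTP request at all.

```python
import urllib.error
import urllib.request
from typing import Callable, Dict


def http_probe(url: str, timeout: float = 2.0) -> bool:
    """Hypothetical probe: True if the server answers an HTTP request at all."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server responded, just not with a 2xx status; it is reachable.
        return True
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, DNS failure, timeout, or malformed URL.
        return False


def server_statuses(
    servers: Dict[str, str],
    probe: Callable[[str], bool] = http_probe,
) -> Dict[str, str]:
    """Map each server name to "UP" or "DOWN" according to the probe."""
    return {name: ("UP" if probe(url) else "DOWN") for name, url in servers.items()}


if __name__ == "__main__":
    # Hypothetical server list, standing in for what service discovery returns.
    servers = {
        "oozie-1": "http://oozie-1.example.com:11000/oozie",
        "oozie-2": "http://oozie-2.example.com:11000/oozie",
    }
    for name, status in server_statuses(servers).items():
        print(f"{name}: {status}")
```

The probe is passed in as a parameter so that the status logic can be exercised without a network. Note that a check like this would still only cover servers that remain registered in ZK, which is the limitation described in the comment.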