[ https://issues.apache.org/jira/browse/OOZIE-1561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13784482#comment-13784482 ]
Robert Kanter commented on OOZIE-1561:
--------------------------------------

Alternatively, instead of pinging to determine up status, we can have a background thread doing a heartbeat that would update a timestamp in the ZK info for that server; other servers could assume that a server is down if its timestamp is "too old".

> When using Oozie HA, the logs should also be HA
> -----------------------------------------------
>
>                 Key: OOZIE-1561
>                 URL: https://issues.apache.org/jira/browse/OOZIE-1561
>             Project: Oozie
>          Issue Type: Improvement
>          Components: HA
>    Affects Versions: trunk
>            Reporter: Robert Kanter
>            Assignee: Robert Kanter
>            Priority: Critical
>
> Currently, if an Oozie server goes down, the logs from that server become
> unavailable until the server comes back up. In the meantime, the user may or
> may not be aware that log messages could be missing when Oozie streams logs
> to the user.
> We should come up with a way to make the logs HA.
> Some ideas:
> # When rolling the logs, copy them into HDFS; Oozie servers can then read the
> log files directly from HDFS instead of each other
> #- The downside to this is that there will be a window where logs could still
> be missing as they only show up in HDFS after rolling over (default = 1hr)
> and Oozie servers would still have to contact each other for the last hour of
> logs
> #- The upside is that it minimizes the amount of logs that could be missing
> and would be fairly straightforward to implement
> # Log directly to HDFS
> #- The downside is that this may be complicated or tricky to get working
> properly
> #-- This also introduces a strict dependency on HDFS
> #- The upside is that this would completely solve the issue and Oozie servers
> would simply get all logs directly from HDFS
> # Log to ZooKeeper or a database
> #- I think the log files will be too big to do this
> I've assigned this to myself, but if someone wants to tackle this, feel free
> to reassign it. I think idea 2 is the most practical, but I'm also open to
> other ideas on how to do this.

--
This message was sent by Atlassian JIRA
(v6.1#6144)
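
For illustration only, the heartbeat idea from the comment above could look roughly like the sketch below. It is not Oozie code: the class name HeartbeatRegistry, its methods, and the maxAgeMillis threshold are all hypothetical, and an in-memory map stands in for the per-server info that would actually live in ZK. A background thread periodically refreshes a server's timestamp, and any other server treats it as down once that timestamp is older than the threshold.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: the map below stands in for the per-server info in ZK.
class HeartbeatRegistry {
    // serverId -> time of the last heartbeat, in milliseconds
    private final Map<String, Long> lastBeatMillis = new ConcurrentHashMap<>();
    // A timestamp older than this is considered "too old" (assumed threshold)
    private final long maxAgeMillis;

    HeartbeatRegistry(long maxAgeMillis) {
        this.maxAgeMillis = maxAgeMillis;
    }

    // Record a heartbeat for the given server at the given time.
    void beat(String serverId, long nowMillis) {
        lastBeatMillis.put(serverId, nowMillis);
    }

    // A server is assumed up only if it has a recent enough timestamp.
    boolean isUp(String serverId, long nowMillis) {
        Long last = lastBeatMillis.get(serverId);
        return last != null && nowMillis - last <= maxAgeMillis;
    }

    // Start a daemon background thread that refreshes this server's timestamp.
    ScheduledExecutorService startHeartbeat(String serverId, long periodMillis) {
        ScheduledExecutorService executor =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "heartbeat-" + serverId);
                t.setDaemon(true);
                return t;
            });
        executor.scheduleAtFixedRate(
            () -> beat(serverId, System.currentTimeMillis()),
            0, periodMillis, TimeUnit.MILLISECONDS);
        return executor;
    }
}
```

Since the reader compares a timestamp written by another server against its own clock, a check like this only works if the servers' clocks are roughly in sync, so the threshold would need to be comfortably larger than the heartbeat period plus any expected skew.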