[ https://issues.apache.org/jira/browse/OOZIE-1561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13796335#comment-13796335 ]

Robert Kanter commented on OOZIE-1561:
--------------------------------------

{quote}
I'm not sure if this really helps us, but we technically only need to make the log messages that have a job ID in them HA because other log messages would never be asked for.
{quote}
Had another idea:
- Log messages that contain a job ID go to their own log file (and also to the oozie.log file?) and all other log messages go to the oozie.log file
-- Because each job ID has its own log file now, we don't need all of the messy log filtering code and file scanning etc. that we've been doing to support log streaming
-- We can use a standard logger (and also allow the user to use whatever they want) instead of the custom Oozie one for oozie.log
-- We can create a new custom logger for writing the job logs to HDFS
-- Job logs can get cleaned up by the PurgeService when their jobs get cleaned up from the DB
- With one Oozie server, we can simply stream the log file for the job directly without extra processing
- With Oozie HA, we can also stream, but we can also add the option to log job logs directly to HDFS
-- We can say that HDFS HA is a prerequisite, so we can assume that it's always available
-- Oozie servers shouldn't conflict when writing to job log files, and they should write in the correct order, because they had to acquire the lock for that job ID anyway
-- When this is enabled, Oozie servers don't have to stream logs from each other at all

Thoughts?

> When using Oozie HA, the logs should also be HA
> -----------------------------------------------
>
>                 Key: OOZIE-1561
>                 URL: https://issues.apache.org/jira/browse/OOZIE-1561
>             Project: Oozie
>          Issue Type: Improvement
>          Components: HA
>    Affects Versions: trunk
>            Reporter: Robert Kanter
>            Assignee: Robert Kanter
>            Priority: Critical
>
> Currently, if an Oozie server goes down, the logs from that server become
> unavailable until the server comes back up.
> In the meantime, the user may or may not be aware that log messages could be
> missing when Oozie streams logs to the user.
> We should come up with a way to make the logs HA.
> Some ideas:
> # When rolling the logs, copy them into HDFS; Oozie servers can then read the
> log files directly from HDFS instead of each other
> #- The downside to this is that there will be a window where logs could still
> be missing as they only show up in HDFS after rolling over (default = 1hr)
> and Oozie servers would still have to contact each other for the last hour of
> logs
> #- The upside is that it minimizes the amount of logs that could be missing
> and would be fairly straightforward to implement
> # Log directly to HDFS
> #- The downside is that this may be complicated or tricky to get working
> properly
> #-- This also introduces a strict dependency on HDFS
> #- The upside is that this would completely solve the issue and Oozie servers
> would simply get all logs directly from HDFS
> # Log to ZooKeeper or a database
> #- I think the log files will be too big to do this
> I've assigned this to myself, but if someone wants to tackle this, feel free
> to reassign it. I think idea 2 is the most practical, but I'm also open to
> other ideas on how to do this.

--
This message was sent by Atlassian JIRA
(v6.1#6144)
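For illustration only, here's a minimal Java sketch of the first few bullets in the comment: every message goes to oozie.log, a message that mentions a job ID is also appended to a per-job file, the single-server case streams that file as-is, and a purge hook deletes it. This is not Oozie's actual logging code. The class JobLogRouter, the <jobId>.log file layout, and the job-ID regex (modelled on IDs shaped like 0000004-131016111517155-oozie-oozi-W) are all hypothetical, and it uses plain file appends rather than a real logging framework. Writing to HDFS and the cross-server job lock are out of scope; the synchronized method only orders writes within one JVM.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical sketch (not Oozie's real logging code): every message goes to
 * oozie.log, and a message that mentions a job ID is also appended to a
 * per-job file named <jobId>.log.
 */
final class JobLogRouter {
    // Assumed job-ID shape, e.g. 0000004-131016111517155-oozie-oozi-W.
    // An action ID such as ...-W@shell-node is routed to its parent job's file.
    private static final Pattern JOB_ID =
            Pattern.compile("\\b\\d{7}-\\d{15}-[A-Za-z0-9_.-]+?-[WCB]\\b");

    private final Path logDir;

    JobLogRouter(Path logDir) {
        this.logDir = logDir;
    }

    /** Appends to oozie.log and, if the message carries a job ID, to that job's file. */
    synchronized void log(String message) { // orders writes within this JVM only
        append(logDir.resolve("oozie.log"), message);
        Matcher m = JOB_ID.matcher(message);
        if (m.find()) {
            append(jobLog(m.group()), message);
        }
    }

    /** Single-server case: stream the job's file directly, with no filtering or scanning. */
    void streamJobLog(String jobId, OutputStream out) {
        Path file = jobLog(jobId);
        if (!Files.exists(file)) {
            return; // nothing was logged for this job (or it was already purged)
        }
        try {
            Files.copy(file, out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Hook for a purge step: drop the job's log when its DB records are purged. */
    void purge(String jobId) {
        try {
            Files.deleteIfExists(jobLog(jobId));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    private Path jobLog(String jobId) {
        return logDir.resolve(jobId + ".log");
    }

    private static void append(Path file, String line) {
        try {
            Files.write(file, (line + System.lineSeparator()).getBytes(StandardCharsets.UTF_8),
                    StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Because each per-job file only ever contains that job's lines, serving a job's log needs no filtering; oozie.log still gets everything, which matches the "(and also to the oozie.log file?)" option raised in the comment.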