[
https://issues.apache.org/jira/browse/OOZIE-1561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14073795#comment-14073795
]
Robert Kanter commented on OOZIE-1561:
--------------------------------------
The new approach sounds fine to me. A couple of things that we should think
about:
1) Would it make sense to also keep the log streaming behavior to cover the gap
left by the fault tolerance window? This should still be faster than the
current behavior because each server only has to look back at most one fault
tolerance window; the rest would come from HDFS. I don't think this has to
be decided now, but it might be good to have.
2) It would also be nice if we could remove a bunch of the restrictions we put
on the log4j config and allow users to use other appenders etc. It sounds like
we might be able to do that with these changes.
3) Logs can still go missing with this approach. For
example, suppose the fault tolerance window is 10min and a server goes down
before it has a chance to write the last 10min of logs to HDFS; now we're
missing some logs. Still, this is better in a lot of ways than the current
behavior.
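To make points 1) and 3) concrete, here is a rough sketch of how a log request
could be split between HDFS and the live servers. It is only an illustration:
LogSourcePlanner, Plan, and plan() are hypothetical names, not existing Oozie
classes, and it assumes that everything older than one fault tolerance window
has already been written to HDFS.

```java
import java.time.Duration;
import java.time.Instant;

/** Hypothetical helper, not part of Oozie: splits a log request between HDFS and live servers. */
public final class LogSourcePlanner {

    /** Result of a split: [from, split) is read from HDFS, [split, to) is streamed from servers. */
    public static final class Plan {
        public final Instant from;
        public final Instant split;
        public final Instant to;

        Plan(Instant from, Instant split, Instant to) {
            this.from = from;
            this.split = split;
            this.to = to;
        }

        /** Length of the range served from HDFS. */
        public Duration hdfsSpan() {
            return Duration.between(from, split);
        }

        /** Length of the range that still has to be streamed from the servers. */
        public Duration liveSpan() {
            return Duration.between(split, to);
        }
    }

    private LogSourcePlanner() {
    }

    public static Plan plan(Instant from, Instant to, Instant now, Duration window) {
        if (to.isBefore(from)) {
            throw new IllegalArgumentException("'to' must not be before 'from'");
        }
        // Assumption: everything older than now - window has already reached HDFS.
        Instant cutoff = now.minus(window);
        // Clamp the cutoff into [from, to] so that both sub-ranges are well formed.
        Instant split = cutoff.isBefore(from) ? from : (cutoff.isAfter(to) ? to : cutoff);
        return new Plan(from, split, to);
    }
}
```

With a 10min window and a request for the last 2 hours, liveSpan() is 10min
and hdfsSpan() is 110min. The live part is also the part that would be lost in
the scenario from 3) if the server went down before flushing to HDFS.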
It would be good to get some input from others. [~chitnis], [~puru]?
> When using Oozie HA, the logs should also be HA
> -----------------------------------------------
>
> Key: OOZIE-1561
> URL: https://issues.apache.org/jira/browse/OOZIE-1561
> Project: Oozie
> Issue Type: Improvement
> Components: HA
> Affects Versions: trunk
> Reporter: Robert Kanter
> Assignee: Bowen Zhang
> Priority: Critical
> Attachments: OozielogHAtechnicaldesigndoc.pdf,
> OozielogHAtechnicaldesigndoc.pdf
>
>
> Currently, if an Oozie server goes down, the logs from that server become
> unavailable until the server comes back up. In the meantime, the user may or
> may not be aware that log messages could be missing when Oozie streams logs
> to the user.
> We should come up with a way to make the logs HA.
> Some ideas:
> # When rolling the logs, copy them into HDFS; Oozie servers can then read the
> log files directly from HDFS instead of each other
> #- The downside to this is that there will be a window where logs could still
> be missing as they only show up in HDFS after rolling over (default = 1hr)
> and Oozie servers would still have to contact each other for the last hour of
> logs
> #- The upside is that it minimizes the amount of logs that could be missing
> and would be fairly straightforward to implement
> # Log directly to HDFS
> #- The downside is that this may be complicated or tricky to get working
> properly
> #-- This also introduces a strict dependency on HDFS
> #- The upside is that this would completely solve the issue and Oozie servers
> would simply get all logs directly from HDFS
> # Log to ZooKeeper or a database
> #- I think the log files will be too big to do this
> I've assigned this to myself, but if someone wants to tackle this, feel free
> to reassign it. I think idea 2 is the most practical, but I'm also open to
> other ideas on how to do this.