[
https://issues.apache.org/jira/browse/OOZIE-615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Robert Kanter updated OOZIE-615:
--------------------------------
Attachment: OOZIE-615.patch
Attached patch.
I believe the current plan is to not include HA in Oozie 4.0, so if we're going
to use trunk for Oozie 4.0, then we shouldn't commit this until later; but we
can still review it before then.
Here are some more details about the implementation:
- Used Apache Curator for lock and service discovery implementations instead of
re-inventing the wheel
- You enable HA by overriding 3 Services in oozie-site:
-# ZKLocksService
-#- Abstracted places where MemoryLocks.LockToken was being used to a new
LockToken interface
-#- MemoryLocksService uses MemoryLocks.LockToken; ZKLocksService uses
ZKLocksService.ZKLockToken
-# ZKXLogStreamingService
-#- Refactored XLogService into XLogService (handles log init stuff) and
XLogStreamingService (handles log streaming)
-#- ZKXLogStreamingService, when asked for some logs, will:
-#-# contact each of the other Servers and ask for logs
-#-# collate and reorder the logs
-#-# return the logs back to the user as one response
-#- It's a best effort; if it can't get logs from a server, it will skip that
server and add a note to the returned logs
-#- As mentioned in the pdf document, if a Server goes down, log messages may
be unavailable
-#- To prevent infinite recursion when requesting logs, there's a new parameter
"allservers" that, if "false", will not ask other servers for logs; if omitted,
it's considered true, so there is no change for the user
-# ZKJobsConcurrencyService
-#- For the few places where we want to only have one Server processing a
specific job, ZKJobsConcurrencyService has some methods for determining if a
job id "belongs" to a Server
-#- Added JobsConcurrencyService for non-HA operation, which has the same
interface, but has dummy implementations (i.e. all jobs "belong" to the Server)
- ZKUtils
-- Manages connection to ZK and provides methods for interacting with ZK for
ZKLocksService, ZKXLogStreamingService, and ZKJobsConcurrencyService
-- Uses a singleton so we don't have to worry about which of the 3 Services are
created first and to disconnect properly when shutting down
-- Also advertises this Server's ZK ID and URL through ZK service discovery
--- The metadata is a hashmap so we can easily add additional properties later
- Added an admin command to the V2AdminServlet (and CLI) to return a list of
available Oozie servers (i.e. servers connected to ZK); if not using HA, then
just itself
- Added a way for Oozie to programmatically determine if it is running with HTTPS
- Three ZK properties in oozie-site/default:
-# {{oozie.zookeeper.connection.string}} : comma-separated list of host:ports
for ZK servers
-# {{oozie.zookeeper.namespace}} : namespace to use; allows having multiple
"sets" of Oozie Servers
-# {{oozie.zookeeper.oozie.id}} : ID that this Oozie server should use (default
is hostname); each Server must have a unique ID
- Added ZKXTestCase, which extends XTestCase but runs a ZooKeeper server and
has some other useful things
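
To make the ZKXLogStreamingService behavior above a bit more concrete, here is a minimal sketch of the best-effort collation and the "allservers" guard. It is not code from the patch: the class name LogCollationSketch, the collate method, and the use of a Supplier to stand in for "ask one server for its logs" are all hypothetical, and it assumes every log line starts with a sortable timestamp so that a plain sort puts the lines in chronological order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical sketch; names and structure are assumptions, not the patch's code.
public class LogCollationSketch {

    /**
     * Collates log lines from this server and, if allServers is true, from its peers.
     *
     * @param allServers mirrors the "allservers" parameter: when false, peers are not asked,
     *                   which is what stops one server's request from fanning out again
     * @param localLines log lines held by this server
     * @param peers      server id -> stand-in for a remote log request (assumed to throw on failure)
     */
    public static List<String> collate(boolean allServers,
                                       List<String> localLines,
                                       Map<String, Supplier<List<String>>> peers) {
        List<String> merged = new ArrayList<>(localLines);
        List<String> notes = new ArrayList<>();

        if (allServers) {
            for (Map.Entry<String, Supplier<List<String>>> peer : peers.entrySet()) {
                try {
                    // A real request would pass allservers=false to the peer.
                    merged.addAll(peer.getValue().get());
                } catch (RuntimeException e) {
                    // Best effort: skip the unreachable server and leave a note for the user.
                    notes.add("Unable to get logs from server " + peer.getKey()
                            + "; its log messages are omitted");
                }
            }
        }

        // Assumes each line begins with a sortable timestamp, so lexicographic
        // order is chronological order.
        Collections.sort(merged);

        // Notes go after the collated lines so they do not disturb the ordering.
        merged.addAll(notes);
        return merged;
    }

    public static void main(String[] args) {
        Map<String, Supplier<List<String>>> peers = new LinkedHashMap<>();
        peers.put("oozie-2", () -> Arrays.asList("2013-06-01 10:00:02 INFO action started"));
        peers.put("oozie-3", () -> { throw new IllegalStateException("connection refused"); });

        List<String> local = Arrays.asList(
                "2013-06-01 10:00:01 INFO job submitted",
                "2013-06-01 10:00:03 INFO job finished");

        for (String line : collate(true, local, peers)) {
            System.out.println(line);
        }
    }
}
```

Calling collate with allServers set to false returns only the local lines, which corresponds to what a Server does when another Server asks it for logs.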
As far as testing goes, I wrote a bunch of unit tests and have tested running 3
Oozie servers against MySQL in a VM, but I haven't done any large-scale testing.
> Support high availability for the Oozie service
> -----------------------------------------------
>
> Key: OOZIE-615
> URL: https://issues.apache.org/jira/browse/OOZIE-615
> Project: Oozie
> Issue Type: New Feature
> Reporter: Craig Peters
> Assignee: Robert Kanter
> Attachments: OOZIE-615.patch, OozieHADesign.pdf,
> zookeeper_yahoo_code.zip
>
>
> As Oozie becomes a critical component in the Hadoop ecosystem, users need
> assured availability of the services provided by Oozie. To support this need
> Oozie should include a new feature to support high availability. This
> feature needs to take into consideration that Oozie provides RESTful APIs,
> Java APIs, and a command line API that should all be insensitive to the
> availability of any specific server or components. At Yahoo! it is not
> required that there be session fail-over from the client. It is acceptable
> for the client to reconnect if a session is lost as long as the state data
> managed by the Oozie service is not lost.