[ 
https://issues.apache.org/jira/browse/OOZIE-615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Kanter updated OOZIE-615:
--------------------------------

    Attachment: OOZIE-615.patch

Attached patch.  
I believe the current plan is to not include HA in Oozie 4.0, so if we're going 
to use trunk for Oozie 4.0, then we shouldn't commit this until later; but we 
can still review it before then.  

Here are some more details about the implementation:
- Used Apache Curator for lock and service discovery implementations instead of 
re-inventing the wheel
- You enable HA by overriding 3 Services in oozie-site:
-# ZKLocksService
-#- Abstracted places where MemoryLocks.LockToken was being used to a new 
LockToken interface
-#- MemoryLocksService uses MemoryLocks.LockToken; ZKLocksService uses 
ZKLocksService.ZKLockToken
-# ZKXLogStreamingService
-#- Refactored XLogService into XLogService (handles log init stuff) and 
XLogStreamingService (handles log streaming)
-#- ZKXLogStreamingService, when asked for some logs, will:
-#-# contact each of the other Servers and ask for logs
-#-# collate and reorder the logs
-#-# return the logs back to the user as one response
-#- It's a best effort; if it can't get logs from a server, it will skip that 
server and add a note to the returned logs
-#- As mentioned in the pdf document, if a Server goes down, log messages may 
be unavailable
-#- To prevent infinite recursion when requesting logs, there's a new parameter 
"allservers"; if it's "false", the Server will not ask other servers for logs; 
if omitted, it's treated as "true", so there's no change for the user
-# ZKJobsConcurrencyService
-#- For the few places where we want to only have one Server processing a 
specific job, ZKJobsConcurrencyService has some methods for determining if a 
job id "belongs" to a Server
-#- Added JobsConcurrencyService for non-HA operation, which has the same 
interface, but has dummy implementations (i.e. all jobs "belong" to the Server)
- ZKUtils
-- Manages connection to ZK and provides methods for interacting with ZK for 
ZKLocksService, ZKXLogStreamingService, and ZKJobsConcurrencyService
-- Uses a singleton so we don't have to worry about which of the 3 Services are 
created first and to disconnect properly when shutting down
-- Also advertises on the service discovery with ZK id for this Server and URL
--- The metadata is a hashmap so we can easily add additional properties later
- Added an admin command to the V2AdminServlet (and CLI) to return a list of 
available Oozie servers (i.e. servers connected to ZK); if not using HA, then 
just itself
- Added a way for Oozie to programmatically determine if it is running with HTTPS
- Three ZK properties in oozie-site/default:
-# {{oozie.zookeeper.connection.string}} : comma-separated list of host:ports 
for ZK servers
-# {{oozie.zookeeper.namespace}} : namespace to use; allows having multiple 
"sets" of Oozie Servers
-# {{oozie.zookeeper.oozie.id}} : ID that this Oozie server should use (default 
is hostname); each Server must have a unique ID
- Added ZKXTestCase, which extends XTestCase but runs a ZooKeeper server and 
has some other useful things
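
To illustrate the log collation steps listed under ZKXLogStreamingService, here 
is a minimal Java sketch. It is not taken from the patch: the LogSource 
interface, the collate method, the server names, and the note text are all 
hypothetical, and it assumes every log line starts with a sortable timestamp.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for "ask one server for its logs"; not an Oozie API.
interface LogSource {
    List<String> fetch(String jobId, boolean allServers);
}

class LogCollationSketch {

    // Asks every server for its logs, skips servers that fail (best effort),
    // and returns the skip notes followed by all lines in timestamp order.
    static List<String> collate(Map<String, LogSource> servers, String jobId) {
        List<String> lines = new ArrayList<>();
        List<String> notes = new ArrayList<>();
        for (Map.Entry<String, LogSource> server : servers.entrySet()) {
            try {
                // allServers=false: the contacted server answers only for itself,
                // otherwise the servers would keep asking each other forever.
                lines.addAll(server.getValue().fetch(jobId, false));
            } catch (RuntimeException e) {
                notes.add("NOTE: could not get logs from " + server.getKey() + "; skipped");
            }
        }
        // Assumption: lines begin with a "yyyy-MM-dd HH:mm:ss" style timestamp,
        // so plain string order is also chronological order.
        Collections.sort(lines);
        List<String> result = new ArrayList<>(notes);
        result.addAll(lines);
        return result;
    }

    public static void main(String[] args) {
        Map<String, LogSource> servers = new LinkedHashMap<>();
        servers.put("oozie-a", (id, all) -> Arrays.asList(
                "2013-08-01 10:00:03 INFO action ended",
                "2013-08-01 10:00:01 INFO job started"));
        servers.put("oozie-b", (id, all) -> { throw new RuntimeException("unreachable"); });
        servers.put("oozie-c", (id, all) -> Arrays.asList(
                "2013-08-01 10:00:02 INFO action started"));
        for (String line : collate(servers, "some-job-id")) {
            System.out.println(line);
        }
    }
}
```

In this sketch the note about the unreachable server is placed ahead of the 
merged lines; where the real implementation puts it is not stated above.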

As far as testing goes, I wrote a bunch of unit tests and have tested running 3 
Oozie servers against MySQL in a VM, but I haven't done any large-scale testing.  
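
One property worth checking in a multi-server setup like this is that every job 
id "belongs" to exactly one Server. The Java sketch below shows one way that 
could work, assuming ownership is decided by hashing the job id over the sorted 
list of live server IDs. That rule, the class and method names, and the job id 
format are assumptions for illustration; the patch may decide ownership 
differently.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class JobOwnershipSketch {

    // Non-HA behaviour as described above: all jobs "belong" to the only Server.
    static boolean belongsNonHa(String jobId) {
        return true;
    }

    // Hypothetical HA rule: hash the job id onto the sorted list of live,
    // unique server IDs and check whether the chosen ID is ours.
    static boolean belongsTo(String jobId, String myId, List<String> liveServerIds) {
        if (liveServerIds.isEmpty()) {
            return false; // nobody is registered, so nobody owns the job
        }
        List<String> sorted = new ArrayList<>(liveServerIds);
        Collections.sort(sorted); // every server must see the same order
        int index = Math.floorMod(jobId.hashCode(), sorted.size());
        return sorted.get(index).equals(myId);
    }

    public static void main(String[] args) {
        List<String> servers = Arrays.asList("oozie-1", "oozie-2", "oozie-3");
        String jobId = "0000001-130801000000000-oozie-W"; // made-up job id
        for (String server : servers) {
            System.out.println(server + " owns " + jobId + ": "
                    + belongsTo(jobId, server, servers));
        }
    }
}
```

With a rule like this, ownership shifts whenever a Server joins or leaves, so 
it only suits work that any Server can safely pick up.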
                
> Support high availability for the Oozie service
> -----------------------------------------------
>
>                 Key: OOZIE-615
>                 URL: https://issues.apache.org/jira/browse/OOZIE-615
>             Project: Oozie
>          Issue Type: New Feature
>            Reporter: Craig Peters
>            Assignee: Robert Kanter
>         Attachments: OOZIE-615.patch, OozieHADesign.pdf, 
> zookeeper_yahoo_code.zip
>
>
> As Oozie becomes a critical component in the Hadoop ecosystem, users need 
> assured availability of the services provided by Oozie.  To support this need 
> Oozie should include a new feature to support high availability.  This 
> feature needs to take into consideration that Oozie provides RESTful APIs, 
> Java APIs, and a command line API that should all be insensitive to the 
> availability of any specific server or components.  At Yahoo! it is not 
> required that there be session fail-over from the client.  It is acceptable 
> for the client to reconnect if a session is lost as long as the state data 
> managed by the Oozie service is not lost.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
