Hi, (A bit late reply).
Thanx for the response. This pretty much answers my doubts. I think hedwig has lot of potential in terms of a reliable message bus/publish subscribe system in the hadoop world which works across data centers (which most of the open source products don't consider in the first cut). More documentation and performance numbers would definitely help in understanding it better. -regards Amit ----- Original Message ---- From: Erwin Tam <e...@yahoo-inc.com> To: zookeeper-user@hadoop.apache.org Sent: Fri, 5 November, 2010 4:12:39 AM Subject: Re: Questions about Hedwig architecture Hi Amit, sorry for the late responses to your Hedwig questions. I just joined the mailing list so hopefully I can shed some light on things. First, here's a response to a question you posted last month that I don't think has been answered > In hedwig, how many publishers can publish to a particular topic simultaneously. > Any concerns / important points on message ordering? You can have as many concurrent publishers publishing to the same topic. The guarantee we make is that for each publisher, the messages are in the same relative order they were published in. They can be interleaved with other publishers though. Here are the responses to your other questions. 1. Yes, each of the Hedwig server hubs has a RegionManager module that implements the SubscriptionEventListener interface. So when a hub's SubscriptionManager takes ownership for a topic the very first time, it can invoke all of the listeners subscribed. The RegionManager will then look up the list of all other regions it knows about (stored in the ServerConfiguration), and subscribe to that topic using the "special" topic subscriber name reserved for hub subscribers. 2. As described above in #1, the Subscription Manager is the one that knows when the hub takes ownership of a topic. If subscriber X and Y both simultaneously subscribe to a topic that the hub was previously not the master of, the first subscribe call will cause the hub to become the topic owner/master and trigger all of the registered SubscriptionEventListeners. The second subscriber's call will not trigger this anymore since the hub now has at least one local subscriber for that topic it has just taken ownership of. 3. Region A will have a hub that itself is a subscriber to topics in all other regions. So in this case, Subscriber X in Region A is interested in topic T. The hub responsible for topic T in Region A will be a remote subscriber to topic T in Region B. If messages are published to topic T in Region B, they are first sent to all the local subscribers there and then delivered to all remote subscribers (actually no real ordering). The hub in Region A will receive the messages (like any normal subscriber) and persist them via the PersistenceManager used. Later on, the hub DeliveryManager will pick up that message and then push it to all local subscribers. There is logic to prevent it from being delivered to the RegionManager's hub subscription since it is the one that "got" the message remotely in the first place. 4. The proxy stuff was an initial attempt to create something akin to a web service call for C++ clients before we implemented a native Hedwig C ++ client. We didn't want to reproduce the logic in the java client so the easiest solution would be a proxy type of service that the C++ clients can call which in turn will just call the java client stuff. This is so the client apps that use Hedwig don't have to run a jvm on their box. 5. When a Hedwig hub dies, the order of operations is this. First off, all of the hubs are registered with ZK as ephemeral nodes to know which are the "live" boxes. If one dies, the ephemeral node goes away and ZK does not consider that hub as a live one. There is logic in the client that caches where it thinks a certain topic's hub master is. If it publishes to that hub directly and gets an error, it will redo the initial "handshake" protocol where it publishes to the "default server host" first. The default server host was designed to be a vip that fronted all of the hubs. The idea is that it will just pick one of the live hubs at random. So the client's publish request goes to a random hub where it sees a publish request for a topic. This random hub checks if it is the topic owner, if not, then it looks up who the topic owner is in ZK. If nobody is the topic owner (topic is up for grabs now since the last topic owner hub died), a random one out of all the live hubs is chosen (weighted by each server's load). A redirect response is sent back to the client (unless that particular hub is randomly chosen to be the topic owner). The client will then send the publish request to the redirected hub, and if successful, cache that locally as the hub that is the owner for the given topic. 6. The current design is that every Hedwig hub knows two things. First is where the local ZK server or quorum is (in case we're running replicated ZK). Second, it knows where the default server host/VIP for the hubs in all other regions are. This is stored in the ServerConfiguration. ZK will then be the central place that knows about all the Hedwig hubs in a given region. It knows what topics there are, who is the master of which topic, what load each of the hubs currently has, etc.. Note that it doesn't know anything about other regions. The Hedwig client knows only one thing, where the region's default server host is. This is so it can contact one of the hubs which in turn can redirect it to whoever the appropriate hub is that owns the topic. We didn't want the clients to directly contact ZK for this info to reduce load on the ZK server quorum. It also reduces some complexity on the client so we don't have to worry about caching the server hubs info, staleness, and a refresh policy. The only point of needing a VIP is not as a load balancer but as a hardware way of knowing which server hubs are still alive. We have a few ideas to make this part of the setup configuration easier and more elegant since not everyone has access to a hardware VIP server. Hope this makes sense! Erwin On Thu, 2010-11-04 at 08:02 -0700, Mahadev Konar wrote: > Flavio, Ben, Adam, Erwin? > > Can you guys please respond on zookeeper-user mailing list? > > Thanks > mahadev > > > On 11/2/10 11:06 PM, "amit jaiswal" <amit_...@yahoo.com> wrote: > > > Hi, > > > > I am trying to understand hedwig. I tried reading the documentation user.txt, > > dev.txt along with the code but > > still some design aspects are not clear. > > > > Can someone please tell the following: > > > > (Lets say there are 2 regions A and B) > > > > 1. When a subscriber X subscribes to topic T in region A, then does > > RegionManager automatically adds a subscription (with id = __A) to topic T in > > B. > > > > The RegionManager class has couple of callbacks and I was not able to > > understand > > > > it properly. > > > > 2. What happens when X and Y in region A subscribe to topic T. Does > > RegionManager tries to do separate subscription for X and Y in B? Since the > > RegionManager uses a static subscriber Id, the second subscription request > > will > > be considered duplicate. > > > > 3. How does X gets messages from region B? The RegionManager callbacks are >bit > > confusing and I was not able to understand. > > > > 4. What is the purpose of org.apache.hedwig.server.proxy package classes > > (HedwigProxy etc.). There is no documentation to explain the same. > > > > 5. What happens when one of the hub dies. The publisher will try to contact > > another hub? But what about the subscribers? Do they need to do any error > > handling / recovery? > > > > 6. Hedwig architecture mandates the need for a load balancer. As per my > > understanding it is required because the zk instances of different regions is > > not shared. I would expect all hosts information to be maintained in zk, and > > even for cross colo, the information should be shared through zk (may be that > > requires SSL support in zk). > > > > -regards > > Amit > > > > >