Re: Any plans on using apache Helix

Ted Dunning Sun, 28 Apr 2013 13:35:42 -0700

Inline.

On Sun, Apr 28, 2013 at 11:16 AM, kishore g <[email protected]> wrote:

> Its looks amazing, was looking for something like this. Couple of questions
>
> 1. How is the data replicated, is it writing synchronously to both primary
> and backups. What if backup is down
>

I think that synchronous backup is configurable, but I am not sure.

If the backup for a partition goes down, the partition is backed up
elsewhere.

> 2. what happens in network partition ?
>

You get split brain behavior.  This is actually very good for some
applications that are not the storage tier.  For instance, with
web-servers or Drillbits, split brain is actually better than going tharn.
That way, the machines keep working and if the storage tier is, by some
miracle, still functional then all is well.  If the storage tier is not
well, then part of the split cluster will reflect that.

Another case where split brain is good is in message queuing where the
storage tier is local to the service receiving the messages.  It is good to
continue queuing messages during the split brain episode rather than
failing to accept them.
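
A minimal sketch of that store-and-forward idea, in Python.  The class
and method names here are hypothetical and are not taken from Hazelcast
or from any particular queuing system; the only point is that accepting
a message touches local state alone, so it keeps working during a split.

```python
from collections import deque


class LocalBuffer:
    """Hypothetical local queue: accept always succeeds, forwarding is deferred."""

    def __init__(self):
        self.pending = deque()

    def accept(self, message):
        # Only local state is touched, so a network split cannot block this.
        self.pending.append(message)
        return True

    def drain(self, deliver):
        """Forward buffered messages in arrival order; stop at the first failure.

        `deliver` is an assumed callable that returns True on success.
        """
        sent = 0
        while self.pending:
            if not deliver(self.pending[0]):
                break
            self.pending.popleft()
            sent += 1
        return sent
```

Calling drain() while the peers are unreachable forwards nothing and
loses nothing; calling it again after the split heals forwards the
backlog in order.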

Hazelcast has a mechanism for handling merging of split clusters, but I
would be very leery of expecting it to work correctly.  If you care about
consistency, then ZooKeeper is a better model.
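
To make the concern concrete, here is a toy model in Python of merging
two sides of a split with a "latest update wins" rule.  This is an
illustration under stated assumptions, not Hazelcast's implementation:
the function name, the (value, timestamp) layout, and the numbers are
all made up for the example.

```python
def merge_latest_update(side_a, side_b):
    """Merge two diverged maps of key -> (value, timestamp).

    For each key, keep whichever entry has the newer timestamp.
    Toy model only; real merge policies are configured in the data grid.
    """
    merged = dict(side_a)
    for key, (value, ts) in side_b.items():
        if key not in merged or ts > merged[key][1]:
            merged[key] = (value, ts)
    return merged


# Both sides start from counter = 10 before the split.
side_a = {"counter": (11, 100)}  # one increment, written at time 100
side_b = {"counter": (12, 105)}  # two increments, last written at time 105

merged = merge_latest_update(side_a, side_b)
# merged["counter"] is (12, 105); the three increments should have given 13.
```

The merge is perfectly well defined, yet the increment made on side A
is silently dropped.  Whether that is acceptable depends entirely on
the application.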

>
> Looks like it  dynamically distributes data based on the number of nodes in
> the system. I think in multicast it can discover other nodes, but what
> happens in tcp.
>

With TCP, you have to give a host name or address of at least one member of
the cluster.  You can configure things like IP address ranges and port
ranges.  Without multicast, HC will scan for live servers.  This isn't
quite zero-conf.  In the application I am building, though, it will use
multicast if you don't specify anything, and if you give a host option it
will take a comma-delimited list of hostnames or IP addresses and use all
of them.
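
The option handling described above could be sketched roughly as
follows, in Python.  The function name and the returned dictionary are
hypothetical; the actual application and the Hazelcast join
configuration are not shown in this thread.

```python
def choose_discovery(host_option=None):
    """Pick a discovery mode from an optional comma-delimited host list.

    No option (or an empty one) means multicast.  Otherwise every listed
    hostname or IP address is used as a TCP seed member.
    """
    if host_option is None or not host_option.strip():
        return {"mode": "multicast", "members": []}

    # Split on commas and drop empty entries such as a trailing comma.
    members = [h.strip() for h in host_option.split(",") if h.strip()]
    if not members:
        return {"mode": "multicast", "members": []}
    return {"mode": "tcp", "members": members}
```

For example, choose_discovery("10.0.0.1, node2.example.com") selects
TCP with both entries as seed members, while choose_discovery() falls
back to multicast.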

> Does not look like its following any consensus protocol like paxos/zab. I
> just skimmed through the doc, could not get the internal details. Would
> love to know more about how it ensures data consistency.
>

It doesn't do much of that.

There is a way to define split/merge behavior but there is no effort to
provide strong consistency.

As I mentioned before, that is actually really, really good for many
applications.

The mission of HC is very different from the mission of ZK and each does a
different thing well.  For providing super simple out-of-the-box user
experience, Hazelcast pretty much dominates any of the ZK-based approaches.
For providing absolute consistency, ZK totally dominates HC.

Thus, for something like Drillbits where I want them to do whatever they
can under all circumstances and where the primary goal is read access, I
would say Hazelcast is better.

For providing the ground truth information about where the CLDB master is
in the MapR file-system, I think that ZK is better.
