eolivelli commented on a change in pull request #1073: ZOOKEEPER-3529: add a 
new doc: zookeeperUseCases.md
URL: https://github.com/apache/zookeeper/pull/1073#discussion_r324263731
 
 

 ##########
 File path: zookeeper-docs/src/main/resources/markdown/zookeeperUseCases.md
 ##########
 @@ -0,0 +1,377 @@
+<!--
+Copyright 2002-2019 The Apache Software Foundation
+
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+//-->
+
+# ZooKeeper Use Cases
+
+- Applications and organizations using ZooKeeper include (alphabetically)[1].
+- If you would like your use case listed here, please submit a pull request or send an email to **d...@zookeeper.apache.org**,
+  and it will be included.
+
+
+## Free Software Projects
+
+### [AdroitLogic UltraESB](http://adroitlogic.org/)
+  - Uses ZooKeeper to implement node coordination for clustering support. This allows the management of the complete cluster,
+  or any specific node, from any other node connected via JMX. A cluster-wide command framework developed on top of the
+  ZooKeeper coordination allows commands that fail on some nodes to be retried. We also support the automated graceful
+  round-robin restart of a complete cluster of nodes using the same framework.
+
+### [Akka](http://akka.io/)
+  - Akka is the platform for the next generation event-driven, scalable and 
fault-tolerant architectures on the JVM.
+  Or: Akka is a toolkit and runtime for building highly concurrent, 
distributed, and fault tolerant event-driven applications on the JVM.
+
+### [Eclipse Communication Framework](http://www.eclipse.org/ecf) 
+  - The Eclipse ECF project provides an implementation of its Abstract 
Discovery services using Zookeeper. ECF itself
+  is used in many projects providing base functionality for communication, all based on OSGi.
+
+### [Eclipse Gyrex](http://www.eclipse.org/gyrex)
+  - The Eclipse Gyrex project provides a platform for building your own Java 
OSGi based clouds. 
+  - ZooKeeper is used as the core cloud component for node membership and 
management, coordination of jobs executing among workers,
+  a lock service and a simple queue service and a lot more.
+
+### [GoldenOrb](http://www.goldenorbos.org/)
+  - Massive-scale graph analysis.
+
+### [Juju](https://juju.ubuntu.com/)
+  - Service deployment and orchestration framework, formerly called Ensemble.
+
+### [Katta](http://katta.sourceforge.net/)
+  - Katta serves distributed Lucene indexes in a grid environment.
+  - Zookeeper is used for node, master and index management in the grid.
+
+### [KeptCollections](https://github.com/anthonyu/KeptCollections)
+  - KeptCollections is a library of drop-in replacements for the data 
structures in the Java Collections framework.
+  - KeptCollections uses Apache ZooKeeper as a backing store, thus making its 
data structures distributed and scalable.
+
+### [Neo4j](https://neo4j.com/)
+  - Neo4j is a Graph Database. It's a disk based, ACID compliant transactional 
storage engine for big graphs and fast graph traversals,
+    using external indices like Lucene/Solr for global searches.
+  - We use ZooKeeper in the Neo4j High Availability components for 
write-master election,
+    read slave coordination and other cool stuff. ZooKeeper is a great and 
focused project - we like!
+
+### [Norbert](http://sna-projects.com/norbert)
+  - Partitioned routing and cluster management.
+
+### [OpenStack Nova](http://www.openstack.org/)
+  - OpenStack is an open source software stack for the creation and
management of private and public clouds. It is designed to manage pools of 
compute,
+    storage, and networking resources in data centers, allowing the management 
of these resources through a consolidated dashboard and flexible APIs.
+  - Nova is the software component in OpenStack, which is responsible for 
managing the compute resources, where virtual machines (VMs) are hosted in a 
cloud computing environment. It is also known as the OpenStack Compute Service. 
OpenStack
+    Nova provides a cloud computing fabric controller, supporting a wide 
variety of virtualization technologies such as KVM, Xen, VMware, and many more. 
In addition to its native API, it also includes compatibility with Amazon EC2 
and S3 APIs.
+    For its proper operation, Nova depends on up-to-date information about the availability of the various compute nodes and the services that run on them. For example, the virtual machine placement operation needs to know the currently available compute nodes and their current state.
+    Nova uses ZooKeeper to implement an efficient membership service, which 
monitors the availability of registered services. This is done through the 
ZooKeeper ServiceGroup Driver, which works by using ZooKeeper's ephemeral 
znodes. Each service registers by creating an ephemeral znode on startup. Now, 
when the service dies, ZooKeeper will automatically delete the corresponding 
ephemeral znode. The removal of this znode can be used to trigger the 
corresponding recovery logic.
+    For example, when a compute node crashes, the nova-compute service that is 
running in that node also dies. This causes the session with ZooKeeper service 
to expire, and as a result, ZooKeeper deletes the ephemeral znode created by 
the nova-compute service. If the cloud controller keeps a watch on this 
node-deletion event, it will come to know about the compute node crash and can 
trigger a migration procedure to evacuate all the VMs that are running in the 
failed compute node to other nodes. This way, high availability of the VMs can 
be ensured in real time.
+  - ZooKeeper is also being considered for the following use cases in 
OpenStack Nova:
+    - Storing the configuration metadata (nova.conf)
+    - Maintaining high availability of services and automatic failover using
+    leader election
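
The ephemeral-znode membership pattern described for Nova can be illustrated with a small in-memory model. This is only a sketch under stated assumptions: `ToyCoordinator`, its methods, and the `/servicegroups/compute/...` paths are hypothetical names invented for illustration, and nothing here is the ZooKeeper client API or the Nova ServiceGroup driver. It shows the idea only: each service owns an ephemeral entry tied to its session, and expiring the session removes the entry and fires a one-shot deletion watch.

```python
class ToyCoordinator:
    """In-memory stand-in for a ZooKeeper ensemble (illustration only)."""

    def __init__(self):
        self.ephemerals = {}  # path -> id of the session that owns it
        self.watchers = {}    # path -> callbacks to fire when the path is deleted

    def create_ephemeral(self, session_id, path):
        # An ephemeral entry lives only as long as its owning session.
        if path in self.ephemerals:
            raise KeyError(f"{path} already exists")
        self.ephemerals[path] = session_id

    def watch_delete(self, path, callback):
        # Register a one-shot callback for deletion of `path`.
        self.watchers.setdefault(path, []).append(callback)

    def expire_session(self, session_id):
        # Delete every entry owned by the session, then notify its watchers.
        owned = [p for p, s in self.ephemerals.items() if s == session_id]
        for path in owned:
            del self.ephemerals[path]
            for callback in self.watchers.pop(path, []):
                callback(path)

    def members(self, prefix):
        # Current membership is simply the set of live entries under a prefix.
        return sorted(p for p in self.ephemerals if p.startswith(prefix))


coordinator = ToyCoordinator()
evacuated = []  # paths for which recovery logic was triggered

# Two compute services register on startup (hypothetical paths).
coordinator.create_ephemeral("session-a", "/servicegroups/compute/host-a")
coordinator.create_ephemeral("session-b", "/servicegroups/compute/host-b")

# The controller watches host-a; the callback stands in for VM evacuation.
coordinator.watch_delete("/servicegroups/compute/host-a", evacuated.append)

# host-a crashes, so its session expires and its entry disappears.
coordinator.expire_session("session-a")
live = coordinator.members("/servicegroups/compute/")
```

In a real deployment the session expiry is detected by the ZooKeeper servers after the session timeout elapses, so the notification is delayed by up to that timeout rather than being immediate as in this model.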
+
+### [spring-cloud-zookeeper](https://spring.io/projects/spring-cloud-zookeeper)
+  - Spring Cloud Zookeeper provides Apache Zookeeper integrations for Spring 
Boot apps through autoconfiguration
+    and binding to the Spring Environment and other Spring programming model 
idioms. With a few simple annotations
+    you can quickly enable and configure the common patterns inside your 
application and build large distributed systems with Zookeeper.
+    The patterns provided include Service Discovery and Distributed 
Configuration.
+
+### [Talend 
ESB](http://www.talend.com/products-application-integration/application-integration-esb-se.php)
+  - Talend ESB is a versatile and flexible enterprise service bus.
+  - It uses ZooKeeper as endpoint repository of both REST and SOAP Web 
services.
+    By using ZooKeeper, Talend ESB is able to provide failover and load balancing capabilities in a very lightweight manner.
+
+### [redis_failover](https://github.com/ryanlecompte/redis_failover)
+  - Redis Failover is a ZooKeeper-based automatic master/slave failover 
solution for Ruby.
+
+
+## Apache Projects
+
+### [Apache Accumulo](https://accumulo.apache.org/)
+  - Accumulo is a distributed key/value store that provides expressive, 
cell-level access labels.
+  - Apache ZooKeeper plays a central role within the Accumulo architecture. 
Its quorum consistency model supports an overall
+    Accumulo architecture with no single points of failure. Beyond that, 
Accumulo leverages ZooKeeper to store and communicate
+    configuration information for users and tables, as well as operational 
states of processes and tablets.[2]
+
+### [Apache BookKeeper](https://bookkeeper.apache.org/)
+  - A scalable, fault-tolerant, and low-latency storage service optimized for 
real-time workloads.
+  - BookKeeper requires a metadata storage service to store information 
related to ledgers and available bookies. BookKeeper currently uses
+    ZooKeeper for this and other tasks[3].
+
+### [Apache CXF DOSGi](http://cxf.apache.org/distributed-osgi.html)
+  - Apache CXF is an open source services framework. CXF helps you build and 
develop services using frontend programming
+    APIs, like JAX-WS and JAX-RS. These services can speak a variety of 
protocols such as SOAP, XML/HTTP, RESTful HTTP,
+    or CORBA and work over a variety of transports such as HTTP, JMS or JBI.
+  - The Distributed OSGi implementation at Apache CXF uses ZooKeeper for its 
Discovery functionality.[4]
+
+### [Apache Dubbo](http://dubbo.apache.org)
+  - Apache Dubbo is a high-performance, Java-based open source RPC framework.
+  - Zookeeper is used for service registration, service discovery, and configuration management in Dubbo.[6]
+
+### [Apache Flink](https://flink.apache.org/)
+  - Apache Flink is a framework and distributed processing engine for stateful 
computations over unbounded and bounded data streams.
+    Flink has been designed to run in all common cluster environments, perform 
computations at in-memory speed and at any scale.
+  - To enable JobManager High Availability you have to set the 
high-availability mode to zookeeper, configure a ZooKeeper quorum and set up a 
masters file with all JobManagers hosts and their web UI ports.
+    Flink leverages ZooKeeper for distributed coordination between all running 
JobManager instances. ZooKeeper is a separate service from Flink,
+    which provides highly reliable distributed coordination via leader 
election and light-weight consistent state storage[23].
+
+### [Apache Flume](https://flume.apache.org/)
+  - Flume is a distributed, reliable, and available service for efficiently 
collecting, aggregating, and moving large amounts
+    of log data. It has a simple and flexible architecture based on streaming 
data flows. It is robust and fault tolerant
+    with tunable reliability mechanisms and many failover and recovery 
mechanisms. It uses a simple extensible data model
+    that allows for online analytic application.
+  - Flume supports Agent configurations via Zookeeper. This is an experimental 
feature.[5]
+
+### [Apache Hadoop](http://hadoop.apache.org/)
+  - The Apache Hadoop software library is a framework that allows for the 
distributed processing of large data sets across
+    clusters of computers using simple programming models. It is designed to 
scale up from single servers to thousands of machines,
+    each offering local computation and storage. Rather than rely on hardware 
to deliver high-availability,
+    the library itself is designed to detect and handle failures at the 
application layer, so delivering a highly-available service on top of a cluster 
of computers, each of which may be prone to failures.
+  - The implementation of automatic HDFS failover relies on ZooKeeper for the 
following things:
+    - **Failure detection** - each of the NameNode machines in the cluster 
maintains a persistent session in ZooKeeper.
+      If the machine crashes, the ZooKeeper session will expire, notifying the 
other NameNode that a failover should be triggered.
+    - **Active NameNode election** - ZooKeeper provides a simple mechanism to 
exclusively elect a node as active. If the current active NameNode crashes,
+      another node may take a special exclusive lock in ZooKeeper indicating 
that it should become the next active.
+  - The ZKFailoverController (ZKFC) is a new component which is a ZooKeeper 
client which also monitors and manages the state of the NameNode.
+    Each of the machines which runs a NameNode also runs a ZKFC, and that ZKFC 
is responsible for:
+    - **Health monitoring** - the ZKFC pings its local NameNode on a periodic 
basis with a health-check command.
+      So long as the NameNode responds in a timely fashion with a healthy 
status, the ZKFC considers the node healthy.
+      If the node has crashed, frozen, or otherwise entered an unhealthy 
state, the health monitor will mark it as unhealthy.
+    - **ZooKeeper session management** - when the local NameNode is healthy, 
the ZKFC holds a session open in ZooKeeper.
+      If the local NameNode is active, it also holds a special “lock” znode. 
This lock uses ZooKeeper’s support for “ephemeral” nodes;
+      if the session expires, the lock node will be automatically deleted.
+    - **ZooKeeper-based election** - if the local NameNode is healthy, and the 
ZKFC sees that no other node currently holds the lock znode,
+      it will itself try to acquire the lock. If it succeeds, then it has “won 
the election”, and is responsible for running a failover to make its local 
NameNode active.
+      The failover process is similar to a manual failover: first, the previous active is fenced if necessary,
+      and then the local NameNode transitions to active state.[7]
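
The lock-based election performed by the ZKFC can be sketched with a toy in-memory model. The assumptions are stated loudly: `ToyLock` and `ToyFailoverController` are hypothetical names made up for this illustration, the single `holder` field stands in for the ephemeral lock znode, and fencing of the previous active is deliberately left out. This is not Hadoop or ZooKeeper code.

```python
class ToyLock:
    """In-memory stand-in for one ephemeral lock znode (illustration only)."""

    def __init__(self):
        self.holder = None  # session id of the current lock owner, if any

    def try_acquire(self, session_id):
        # Succeeds only when nobody currently holds the lock.
        if self.holder is None:
            self.holder = session_id
            return True
        return False

    def expire_session(self, session_id):
        # An ephemeral lock disappears together with its owner's session.
        if self.holder == session_id:
            self.holder = None


class ToyFailoverController:
    """Hypothetical simplification of a ZKFC: health check, session, election."""

    def __init__(self, name, lock):
        self.name = name
        self.lock = lock
        self.healthy = True   # result of the last local health check
        self.active = False   # whether the local NameNode is active

    def tick(self):
        if not self.healthy:
            # Unhealthy node: drop the session, which releases the lock.
            self.lock.expire_session(self.name)
            self.active = False
            return
        # Healthy and not yet active: try to win the election.
        if not self.active and self.lock.try_acquire(self.name):
            self.active = True  # a real ZKFC would fence the old active first


lock = ToyLock()
nn1 = ToyFailoverController("nn1", lock)
nn2 = ToyFailoverController("nn2", lock)

# Both controllers start healthy; the first to try wins the lock.
nn1.tick()
nn2.tick()
first_active = [c.name for c in (nn1, nn2) if c.active]

# nn1 becomes unhealthy; its lock is released and nn2 takes over.
nn1.healthy = False
nn1.tick()
nn2.tick()
second_active = [c.name for c in (nn1, nn2) if c.active]
```

Because only one session can hold the lock at a time, at most one controller ever marks its NameNode active in this model, which is the exclusivity property the election relies on.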
+
+### [Apache HBase](https://hbase.apache.org/)
+  - HBase is the Hadoop database. It's an open-source, distributed, 
column-oriented store model.
+  - HBase uses ZooKeeper for master election, server lease management, 
bootstrapping, and coordination between servers.
+    A distributed Apache HBase installation depends on a running ZooKeeper 
cluster. All participating nodes and clients
+    need to be able to access the running ZooKeeper ensemble.[8]
+  - As you can see, ZooKeeper is a fundamental part of HBase. All operations 
that require coordination, such as Regions
+    assignment, Master-Failover, replication, and snapshots, are built on 
ZooKeeper[20]. 
+
+### [Apache 
Hedwig](https://bookkeeper.apache.org/docs/r4.2.3/hedwigConsole.html)
 
 Review comment:
   This is dead
