Added: storm/branches/bobby-versioned-site/releases/0.9.6/Powered-By.md
URL: 
http://svn.apache.org/viewvc/storm/branches/bobby-versioned-site/releases/0.9.6/Powered-By.md?rev=1735299&view=auto
==============================================================================
--- storm/branches/bobby-versioned-site/releases/0.9.6/Powered-By.md (added)
+++ storm/branches/bobby-versioned-site/releases/0.9.6/Powered-By.md Wed Mar 16 
21:18:57 2016
@@ -0,0 +1,1028 @@
+---
+layout: documentation
+---
+Want to be added to this page? Send an email 
[here](mailto:[email protected]).
+
+<table>
+
+<tr>
+<td>
+<a href="http://groupon.com">Groupon</a>
+</td>
+<td>
+<p>
+At Groupon we use Storm to build real-time data integration systems. Storm 
helps us analyze, clean, normalize, and resolve large amounts of non-unique 
data points with low latency and high throughput.
+</p>
+</td>
+</tr>
+
+<tr>
+<td><a href="http://www.weather.com/">The Weather Channel</a></td>
+<td>
+<p>At The Weather Channel we use several Storm topologies to ingest and persist weather data. Each topology is responsible for fetching one dataset from an internal network or the Internet, reshaping the records for use by our company, and persisting them to relational databases. It is particularly useful to have an automatic mechanism that retries downloading and manipulating the data when there is a hiccup.</p>
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="http://www.fullcontact.com/">FullContact</a>
+</td>
+<td>
+<p>
+At FullContact we currently use Storm as the backbone of the system which 
synchronizes our Cloud Address Book with third party services such as Google 
Contacts and Salesforce. We also use it to provide real-time support for our 
contact graph analysis and federated contact search systems.
+</p>
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="http://twitter.com">Twitter</a>
+</td>
+<td>
+<p>
+Storm powers a wide variety of Twitter systems, ranging from discovery and realtime analytics to personalization, search, and revenue optimization. Storm integrates with the rest of Twitter's infrastructure, including database systems (Cassandra, Memcached, etc.), the messaging infrastructure, Mesos, and the monitoring/alerting systems. Storm's isolation scheduler makes it easy to use the same cluster for both production applications and in-development applications, and it provides a sane way to do capacity planning.
+</p>
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="http://www.yahoo.com">Yahoo!</a>
+</td>
+<td>
+<p>
+Yahoo! is developing a next generation platform that enables the convergence 
of big-data and low-latency processing. While Hadoop is our primary technology 
for batch processing, Storm empowers stream/micro-batch processing of user 
events, content feeds, and application logs. 
+</p>
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="http://www.yahoo.co.jp/">Yahoo! JAPAN</a>
+</td>
+<td>
+<p>
+Yahoo! JAPAN is a leading web portal in Japan. Our Storm applications process various streaming data such as logs and social data. We use Storm to feed content, monitor systems, detect trending topics, and crawl websites.
+</p>
+</td>
+</tr>
+
+
+<tr>
+<td>
+<a href="http://www.webmd.com">WebMD</a>
+</td>
+<td>
+<p>
+We use Storm to power our Medscape Medpulse mobile application, which allows medical professionals to follow important medical trends with Medscape's curated Today on Twitter feed and selection of blogs. A Storm topology captures and processes tweets with the Twitter streaming API, enriches them with metadata and images, performs real-time NLP, and executes several business rules. Storm also monitors a selection of blogs in order to give our customers real-time updates. We also use Storm for internal data pipelines to do ETL and for our internal marketing platform, where time and freshness are essential.
+</p>
+<p>
+We use Storm to power our search indexing process. We continue to discover new use cases for Storm, and it has become one of the core components in our technology stack.
+</p>
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="http://www.spotify.com">Spotify</a>
+</td>
+<td>
+<p>
+Spotify serves streaming music to over 10 million subscribers and 40 million 
active users. Storm powers a wide range of real-time features at Spotify, 
including music recommendation, monitoring, analytics, and ads targeting. 
Together with Kafka, memcached, Cassandra, and netty-zmtp based messaging, 
Storm enables us to build low-latency fault-tolerant distributed systems with 
ease.
+</p>
+</td>
+</tr>
+
+
+<tr>
+<td>
+<a href="http://www.infochimps.com">Infochimps</a>
+</td>
+<td>
+<p>
+Infochimps uses Storm as part of its Big Data Enterprise Cloud. Specifically, Storm is the basis for one of its three cloud data services: Data Delivery Services (DDS), a fault-tolerant and linearly scalable cloud service for enterprise data collection, transport, and complex in-stream processing.
+</p>
+
+<p>
+In much the same way that Hadoop provides batch ETL and large-scale batch 
analytical processing, the Data Delivery Service provides real-time ETL and 
large-scale real-time analytical processing — the perfect complement to 
Hadoop (or in some cases, what you needed instead of Hadoop).
+</p>
+
+<p>
+DDS uses both Storm and Kafka along with a host of additional technologies to 
provide an enterprise-class real-time stream processing solution with features 
including:
+</p>
+
+<ul>
+<li>
+Integration connections to a wide variety of data sources in a way that is robust yet non-invasive
+</li>
+<li>
+Optimizations for highly scalable, reliable data import and distributed ETL 
(extract, transform, load), fulfilling data transport needs
+</li>
+<li>
+Developer tools for rapid development of decorators, which perform the 
real-time stream processing
+</li>
+<li>
+Guaranteed delivery framework and data failover snapshots to send processed 
data to analytics systems, databases, file systems, and applications with 
extreme reliability
+</li>
+<li>
+Rapid solution development and deployment, along with our expert Big Data 
methodology and best practices
+</li>
+</ul>
+
+<p>Infochimps has extensive experience in deploying its DDS to power 
large-scale clickstream web data flows, massive Twitter stream processes, 
Foursquare event processing, customer purchase data, product pricing data, and 
more.
+</p>
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="http://healthmarketscience.com/">Health Market Science</a>
+</td>
+<td>
+<p>
+Health Market Science (HMS) provides data management as a service for the healthcare industry. Storm is at the core of the HMS big data platform, functioning as the data ingestion mechanism that orchestrates data flow across multiple persistence mechanisms, allowing HMS to deliver Master Data Management (MDM) and analytics capabilities for a wide range of healthcare needs: compliance, integrity, data quality, and operational decision support.
+</p>
+</td>
+</tr>
+
+
+<tr>
+<td>
+<a href="http://cerner.com/">Cerner</a>
+</td>
+<td>
+<p>
+Cerner is a leader in health care information technology. We have been using 
Storm since its release to process massive amounts of clinical data in 
real-time. Storm integrates well in our architecture, allowing us to quickly 
provide clinicians with the data they need to make medical decisions.
+</p>
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="http://www.aeris.com/">Aeris Communications</a>
+</td>
+<td>
+<p>
+Aeris Communications has the only cellular network designed and built exclusively for machines. Our ability to provide scalable, reliable real-time analytics, powered by Storm, for machine-to-machine (M2M) communication offers immense value to our customers. We have been using Storm in production since Q1 2013.
+</p>
+</td>
+</tr>
+
+
+
+<tr>
+<td>
+<a href="http://flipboard.com/">Flipboard</a>
+</td>
+<td>
+<p>
+Flipboard is the world's first social magazine, a single place to keep up with everything you care about and collect it in ways that reflect you. Inspired by the beauty and ease of print media, Flipboard is designed so you can easily flip through news from around the world or stories from right at home, helping people find the one thing that can inform, entertain, or even inspire them every day.
+</p>
+<p>
+We are using Storm across a wide range of our services from content search, to 
realtime analytics, to generating custom magazine feeds. We then integrate 
Storm across our infrastructure within systems like ElasticSearch, HBase, 
Hadoop and HDFS to create a highly scalable data platform.
+</p>
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="http://www.rubiconproject.com/">Rubicon Project</a>
+</td>
+<td>
+<p>
+Storm is being used in production mode at the Rubicon Project to analyze the 
results of auctions of ad impressions on its RTB exchange as they occur.  It is 
currently processing around 650 million auction results in three data centers 
daily (with 3 separate Storm clusters). One simple application is identifying 
new creatives (ads) in real time for ad quality purposes.  A more sophisticated 
application is an "Inventory Valuation Service" that uses DRPC to return 
appraisals of new impressions before the auction takes place.  The appraisals 
are used for various optimization problems, such as deciding whether to auction 
an impression or skip it when close to maximum capacity.
+</p>
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="http://www.ooyala.com/">Ooyala</a>
+</td>
+<td>
+<p>
+Ooyala powers personalized multi-screen video experiences for some of the 
world's largest networks, brands and media companies. We provide all the 
technology and tools our customers need to manage, distribute and monetize 
digital video content at a global scale.
+</p>
+
+<p>
+At the core of our technology is an analytics engine that processes over two 
billion analytics events each day, derived from nearly 200 million viewers 
worldwide who watch video on an Ooyala-powered player.
+</p>
+
+<p>
+Ooyala will be deploying Storm in production to give our customers real-time 
streaming analytics on consumer viewing behavior and digital content trends. 
Storm enables us to rapidly mine one of the world's largest online video data 
sets to deliver up-to-the-minute business intelligence ranging from real-time 
viewing patterns to personalized content recommendations to dynamic programming 
guides and dozens of other insights for maximizing revenue with online video.
+</p>
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="http://www.taobao.com/index_global.php">Taobao</a>
+</td>
+<td>
+<p>
+We use Storm to compute statistics over logs and extract useful information from them in near real time. Logs are read from Kafka-like persistent message queues into spouts, then processed and emitted through the topologies to compute the desired results, which are stored in distributed databases for use elsewhere. Daily input ranges from 2 million to 1.5 billion log entries, up to 2 terabytes, across our projects. The main challenge is not only real-time processing of a big data set; storing and persisting the results also needs careful design and implementation.
+</p>
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="http://www.alibaba.com/">Alibaba</a>
+</td>
+<td>
+<p>
+Alibaba is the leading B2B e-commerce website in the world. We use Storm to process application logs and database change data to supply realtime stats for data apps.
+</p>
+</td>
+</tr>
+
+
+<tr>
+<td>
+<a href="http://iQIYI.COM">iQIYI</a>
+</td>
+<td>
+<p>
+iQIYI is China's largest online video platform. We use Storm in our video advertising system, video recommendation system, log analysis system, and many other scenarios. We have several standalone Storm clusters, and we also run Storm clusters on Mesos and on YARN. Kafka-Storm integration and Storm-HBase integration are quite common in our production environment. We are very interested in new developments around integrating Storm with other systems such as HBase, HDFS, and Kafka.
+</p>
+</td>
+</tr>
+
+
+<tr>
+<td>
+<a href="http://www.baidu.com/">Baidu</a>
+</td>
+<td>
+<p>
+Baidu offers top search technology services for websites, audio files, and images. Our group uses Storm to process search logs to supply realtime stats such as PV (page views), ar-time, and so on.
+This project helps Ops determine and monitor service status, and can do great things in the future.
+</p>
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="http://www.yelp.com/">Yelp</a>
+</td>
+<td>
+<p>
+Yelp is using Storm with <a href="http://pyleus.org/">Pyleus</a> to build a
platform for developers to consume and process high throughput streams of data 
in real time. We have ongoing projects to use Storm and Pyleus for overhauling 
our internal application metrics pipeline, building an automated Python profile 
analysis system, and for general ETL operations. As its support for non-JVM 
components matures, we hope to make Storm the standard way of processing 
streaming data at Yelp.
+</p>
+</td>
+</tr>
+
+
+<tr>
+<td>
+<a href="http://www.klout.com/">Klout</a>
+</td>
+<td>
+<p>
+Klout helps everyone discover and be recognized for their influence by analyzing engagement with their content across social networks. Our analysis powers a daily Klout Score on a scale from 1-100 that shows how much influence social media users have and on what topics. We are using Storm to develop a realtime scoring and moments generation pipeline. Leveraging Storm's intuitive Trident abstraction, we are able to create complex topologies that stream data from our network collectors via Kafka, process it, and write it out to HDFS.
+</p>
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="http://www.loggly.com">Loggly</a>
+</td>
+<td>
+<p>
+Loggly is the world's most popular cloud-based log management service. It helps DevOps and technical teams make sense of the massive quantity of logs produced by a growing number of cloud-centric applications, in order to solve operational problems faster. Storm is the heart of our ingestion pipeline, where it filters, parses, and analyzes billions of log events all day, every day, in real time.
+</p>
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="http://premise.is/">premise.is</a>
+</td>
+<td>
+<p>
+We're building a platform for alternative, bottom-up, high-granularity 
econometric data capture, particularly targeting opaque developing economies 
(i.e., Argentina might lie about their inflation statistics, but their black 
market certainly doesn't). Basically we get to funnel hedge fund money into 
improving global economic transparency. 
+</p>
+<p>
+We've been using Storm in production since January 2012 as a streaming, 
time-indexed web crawl + extraction + machine learning-based semantic markup 
flow (about 60 physical nodes comparable to m1.large; generating a modest 
25GB/hr incremental). We wanted to have an end-to-end push-based system where 
new inputs get percolated through the topology in realtime and appear on the 
website, with no batch jobs required in between steps. Storm has been really 
integral to realizing this goal.
+</p>
+</td>
+</tr>
+
+
+
+<tr>
+<td>
+<a href="http://www.wego.com/">Wego</a>
+</td>
+<td>
+<p>Wego is one of the world's most comprehensive travel metasearch engines, operating in 42 markets worldwide and used by millions of travelers to save time, pay less, and travel more. We compare and display real-time flight and hotel pricing and availability from hundreds of leading travel sites around the world on one simple screen.</p>
+
+<p>At the heart of our products, Storm helps us stream real-time metasearch data from our partners to end users. Since data comes from many sources and with different timing, Storm's topology concept naturally solves concurrency issues while helping us continuously merge, slice, and clean all the data. Additionally, with a few tricks and tools provided in Storm, we can easily apply incremental updates to improve the flow of our data (1-5GB/minute).</p>
+ 
+<p>With its simplicity, scalability, and flexibility, Storm not only improves our current products but, more importantly, changes the way we work with data. Instead of keeping data static and crunching it once in a while, we constantly move data around, making use of different technologies, evaluating new ideas, and building new products. We stream critical data to memory for fast access while continuously crunching and directing huge amounts of data into various engines so that we can evaluate and make use of data instantly. Previously, this kind of system required setting up and maintaining quite a few components, but with Storm all we need is half a day of coding and a few seconds to deploy. In this sense, we see Storm not merely as serving our products but as evolving them.</p>
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="http://rocketfuel.com/">RocketFuel</a>
+</td>
+<td>
+<p>
+At Rocket Fuel (an ad network) we are building a real time platform on top of Storm that mirrors the time-critical workflows of our existing Hadoop-based ETL pipeline. This platform tracks impressions, clicks, conversions, bid requests, etc. in real time. We use Kafka as our message queue. To start, we are pushing per-minute aggregations directly to MySQL, but we plan to go finer than one minute and may bring HBase into the picture to handle the increased write load.
+</p>
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="http://quicklizard.com/">QuickLizard</a>
+</td>
+<td>
+<p>
+QuickLizard builds an automated pricing solution for companies with many products in their catalogs. Prices are influenced by multiple factors, both internal and external to the company.
+</p>
+
+<p>
+Currently we use Storm to choose the products that need to be priced. We receive a real-time stream of events from client sites and filter it down to a much lighter stream of products to be processed by our pricing procedures.
+</p>
+
+<p>
+We also plan to use Storm for realtime data-mining model calculations that match products described on competitor sites to client products.
+</p>
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="http://spider.io/">spider.io</a>
+</td>
+<td>
+<p>
+At spider.io we've been using Storm as a core part of our classification 
engine since October 2011. We run Storm topologies to combine, analyse and 
classify real-time streams of internet traffic, to identify suspicious or 
undesirable website activity. Over the past 7 months we've expanded our use of 
Storm, so it now manages most of our real-time processing. Our classifications 
are displayed in a custom analytics dashboard, where Storm's distributed remote 
procedure call interface is used to gather data from our database and metadata 
services. DRPC allows us to increase the responsiveness of our user interface 
by distributing processing across a cluster of Amazon EC2 instances.
+</p>
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="http://8digits.com/">8digits</a>
+</td>
+<td>
+<p>
+At 8digits, we use Storm in our analytics engine, one of the most crucial parts of our infrastructure. We run a real-time system that performs several complex calculations, utilizing several multi-core cloud servers. Storm is a proven, solid, and powerful framework for most big-data problems.
+</p>
+</td>
+</tr>
+
+
+
+<tr>
+<td>
+<a href="https://www.alipay.com/">Alipay</a>
+</td>
+<td>
+<p>
+Alipay is China's leading third-party online payment platform. We are using 
Storm in many scenarios:
+</p>
+
+<ol>
+<li>
+Calculate realtime trade quantity, trade amount, top-N seller trading information, and user registration counts. More than 100 million messages per day.
+</li>
+<li>
+Log processing: more than 6TB of data per day.
+</li>
+</ol>
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="http://navisite.com/">NaviSite</a>
+</td>
+<td>
+<p>
+We are using Storm as part of our server event log monitoring/auditing system. 
 We send log messages from thousands of servers into a RabbitMQ cluster and 
then use Storm to check each message against a set of regular expressions.  If 
there is a match (&lt; 1% of messages), then the message is sent to a bolt that 
stores data in a Mongo database.  Right now we are handling a load of somewhere 
around 5-10k messages per second, however we tested our existing RabbitMQ + 
Storm clusters up to about 50k per second.  We have plans to do real time 
intrusion detection as an enhancement to the current log message reporting 
system. 
+</p>
+
+<p>
+We have Storm deployed on the NaviSite Cloud platform.  We have a ZK cluster 
of 3 small VMs, 1 Nimbus VM and 16 dual core/4GB VMs as supervisors.
+</p>
+</td>
+</tr>
+
+
+<tr>
+<td>
+<a href="http://www.paywithglyph.com">Glyph</a>
+</td>
+<td>
+<p>
+Glyph is in the business of providing credit card rewards intelligence to consumers. At a given point of sale, Glyph suggests to its users the best cards to use at that merchant location to earn maximum rewards. Glyph also suggests which cards a user should carry to earn maximum rewards based on their personal spending habits. Glyph provides this information by retrieving and analyzing credit card transactions from banks. Storm is used in Glyph to perform this retrieval and analysis in realtime. We are using Memcached in conjunction with Storm for handling sessions. We are impressed by how Storm makes high availability and reliability of Glyph services possible. We are now using Storm and Clojure in building Glyph data analytics and insights services. We have open-sourced the node-drpc wrapper module for easy Storm DRPC integration with NodeJS.
+</p>
+</td>
+</tr>
+<tr>
+<td>
+<a href="http://heartbyte.com/">Heartbyte</a>
+</td>
+<td>
+<p>
+At Heartbyte, Storm is a central piece of our realtime audience participation 
platform.  We are often required to process a 'vote' per second from hundreds 
of thousands of mobile devices simultaneously and process / aggregate all of 
the data within a second.  Further, we are finding that Storm is a great 
alternative to other ingest tools for Hadoop/HBase, which we use for batch 
processing after our events conclude.
+</p>
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="http://2lemetry.com/">2lemetry</a>
+</td>
+<td>
+<p>
+2lemetry uses Storm to power its real time analytics on top of the m2m.io offering. 2lemetry is partnered with Sprint, Verizon, AT&T, and Arrow Electronics to power IoT applications worldwide. Some of 2lemetry's larger projects include RTX, Kontron, and Intel. 2lemetry also works with many professional sporting teams to parse data in real time. 2lemetry receives events for every touch of the ball in every MLS soccer match; Storm is used to look for trends, like passing tendencies, as they develop during the game.
+</p>
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="http://www.nodeable.com/">Nodeable</a>
+</td>
+<td>
+<p>
+Nodeable uses Storm to deliver real-time continuous computation of the data we 
consume. Storm has made it significantly easier for us to scale our service 
more efficiently while ensuring the data we deliver is timely and accurate.
+</p>
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="https://twitsprout.com/">TwitSprout</a>
+</td>
+<td>
+<p>
+At TwitSprout, we use Storm to analyze activity on Twitter to monitor mentions 
of keywords (mostly client product and brand names) and trigger alerts when 
activity around a certain keyword spikes above normal levels. We also use Storm 
to back the data behind the live-infographics we produce for events sponsored 
by our clients. The infographics are usually in the form of a live dashboard 
that helps measure the audience buzz across social media as it relates to the 
event in realtime.
+</p>
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="http://www.happyelements.com/">HappyElements</a>
+</td>
+<td>
+<p>
+<a href="http://www.happyelements.com">HappyElements</a> is a leading social game developer on Facebook and other SNS platforms. We developed a real time data analysis program based on Storm to analyze user activity in real time. Storm is very easy to use, stable, scalable, and maintainable.
+</p>
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="http://www.idexx.com/view/xhtml/en_us/corporate/home.jsf">IDEXX Laboratories</a>
+</td>
+<td>
+<p>
+IDEXX Laboratories is the leading maker of software and diagnostic instruments 
for the veterinary market. We collect and analyze veterinary medical data from 
thousands of veterinary clinics across the US. We recently embarked on a 
project to upgrade our aging data processing infrastructure that was unable to 
keep up with the rapid increase in the volume, velocity and variety of data 
that we were processing.
+</p>
+
+<p>
+We are utilizing the Storm system to take in the data that is extracted from 
the medical records in a number of different schemas, transform it into a 
standard schema that we created and store it in an Oracle RDBMS database. It is 
basically a souped up distributed ETL system. Storm takes on the plumbing 
necessary for a distributed system and is very easy to write code for. The 
ability to create small pieces of functionality and connect them together gives 
us the ultimate flexibility to parallelize each of the pieces differently.
+</p>
+
+<p>
+Our current cluster consists of four supervisor machines running 110 tasks 
inside 32 worker processes. We run two different topologies which receive 
messages and communicate with each other via RabbitMQ. The whole thing is 
deployed on Amazon Web Services and utilizes S3 for some intermediate storage, 
Redis as a key/value store and Oracle RDS for RDBMS storage. The bolts are all 
written in Java using the Spring framework with Hibernate as an ORM.
+</p>
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="http://www.umeng.com/">Umeng</a>
+</td>
+<td>
+Umeng is the leading and largest provider of mobile app analytics and developer services in China. Storm powers Umeng's realtime analytics platform, processing billions of data points per day and growing. We also use Storm in other products that require realtime processing, and it has become core infrastructure in our company.
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="http://www.admaster.com.cn/">Admaster</a>
+</td>
+<td>
+<p>
+We provide monitoring and precise delivery for Internet advertising. We use 
Storm to do the following:
+</p>
+
+<ol>
+<li>Calculate the PV and UV of every advertisement.</li>
+<li>Simple data cleaning: filter out malformed data and cheating data (PV above a certain threshold).</li>
+</ol>
+Our cluster has 8 nodes and processes several billion messages, about 200GB, per day.
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="http://socialmetrix.com/en/">SocialMetrix</a>
+</td>
+<td>
+<p>
+Since its release, Storm has been a perfect fit for our real time monitoring needs. Its powerful API and easy administration and deployment enabled us to rapidly build solutions to monitor presidential elections and several major events, and it is currently the processing core of our new product "Socialmetrix Eventia".
+</p>
+</td>
+</tr>
+
+
+<tr>
+<td>
+<a href="http://needium.com/">Needium</a>
+</td>
+<td>
+<p>
+At Needium we love Ruby and JRuby. The Storm platform offers the right balance between simplicity, flexibility, and scalability. We created RedStorm, a Ruby DSL for Storm, so we can keep using Ruby on top of the power of Storm by leveraging Storm's JVM foundation with JRuby. We currently use Storm as our Twitter realtime data processing pipeline, with topologies for content filtering, geolocation, and classification. Storm allows us to architect our pipeline for the full Twitter firehose scale.
+</p>
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="http://parse.ly/">Parse.ly</a>
+</td>
+<td>
+<p>
+Parse.ly is using Storm for its web/content analytics system. We have a 
home-grown data processing and storage system built with Python and Celery, 
with backend stores in Redis and MongoDB. We are now using Storm for real-time 
unique visitor counting and are exploring options for using it for some of our 
richer data sources such as social share data and semantic content metadata.
+</p>
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="http://www.parc.com/">PARC</a>
+</td>
+<td>
+<p>
+The High Performance Graph Analytics & Real-time Insights research team at PARC uses Storm as one of the building blocks of the PARC Analytics Cloud infrastructure, which comprises Nebula-based OpenStack, Hadoop, SAP HANA, Storm, PARC Graph Analytics, and a machine learning toolbox. This enables researchers to process real-time data feeds from sensors, the web, networks, social media, and security traces, and to easily ingest any other real-time data feeds of interest to PARC researchers.
+</p>
+<p>
+PARC researchers are working with a number of industry collaborators to develop new tools, algorithms, and models to analyze massive amounts of e-commerce data, web clickstreams, 3rd-party syndicated data, cohort data, social media data streams, and structured data from RDBMS, NoSQL, and NewSQL systems in near real-time. The PARC team is developing a reference architecture and benchmarks for its near real-time automated insight discovery platform, combining the power of all the above tools with PARC's applied research in machine learning, graph analytics, reasoning, clustering, and contextual recommendations. The High Performance Graph Analytics & Real-time Insights research at PARC is headed by <a href="http://www.linkedin.com/in/skreddy">Surendra Reddy</a>. If you are interested in learning more about our experience with Storm, in our research, or in collaborating with PARC in this area, please feel free to contact [email protected].
+</p>
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="http://gumgum.com/">GumGum</a>
+</td>
+<td>
+<p>
+GumGum, the leading in-image advertising platform for publishers and brands, 
uses Storm to produce real-time data. Storm and Trident-based topologies 
consume various ad-related events from Kafka and persist the aggregations in 
MySQL and HBase. This architecture will eventually replace most existing daily 
Hadoop map reduce jobs. There are also plans for Kafka + Storm to replace 
existing distributed queue processing infrastructure built with Amazon SQS.
+</p>
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="http://www.crowdflower.com/">CrowdFlower</a>
+</td>
+<td>
+<p>
+CrowdFlower is using Storm with Kafka to generalize our data stream
+aggregation and realtime computation infrastructure. We replaced our
+homegrown aggregation solutions with Storm because it simplified the
+creation of fault tolerant systems. We were already using Zookeeper
+and Kafka, so Storm allowed us to build more generic abstractions for
+our analytics using tools that we had already deployed and
+battle-tested in production.
+</p>
+
+<p>
+We are currently writing to DynamoDB from Storm, so we are able to
+scale our capacity quickly by bringing up additional supervisors and
+tweaking the throughput on our Dynamo tables. We look forward to
+exploring other uses for Storm in our system, especially with the
+recent release of Trident.
+</p>
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="http://www.dsbox.com">Digital Sandbox</a>
+</td>
+<td>
+<p>
+At Digital Sandbox we use Storm to enable our open source information feed 
monitoring system.  The system uses Storm to constantly monitor and pull data 
from structured and unstructured information sources across the internet.  For 
each found item, our topology applies natural language processing based concept 
analysis, temporal analysis, geospatial analytics and a prioritization 
algorithm to enable users to monitor large special events, public safety 
operations, and topics of interest to a multitude of individual users and teams.
+</p>
+ 
+<p>
+Our system is built using Storm for feed retrieval and annotation, Python with Flask and jQuery for business logic and web interfaces, and MongoDB for data persistence. We use NLTK for natural language processing and the WordNet, GeoNames, and OpenStreetMap databases to enable feed item concept extraction and geolocation.
+</p>
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="http://hallo.me/";>Hallo</a>
+</td>
+<td>
+With several mainstream celebrities and very popular YouTubers using Hallo to communicate with their fans, we needed a good solution for notifying users via push notifications and making sure that celebrity messages were delivered to follower timelines in near realtime. Our initial approach to broadcast push notifications would take anywhere from 2 to 3 hours. After re-engineering our solution on top of Storm, that time has been cut down to 5 minutes on a very small cluster. With the user base growing and users' need for realtime communication, we are very happy knowing that we can easily scale Storm by adding nodes to maintain a baseline QoS for our users.
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="http://keepcon.com/";>Keepcon</a>
+</td>
+<td>
+We provide moderation services for classifieds, kids' communities, newspapers, chat rooms, Facebook fan pages, YouTube channels, reviews, and all kinds of UGC. We use Storm to integrate with our clients, find evidence within each text, persist results to Cassandra and Elasticsearch, and send them back to our clients.
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="http://www.visiblemeasures.com/";>Visible Measures</a>
+</td>
+<td>
+<p>
+Visible Measures powers video campaigns and analytics for publishers and advertisers, tracking data for hundreds of millions of videos and billions of views. We are using Storm to process viewing behavior data in real time and make the information immediately available to our customers. We read events from various push and pull sources, including a Kestrel queue; filter and enrich the events in Storm topologies; and persist the events to Redis, HDFS, and Vertica for real-time analytics and archiving. We are currently experimenting with Trident topologies and figuring out how to move more of our Hadoop-based batch processing into Storm.
+</p>
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="http://www.o2mc.eu/en/";>O2mc</a>
+</td>
+<td>
+<p>
+One of the core products of O2mc is called O2mc Community. O2mc Community performs multilingual, realtime sentiment analysis with very low latency and distributes the analyzed results to numerous clients. The input is extracted from source systems like Twitter, Facebook, e-mail, and many more. After the analysis has taken place on Storm, the results are streamed to any output system, ranging from HTTP streams to clients, to direct database insertion, to an external business process engine that kickstarts a process.</p>
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="http://www.theladders.com";>The Ladders</a>
+</td>
+<td>
+<p>
+TheLadders has been committed to finding the right person for the right job 
since 2003. We're using Storm in a variety of ways and are happy with its 
versatility, robustness, and ease of development. We use Storm in conjunction 
with RabbitMQ for such things as sending hiring alerts: when a recruiter 
submits a job to our site, Storm processes that event and will aggregate 
jobseekers whose profiles match the position. That list is subsequently 
batch-processed to send an email to the list of jobseekers. We also use Storm 
to persist events for Business Intelligence and internal event tracking. We're 
continuing to find uses for Storm where fast, asynchronous, real-time event 
processing is a must.
+</p>
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="http://semlab.nl";>SemLab</a>
+</td>
+<td>
+<p>
+SemLab develops software for knowledge discovery and information support. Our 
ViewerPro platform uses information extraction, natural language processing and 
semantic web technologies to extract structured data from unstructured sources, 
in domains such as financial news feeds and legal documents. We have 
successfully adapted ViewerPro's processing framework to run on top of Storm. 
The transition to Storm has made ViewerPro a much more scalable product, 
allowing us to process more in less time.
+</p>
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="http://visualrevenue.com/";>Visual Revenue</a>
+</td>
+<td>
+<p>
+Here at Visual Revenue, we built a decision support system that helps online editors make choices about what, when, and where to promote their content in real time. Storm is the backbone of our real-time data processing and aggregation pipelines.
+</p>
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="http://www.peerindex.com/";>PeerIndex</a>
+</td>
+<td>
+<p>
+PeerIndex is working to deliver influence at scale. PeerIndex does this by exposing services built on top of our Influence Graph: a directed graph of who is influencing whom on the web. PeerIndex gathers data from a number of social networks to create the Influence Graph. We use Storm to process our social data, to provide real-time aggregations, and to crawl the web, before storing our data in the manner most suitable for our Hadoop-based systems to batch process. Storm provides an intuitive API and has slotted in well with the rest of our architecture. PeerIndex looks forward to further investing resources into our Storm-based real-time analytics.
+</p>
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="http://ants.vn";>ANTS.VN</a>
+</td>
+<td>
+<p>
+ANTS.VN is Vietnam's unique big-data advertising platform, combining ad serving, a real-time bidding (RTB) exchange, analytics, yield optimization, and content valuation to deliver the highest revenue across every desktop, tablet, and mobile screen. At ANTS.VN we use Storm to process large amounts of data in real time and improve our ad quality. The platform tracks impressions, clicks, conversions, bid requests, etc. in real time. Together with Kafka, Redis, Memcached, and Cassandra-based messaging, Storm enables us to build low-latency, fault-tolerant distributed systems with ease.
+</p>
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="http://www.wayfair.com";>Wayfair</a>
+</td>
+<td>
+<p>
+At Wayfair, we use Storm as a platform to drive our core order processing pipeline as an event-driven system. Storm allows us to reliably process tens of thousands of orders daily while providing us the assurance of seamless process scalability as our order load increases. Given the project’s ease of use and the immense support of the community, we’ve managed to implement our bolts in PHP, construct a simple Puppet module for configuration management, and quickly solve arising issues. We can now focus most of our development efforts on the business layer. Check out more information on how we use Storm <a href="http://engineering.wayfair.com/stormin-oms/";>in our engineering blog</a>. 
</p>
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="http://innoquant.com/";>InnoQuant</a>
+</td>
+<td>
+<p>
+At InnoQuant, we use Storm as the backbone of the real-time big data analytics engine in our MOCA platform. MOCA is a next-generation mobile-backend-as-a-service (MBaaS) platform. It provides brands and app developers with real-time in-app tracking, context-aware push messaging, user micro-segmentation based on profile, time, and geo-context, as well as big data analytics. The Storm-based pipeline is fed with events captured by native mobile SDKs (iOS, Android), scales nicely with the number of connected mobile app users, delivers stream-based metrics and aggregations, and finally integrates with the rest of the MOCA infrastructure, including columnar storage (Cassandra) and graph storage (Titan).
+</p>
+</td>
+</tr>
+
+
+<tr>
+<td>
+<a href="http://www.fliptop.com/";>Fliptop</a>
+</td>
+<td>
+<p>
+Fliptop is a customer intelligence platform that allows customers to integrate their contact and campaign data, enhance their prospects with social identities, and find their best leads and most influential customers. We have been using Storm for various tasks that require scalability and reliability, including integrating with sales/marketing platforms, appending data to contacts/leads, and computing contact/lead scores. It's one of our most robust and scalable pieces of infrastructure.
+</p>
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="http://www.trovit.com/";>Trovit</a>
+</td>
+<td>
+<p>
+Trovit is a search engine for classified ads, present in 39 countries and across different business categories (Real Estate, Cars, Jobs, Rentals, Products, and Deals). Currently we use Storm to process and index ads in a distributed, low-latency fashion. Combining it with other technologies like Hadoop, HBase, and Solr has allowed us to build a scalable, low-latency platform to serve search results to the end user.
+</p>
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="http://www.openx.com/";>OpenX</a>
+</td>
+<td>
+<p>
+OpenX is a unique platform that combines ad serving, a real-time bidding (RTB) exchange, yield optimization, and content valuation to deliver the highest revenue across every desktop, tablet, and mobile screen. At OpenX we use Storm to process large amounts of data and provide real-time analytics, which helps us improve our ad quality.
+</p>
+</td>
+</tr>
+
+
+<tr>
+<td>
+<a href="http://keen.io/";>Keen IO</a>
+</td>
+<td>
+<p>
+Keen IO is an analytics backend-as-a-service. The Keen IO API makes it easy 
for customers to do internal analytics or expose analytics features to their 
customers. Keen IO uses Storm (DRPC) to query billion-event data sets at very 
low latencies. We also use Storm to control our ingestion pipeline, sourcing 
data from Kafka and storing it in Cassandra.
+</p>
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="http://liveperson.com/";>LivePerson</a>
+</td>
+<td>
+<p>
+LivePerson is a provider of interaction services over the web. Interaction between an agent and a visitor on a site can happen via phone calls, chat, banners, etc. Using Storm, LivePerson can collect and process visitor data and provide agents with real-time information about visitor behavior. Moreover, LivePerson can make better decisions about how to react to visitors in a way that best addresses their needs.
+</p>
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="http://yieldbot.com/";>YieldBot</a>
+</td>
+<td>
+<p>
+Yieldbot connects ads to the real-time consumer intent streaming within premium publishers. To do this, Yieldbot leverages Storm for a wide variety of real-time processing tasks. We've open-sourced our Clojure DSL for writing Trident topologies, marceline, which we use extensively. Events are read from Kafka, most state is stored in Cassandra, and we make heavy use of Storm's DRPC features. Our Storm use cases range from HTML processing, to hotness-style trending, to probabilistic rankings and cardinalities. Storm topologies touch virtually all of the events generated by the Yieldbot platform.
+</p>
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="http://equinix.com/";>Equinix</a>
+</td>
+<td>
+<p>
+At Equinix, we use a number of Storm topologies to process and persist various data streams generated by sensors in our data centers. We also use Storm for real-time monitoring of different infrastructure components. A few other topologies process logs in real time for internal IT systems, which also provides insights into user behavior.
+</p>
+</td>
+</tr>
+
+
+<tr>
+<td>
+<a href="http://minewhat.com/";>MineWhat</a>
+</td>
+<td>
+<p>
+MineWhat provides actionable analytics for e-commerce, spanning every SKU, brand, and category in the store. We use Storm to process the raw clickstream ingested from Kafka and compute live analytics. Storm topologies power our complex product-to-user interaction analysis. Storm's multi-language support is really kick-ass; we have bolts written in Node.js, Python, and Ruby. Storm has been in production on our site since Nov 2012.
+</p>
+</td>
+</tr>
+
+
+<tr>
+<td>
+<a href="http://www.360.cn/";>Qihoo 360</a>
+</td>
+<td>
+<p>
+360 has deployed about 50 realtime applications on top of Storm, including web page analysis, log processing, image processing, voice processing, etc.
+</p>
+<p>
+The use case of Storm at 360 is a bit special, since we deployed Storm on thousands of servers that are not dedicated to Storm. Storm uses only a little CPU, memory, and network resource on each server. However, these Storm clusters leverage the idle resources of those servers at nearly zero cost to provide great realtime computing power. It's amazing.
+</p>
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="http://www.holidaycheck.com/";>HolidayCheck</a>
+</td>
+<td>
+<p>
+HolidayCheck is an online travel site and agency available in 10
+languages worldwide visited by 30 million people a month.
+We use Storm to deliver real-time hotel and holiday package offers
+from multiple providers - reservation systems and affiliate travel
+networks - in a low latency fashion based on user-selected criteria.
+In further reservation steps we use DRPC for vacancy checks and
+bookings of chosen offers. Along with Storm, the offer-delivery system uses Scala, Akka, Hazelcast, Drools, and MongoDB. The real-time offer stream is delivered outside of the system back to the front-end via WebSocket connections.
+</p>
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="http://dataminelab.com/";>DataMine Lab</a>
+</td>
+<td>
+<p>
+DataMine Lab is a consulting company integrating Storm into its portfolio of technologies. Storm powers a range of our customers' systems, allowing us to build real-time analytics on the tens of millions of visitors to the advertising platforms we helped to create. Together with Redis, Cassandra, and Hadoop, Storm allows us to provide a real-time distributed data platform at global scale.
+</p>
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="http://www.wizecommerce.com/";>Wize Commerce</a>
+</td>
+<td>
+<p>
+Wize Commerce® is the smartest way to grow your digital business. For over 
ten years, we have been helping clients maximize their revenue and traffic 
using optimization technologies that operate at massive scale, and across 
digital ecosystems. We own and operate leading comparison shopping engines 
including Nextag®, PriceMachine™, and <a 
href="http://guenstiger.de";>guenstiger.de</a>, and provide services to a wide 
ecosystem of partner sites that use our e-commerce platform. These sites 
together drive over $1B in annual merchant sales.
+</p>
+<p>
+We use Storm to power our core platform infrastructure, and it has become a vital component of our search indexing system and Cassandra storage. Along with Kafka, Storm has reduced our end-to-end latencies from several hours to a few minutes. As one of the largest comparison shopping site operators, pushing price updates to the live site is very important to us, and Storm helps a lot in achieving this. We have been using Storm extensively in production since Q1 2013.
+</p>
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="http://metamarkets.com";>Metamarkets</a>
+</td>
+<td>
+<p>At Metamarkets, Apache Storm is used to process real-time event data 
streamed from Apache Kafka message brokers, and then to load that data into a 
<a href="http://druid.io";>Druid cluster</a>, the low-latency data store at the 
heart of our real-time analytics service. Our Storm topologies perform various 
operations, ranging from simple filtering of "outdated" events, to 
transformations such as ID-to-name lookups, to complex multi-stream joins. 
Since our service is intended to respond to ad-hoc queries within seconds of 
ingesting events, the speed, flexibility, and robustness of those topologies 
make Storm a key piece of our real-time stack.</p>
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="http://www.mightytravels.com";>Mighty Travels</a>
+</td>
+<td>
+<p>We are using Storm to process our real-time search data stream and application logs. The part we like best about Storm is how easily it scales up, basically just by throwing more machines at it.</p>
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="http://www.polecat.co";>Polecat</a>
+</td>
+<td>
+<p>Polecat's digital analysis platform, MeaningMine, allows users to search all online news, blogs, and social media in real time and run bespoke analysis in order to inform corporate strategy and decision making for some of the world's largest companies and governmental organisations.</p>
+<p>
+Polecat uses Storm to run an application we've called the 'Data Munger'. We run many different topologies on a multi-host Storm cluster to process the tens of millions of online articles and posts that we collect each day. Storm handles our analysis of these documents so that we can provide insight on realtime data to our clients. We output our results from Storm into one of many large Apache Solr clusters for our end-user applications to query (Polecat is also a contributor to Solr). We first started developing our app to run on Storm back in June 2012, and it has been live since roughly September 2012. We've found Storm to be an excellent fit for our needs here, and we've always found it extremely robust and fast.
+</p>
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="https://www.skylight.io/";>Skylight by Tilde</a>
+</td>
+<td>
+<p>Skylight is a production profiler for Ruby on Rails apps that focuses on 
providing detailed information about your running application that you can 
explore in an intuitive way. We use Storm to process traces from our agent into 
data structures that we can slice and dice for you in our web app.</p>
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="http://www.ad4game.com/";>Ad4Game</a>
+</td>
+<td>
+<p>We are an advertising network and we use Storm to calculate priorities in 
real time to know which ads to show for which website, visitor and country.</p>
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="http://www.impetus.com/";>Impetus Technologies</a>
+</td>
+<td>
+<p>StreamAnalytix, a product of Impetus Technologies enables enterprises to 
analyze and respond to events in real-time at Big Data scale. Based on Apache 
Storm, StreamAnalytix is designed to rapidly build and deploy streaming 
analytics applications for any industry vertical, any data format, and any use 
case. This high-performance scalable platform comes with a pre-integrated 
package of components like Cassandra, Storm, Kafka and more. In addition, it 
also brings together the proven open source technology stack with Hadoop and 
NoSQL to provide massive scalability, dynamic data pipelines, and a visual 
designer for rapid application development.</p>
+<p>
+Through StreamAnalytix, the users can ingest, store and analyze millions of 
events per second and discover exceptions, patterns, and trends through live 
dashboards. It also provides seamless integration with indexing store 
(ElasticSearch) and NoSQL database (HBase, Cassandra, and Oracle NoSQL) for 
writing data in real-time. With the use of Storm, the product delivers high 
business value solutions such as log analytics, streaming ETL, deep social 
listening, Real-time marketing, business process acceleration and predictive 
maintenance.
+</p>
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="http://www.akazoo.com/en";>Akazoo</a>
+</td>
+<td>
+<p>
+Akazoo is a platform providing music streaming services. Storm is the backbone of all our real-time analytical processing. We use it for tracking and analyzing application events and for various other things, including recommendations and parallel task execution.
+</p>
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="http://www.mapillary.com";>Mapillary</a>
+</td>
+<td>
+<p>
+At Mapillary we use Storm for a wide variety of tasks. Having a system that is 100% based on Kafka input with Storm and Trident makes reasoning about our data a breeze.
+</p>
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="http://www.gutscheinrausch.de/";>Gutscheinrausch.de</a>
+</td>
+<td>
+<p>
+We recently upgraded our existing IT infrastructure, using Storm as one of our 
main tools.
+Each day we collect sales, clicks, visits, and various e-commerce metrics from many different systems (webpages, affiliate reportings, networks, tracking scripts, etc.). We process this continually generated data using Storm before entering it into the backend systems for further use.
+</p>
+<p>
+Using Storm we were able to decouple our heterogeneous frontend systems from our backends and take load off the data warehouse applications by feeding in pre-processed data. This way we can easily collect and process all data and then run realtime OLAP queries using our proprietary data warehouse technology.
+</p>
+<p>
+We are mostly impressed by the high speed, low maintenance approach Storm has 
provided us with. Also being able to easily scale up the system using more 
machines is a big plus. Since we're a small team it allows us to focus more on 
our core business instead of the underlying technology. You could say it has 
taken our hearts by storm!
+</p>
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="http://www.appriver.com";>AppRiver</a>
+</td>
+<td>
+<p>
+We are using Storm to track internet threats from varied sources around the 
web.  It is always fast and reliable.
+</p>
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="http://www.mercadolibre.com/";>MercadoLibre</a>
+</td>
+<td>
+</td>
+</tr>
+
+
+</table>
\ No newline at end of file

Added: storm/branches/bobby-versioned-site/releases/0.9.6/Project-ideas.md
URL: 
http://svn.apache.org/viewvc/storm/branches/bobby-versioned-site/releases/0.9.6/Project-ideas.md?rev=1735299&view=auto
==============================================================================
--- storm/branches/bobby-versioned-site/releases/0.9.6/Project-ideas.md (added)
+++ storm/branches/bobby-versioned-site/releases/0.9.6/Project-ideas.md Wed Mar 
16 21:18:57 2016
@@ -0,0 +1,6 @@
+---
+layout: documentation
+---
+ * **DSLs for non-JVM languages:** These DSLs should be all-inclusive and not require any Java for the creation of topologies, spouts, or bolts. Since topologies are [Thrift](http://thrift.apache.org/) structs, Nimbus is a Thrift service, and bolts can be written in any language, this is possible.
+ * **Online machine learning algorithms:** Something like 
[Mahout](http://mahout.apache.org/) but for online algorithms
+ * **Suite of performance benchmarks:** These benchmarks should test Storm's 
performance on CPU and IO intensive workloads. There should be benchmarks for 
different classes of applications, such as stream processing (where throughput 
is the priority) and distributed RPC (where latency is the priority). 
\ No newline at end of file

Added: storm/branches/bobby-versioned-site/releases/0.9.6/Rationale.md
URL: 
http://svn.apache.org/viewvc/storm/branches/bobby-versioned-site/releases/0.9.6/Rationale.md?rev=1735299&view=auto
==============================================================================
--- storm/branches/bobby-versioned-site/releases/0.9.6/Rationale.md (added)
+++ storm/branches/bobby-versioned-site/releases/0.9.6/Rationale.md Wed Mar 16 
21:18:57 2016
@@ -0,0 +1,31 @@
+---
+layout: documentation
+---
+The past decade has seen a revolution in data processing. MapReduce, Hadoop, 
and related technologies have made it possible to store and process data at 
scales previously unthinkable. Unfortunately, these data processing 
technologies are not realtime systems, nor are they meant to be. There's no 
hack that will turn Hadoop into a realtime system; realtime data processing has 
a fundamentally different set of requirements than batch processing.
+
+However, realtime data processing at massive scale is becoming more and more 
of a requirement for businesses. The lack of a "Hadoop of realtime" has become 
the biggest hole in the data processing ecosystem.
+
+Storm fills that hole.
+
+Before Storm, you would typically have to manually build a network of queues 
and workers to do realtime processing. Workers would process messages off a 
queue, update databases, and send new messages to other queues for further 
processing. Unfortunately, this approach has serious limitations:
+
+1. **Tedious**: You spend most of your development time configuring where to 
send messages, deploying workers, and deploying intermediate queues. The 
realtime processing logic that you care about corresponds to a relatively small 
percentage of your codebase.
+2. **Brittle**: There's little fault-tolerance. You're responsible for keeping 
each worker and queue up.
+3. **Painful to scale**: When the message throughput gets too high for a single worker or queue, you need to partition how the data is spread around. You need to reconfigure the other workers to know the new locations to send messages. This introduces moving parts and new pieces that can fail.
+
+Although the queues and workers paradigm breaks down for large numbers of 
messages, message processing is clearly the fundamental paradigm for realtime 
computation. The question is: how do you do it in a way that doesn't lose data, 
scales to huge volumes of messages, and is dead-simple to use and operate?
+
+Storm satisfies these goals. 
+
+## Why Storm is important
+
+Storm exposes a set of primitives for doing realtime computation. Just as MapReduce greatly eases the writing of parallel batch processing, Storm's primitives greatly ease the writing of parallel realtime computation.
+
+The key properties of Storm are:
+
+1. **Extremely broad set of use cases**: Storm can be used for processing 
messages and updating databases (stream processing), doing a continuous query 
on data streams and streaming the results into clients (continuous 
computation), parallelizing an intense query like a search query on the fly 
(distributed RPC), and more. Storm's small set of primitives satisfy a stunning 
number of use cases.
+2. **Scalable**: Storm scales to massive numbers of messages per second. To 
scale a topology, all you have to do is add machines and increase the 
parallelism settings of the topology. As an example of Storm's scale, one of 
Storm's initial applications processed 1,000,000 messages per second on a 10 
node cluster, including hundreds of database calls per second as part of the 
topology. Storm's usage of Zookeeper for cluster coordination makes it scale to 
much larger cluster sizes.
+3. **Guarantees no data loss**: A realtime system must have strong guarantees 
about data being successfully processed. A system that drops data has a very 
limited set of use cases. Storm guarantees that every message will be 
processed, and this is in direct contrast with other systems like S4. 
+4. **Extremely robust**: Unlike systems like Hadoop, which are notorious for 
being difficult to manage, Storm clusters just work. It is an explicit goal of 
the Storm project to make the user experience of managing Storm clusters as 
painless as possible.
+5. **Fault-tolerant**: If there are faults during execution of your 
computation, Storm will reassign tasks as necessary. Storm makes sure that a 
computation can run forever (or until you kill the computation).
+6. **Programming language agnostic**: Robust and scalable realtime processing 
shouldn't be limited to a single platform. Storm topologies and processing 
components can be defined in any language, making Storm accessible to nearly 
anyone.

Added: 
storm/branches/bobby-versioned-site/releases/0.9.6/Running-topologies-on-a-production-cluster.md
URL: 
http://svn.apache.org/viewvc/storm/branches/bobby-versioned-site/releases/0.9.6/Running-topologies-on-a-production-cluster.md?rev=1735299&view=auto
==============================================================================
--- 
storm/branches/bobby-versioned-site/releases/0.9.6/Running-topologies-on-a-production-cluster.md
 (added)
+++ 
storm/branches/bobby-versioned-site/releases/0.9.6/Running-topologies-on-a-production-cluster.md
 Wed Mar 16 21:18:57 2016
@@ -0,0 +1,75 @@
+---
+layout: documentation
+---
+Running topologies on a production cluster is similar to running in [Local 
mode](Local-mode.html). Here are the steps:
+
+1) Define the topology (use [TopologyBuilder](/apidocs/backtype/storm/topology/TopologyBuilder.html) if defining it in Java)
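+For illustration, a minimal `TopologyBuilder` sketch might look like the following. `MyEventSpout` and `MyCountBolt` are hypothetical spout and bolt implementations, not classes provided by Storm:

```java
import backtype.storm.topology.TopologyBuilder;
import backtype.storm.tuple.Fields;
import backtype.storm.generated.StormTopology;

TopologyBuilder builder = new TopologyBuilder();
// Spout component with a parallelism hint of 4 tasks
builder.setSpout("events", new MyEventSpout(), 4);
// Bolt component with 8 tasks, partitioned by the "user-id" field
builder.setBolt("counter", new MyCountBolt(), 8)
       .fieldsGrouping("events", new Fields("user-id"));
StormTopology topology = builder.createTopology();
```

The resulting `topology` object is what gets passed to `StormSubmitter` in the next step.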
+
+2) Use [StormSubmitter](/apidocs/backtype/storm/StormSubmitter.html) to submit 
the topology to the cluster. `StormSubmitter` takes as input the name of the 
topology, a configuration for the topology, and the topology itself. For 
example:
+
+```java
+Config conf = new Config();
+conf.setNumWorkers(20);
+conf.setMaxSpoutPending(5000);
+StormSubmitter.submitTopology("mytopology", conf, topology);
+```
+
+3) Create a jar containing your code and all the dependencies of your code 
(except for Storm -- the Storm jars will be added to the classpath on the 
worker nodes).
+
+If you're using Maven, the [Maven Assembly 
Plugin](http://maven.apache.org/plugins/maven-assembly-plugin/) can do the 
packaging for you. Just add this to your pom.xml:
+
+```xml
+  <plugin>
+    <artifactId>maven-assembly-plugin</artifactId>
+    <configuration>
+      <descriptorRefs>  
+        <descriptorRef>jar-with-dependencies</descriptorRef>
+      </descriptorRefs>
+      <archive>
+        <manifest>
+          <mainClass>com.path.to.main.Class</mainClass>
+        </manifest>
+      </archive>
+    </configuration>
+  </plugin>
+```
+Then run `mvn assembly:assembly` to get an appropriately packaged jar. Make sure you [exclude](http://maven.apache.org/plugins/maven-assembly-plugin/examples/single/including-and-excluding-artifacts.html) the Storm jars since the cluster already has Storm on the classpath.

+
+4) Submit the topology to the cluster using the `storm` client, specifying the 
path to your jar, the classname to run, and any arguments it will use:
+
+`storm jar path/to/allmycode.jar org.me.MyTopology arg1 arg2 arg3`
+
+`storm jar` will submit the jar to the cluster and configure the 
`StormSubmitter` class to talk to the right cluster. In this example, after 
uploading the jar `storm jar` calls the main function on `org.me.MyTopology` 
with the arguments "arg1", "arg2", and "arg3".
+
+You can find out how to configure your `storm` client to talk to a Storm 
cluster on [Setting up development 
environment](Setting-up-development-environment.html).
+
+### Common configurations
+
+There are a variety of configurations you can set per topology. A list of all 
the configurations you can set can be found 
[here](/apidocs/backtype/storm/Config.html). The ones prefixed with "TOPOLOGY" 
can be overridden on a topology-specific basis (the other ones are cluster 
configurations and cannot be overridden). Here are some common ones that are 
set for a topology:
+
+1. **Config.TOPOLOGY_WORKERS**: This sets the number of worker processes to 
use to execute the topology. For example, if you set this to 25, there will be 
25 Java processes across the cluster executing all the tasks. If you had a 
combined 150 parallelism across all components in the topology, each worker 
process will have 6 tasks running within it as threads.
+2. **Config.TOPOLOGY_ACKERS**: This sets the number of tasks that will track 
tuple trees and detect when a spout tuple has been fully processed. Ackers are 
an integral part of Storm's reliability model and you can read more about them 
on [Guaranteeing message processing](Guaranteeing-message-processing.html).
+3. **Config.TOPOLOGY_MAX_SPOUT_PENDING**: This sets the maximum number of 
spout tuples that can be pending on a single spout task at once (pending means 
the tuple has not been acked or failed yet). It is highly recommended you set 
this config to prevent queue explosion.
+4. **Config.TOPOLOGY_MESSAGE_TIMEOUT_SECS**: This is the maximum amount of 
time a spout tuple has to be fully completed before it is considered failed. 
This value defaults to 30 seconds, which is sufficient for most topologies. See 
[Guaranteeing message processing](Guaranteeing-message-processing.html) for 
more information on how Storm's reliability model works.
+5. **Config.TOPOLOGY_SERIALIZATIONS**: You can register more serializers to 
Storm using this config so that you can use custom types within tuples.
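As a sketch, the first few of these could be assembled into a plain config map. In a real topology you would use `backtype.storm.Config` (which extends `HashMap`) and its setters such as `setNumWorkers` and `setMaxSpoutPending`; the raw key strings in the comments below are the documented ones, and the class name is hypothetical:

```java
import java.util.HashMap;
import java.util.Map;

public class TopologyConfSketch {
    // Build a per-topology configuration as a plain map of key -> value.
    static Map<String, Object> buildConf() {
        Map<String, Object> conf = new HashMap<String, Object>();
        conf.put("topology.workers", 25);              // Config.TOPOLOGY_WORKERS
        conf.put("topology.max.spout.pending", 1000);  // Config.TOPOLOGY_MAX_SPOUT_PENDING
        conf.put("topology.message.timeout.secs", 60); // Config.TOPOLOGY_MESSAGE_TIMEOUT_SECS
        return conf;
    }

    public static void main(String[] args) {
        System.out.println(buildConf().get("topology.workers")); // prints 25
    }
}
```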
+
+
+### Killing a topology
+
+To kill a topology, simply run:
+
+`storm kill {stormname}`
+
+Give the same name to `storm kill` as you used when submitting the topology.
+
+Storm won't kill the topology immediately. Instead, it deactivates all the 
spouts so that they don't emit any more tuples, and then Storm waits 
Config.TOPOLOGY_MESSAGE_TIMEOUT_SECS seconds before destroying all the workers. 
This gives the topology enough time to complete any tuples it was processing 
when it got killed.
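The kill command also accepts a `-w` flag that overrides the wait time, for example to tear the topology down after only 10 seconds:

```
storm kill {stormname} -w 10
```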
+
+### Updating a running topology
+
+To update a running topology, the only option currently is to kill the current 
topology and resubmit a new one. A planned feature is to implement a `storm 
swap` command that swaps a running topology with a new one, ensuring minimal 
downtime and no chance of both topologies processing tuples at the same time. 
+
+### Monitoring topologies
+
+The best place to monitor a topology is using the Storm UI. The Storm UI 
provides information about errors happening in tasks and fine-grained stats on 
the throughput and latency performance of each component of each running 
topology.
+
+You can also look at the worker logs on the cluster machines.
\ No newline at end of file

Added: 
storm/branches/bobby-versioned-site/releases/0.9.6/Serialization-(prior-to-0.6.0).md
URL: 
http://svn.apache.org/viewvc/storm/branches/bobby-versioned-site/releases/0.9.6/Serialization-%28prior-to-0.6.0%29.md?rev=1735299&view=auto
==============================================================================
--- 
storm/branches/bobby-versioned-site/releases/0.9.6/Serialization-(prior-to-0.6.0).md
 (added)
+++ 
storm/branches/bobby-versioned-site/releases/0.9.6/Serialization-(prior-to-0.6.0).md
 Wed Mar 16 21:18:57 2016
@@ -0,0 +1,50 @@
+---
+layout: documentation
+---
+Tuples can be made up of objects of any type. Since Storm is a distributed 
system, it needs to know how to serialize and deserialize objects when they're 
passed between tasks. By default Storm can serialize ints, shorts, longs, 
floats, doubles, bools, bytes, strings, and byte arrays, but if you want to use 
another type in your tuples, you'll need to implement a custom serializer.
+
+### Dynamic typing
+
+There are no type declarations for fields in a Tuple. You put objects in 
fields and Storm figures out the serialization dynamically. Before we get to 
the interface for serialization, let's spend a moment understanding why Storm's 
tuples are dynamically typed.
+
+Adding static typing to tuple fields would add a large amount of complexity to 
Storm's API. Hadoop, for example, statically types its keys and values but 
requires a huge amount of annotations on the part of the user. Hadoop's API is 
a burden to use and the "type safety" isn't worth it. Dynamic typing is simply 
easier to use.
+
+Beyond that, it's not possible to statically type Storm's tuples in any 
reasonable way. Suppose a Bolt subscribes to multiple streams. The tuples from 
all those streams may have different types across the fields. When a Bolt 
receives a `Tuple` in `execute`, that tuple could have come from any stream and 
so could have any combination of types. There might be some reflection magic 
you can do to declare a different method for every tuple stream a bolt 
subscribes to, but Storm opts for the simpler, straightforward approach of 
dynamic typing.
+
+Finally, another reason for using dynamic typing is so Storm can be used in a 
straightforward manner from dynamically typed languages like Clojure and JRuby.
+
+### Custom serialization
+
+Let's dive into Storm's API for defining custom serializations. There are two 
steps you need to take as a user to create a custom serialization: implement 
the serializer, and register the serializer to Storm.
+
+#### Creating a serializer
+
+Custom serializers implement the 
[ISerialization](/apidocs/backtype/storm/serialization/ISerialization.html) 
interface. Implementations specify how to serialize and deserialize types into 
a binary format.
+
+The interface looks like this:
+
+```java
+public interface ISerialization<T> {
+    public boolean accept(Class c);
+    public void serialize(T object, DataOutputStream stream) throws IOException;
+    public T deserialize(DataInputStream stream) throws IOException;
+}
+```
+
+Storm uses the `accept` method to determine if a type can be serialized by 
this serializer. Remember, Storm's tuples are dynamically typed so Storm 
determines what serializer to use at runtime.
+
+`serialize` writes the object out to the output stream in binary format. The 
field must be written in a way such that it can be deserialized later. For 
example, if you're writing out a list of objects, you'll need to write out the 
size of the list first so that you know how many elements to deserialize.
+
+`deserialize` reads the serialized object off of the stream and returns it.
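As a sketch of this contract, here is a size-prefixed round trip for a list of strings, using the same `DataOutputStream`/`DataInputStream` types the interface works with. The class name is hypothetical; a real implementation would implement `ISerialization<List<String>>` and add the `accept` method:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class StringListSerialization {
    // Write the size first so deserialize knows how many elements to read back.
    static void serialize(List<String> object, DataOutputStream stream) throws IOException {
        stream.writeInt(object.size());
        for (String s : object) {
            stream.writeUTF(s);
        }
    }

    static List<String> deserialize(DataInputStream stream) throws IOException {
        int size = stream.readInt();
        List<String> result = new ArrayList<String>();
        for (int i = 0; i < size; i++) {
            result.add(stream.readUTF());
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        serialize(Arrays.asList("a", "b", "c"), new DataOutputStream(bytes));
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes.toByteArray()));
        System.out.println(deserialize(in)); // prints [a, b, c]
    }
}
```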
+
+You can see example serialization implementations in the source for 
[SerializationFactory](https://github.com/apache/incubator-storm/blob/0.5.4/src/jvm/backtype/storm/serialization/SerializationFactory.java)
+
+#### Registering a serializer
+
+Once you create a serializer, you need to tell Storm it exists. This is done 
through the Storm configuration (see [Concepts](Concepts.html) for information 
about how configuration works in Storm). You can register serializations either 
through the config given when submitting a topology or in the storm.yaml files 
across your cluster.
+
+Serializer registrations are done through the Config.TOPOLOGY_SERIALIZATIONS 
config, which is simply a list of serialization class names.
+
+Storm provides helpers for registering serializers in a topology config. The 
[Config](/apidocs/backtype/storm/Config.html) class has a method called 
`addSerialization` that takes in a serializer class to add to the config.
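For example (a sketch; `MyTypeSerializer` is a hypothetical `ISerialization` implementation, and `topology` is a topology you've already built):

```java
// Register a custom serializer before submitting the topology.
Config conf = new Config();
conf.addSerialization(MyTypeSerializer.class);
StormSubmitter.submitTopology("mytopology", conf, topology);
```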
+
+There's an advanced config called Config.TOPOLOGY_SKIP_MISSING_SERIALIZATIONS. 
If you set this to true, Storm will ignore any serializations that are 
registered but do not have their code available on the classpath. Otherwise, 
Storm will throw errors when it can't find a serialization. This is useful if 
you run many topologies on a cluster that each have different serializations, 
but you want to declare all the serializations across all topologies in the 
`storm.yaml` files.
\ No newline at end of file

Added: storm/branches/bobby-versioned-site/releases/0.9.6/Serialization.md
URL: 
http://svn.apache.org/viewvc/storm/branches/bobby-versioned-site/releases/0.9.6/Serialization.md?rev=1735299&view=auto
==============================================================================
--- storm/branches/bobby-versioned-site/releases/0.9.6/Serialization.md (added)
+++ storm/branches/bobby-versioned-site/releases/0.9.6/Serialization.md Wed Mar 
16 21:18:57 2016
@@ -0,0 +1,60 @@
+---
+layout: documentation
+---
+This page is about how the serialization system in Storm works for versions 
0.6.0 and onwards. Storm used a different serialization system prior to 0.6.0 
which is documented on [Serialization (prior to 
0.6.0)](Serialization-\(prior-to-0.6.0\).html). 
+
+Tuples can be made up of objects of any type. Since Storm is a distributed 
system, it needs to know how to serialize and deserialize objects when they're 
passed between tasks.
+
+Storm uses [Kryo](http://code.google.com/p/kryo/) for serialization. Kryo is a 
flexible and fast serialization library that produces small serializations.
+
+By default, Storm can serialize primitive types, strings, byte arrays, 
ArrayList, HashMap, HashSet, and the Clojure collection types. If you want to 
use another type in your tuples, you'll need to register a custom serializer.
+
+### Dynamic typing
+
+There are no type declarations for fields in a Tuple. You put objects in 
fields and Storm figures out the serialization dynamically. Before we get to 
the interface for serialization, let's spend a moment understanding why Storm's 
tuples are dynamically typed.
+
+Adding static typing to tuple fields would add a large amount of complexity to 
Storm's API. Hadoop, for example, statically types its keys and values but 
requires a huge amount of annotations on the part of the user. Hadoop's API is 
a burden to use and the "type safety" isn't worth it. Dynamic typing is simply 
easier to use.
+
+Beyond that, it's not possible to statically type Storm's tuples in any 
reasonable way. Suppose a Bolt subscribes to multiple streams. The tuples from 
all those streams may have different types across the fields. When a Bolt 
receives a `Tuple` in `execute`, that tuple could have come from any stream and 
so could have any combination of types. There might be some reflection magic 
you can do to declare a different method for every tuple stream a bolt 
subscribes to, but Storm opts for the simpler, straightforward approach of 
dynamic typing.
+
+Finally, another reason for using dynamic typing is so Storm can be used in a 
straightforward manner from dynamically typed languages like Clojure and JRuby.
+
+### Custom serialization
+
+As mentioned, Storm uses Kryo for serialization. To implement custom 
serializers, you need to register new serializers with Kryo. It's highly 
recommended that you read over [Kryo's home 
page](http://code.google.com/p/kryo/) to understand how it handles custom 
serialization.
+
+Adding custom serializers is done through the "topology.kryo.register" 
property in your topology config. It takes a list of registrations, where each 
registration can take one of two forms:
+
+1. The name of a class to register. In this case, Storm will use Kryo's 
`FieldsSerializer` to serialize the class. This may or may not be optimal for 
the class -- see the Kryo docs for more details.
+2. A map from the name of a class to register to an implementation of 
[com.esotericsoftware.kryo.Serializer](http://code.google.com/p/kryo/source/browse/trunk/src/com/esotericsoftware/kryo/Serializer.java).
+
+Let's look at an example.
+
+```
+topology.kryo.register:
+  - com.mycompany.CustomType1
+  - com.mycompany.CustomType2: com.mycompany.serializer.CustomType2Serializer
+  - com.mycompany.CustomType3
+```
+
+`com.mycompany.CustomType1` and `com.mycompany.CustomType3` will use the 
`FieldsSerializer`, whereas `com.mycompany.CustomType2` will use 
`com.mycompany.serializer.CustomType2Serializer` for serialization.
+
+Storm provides helpers for registering serializers in a topology config. The 
[Config](/apidocs/backtype/storm/Config.html) class has a method called 
`registerSerialization` that takes in a registration to add to the config.
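For example, a sketch mirroring the YAML registrations above (the `com.mycompany` classes are hypothetical):

```java
Config conf = new Config();
conf.registerSerialization(CustomType1.class);                                // FieldsSerializer
conf.registerSerialization(CustomType2.class, CustomType2Serializer.class);  // custom serializer
conf.registerSerialization(CustomType3.class);                                // FieldsSerializer
```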
+
+There's an advanced config called 
`Config.TOPOLOGY_SKIP_MISSING_KRYO_REGISTRATIONS`. If you set this to true, 
Storm will ignore any serializations that are registered but do not have their 
code available on the classpath. Otherwise, Storm will throw errors when it 
can't find a serialization. This is useful if you run many topologies on a 
cluster that each have different serializations, but you want to declare all 
the serializations across all topologies in the `storm.yaml` files.
+
+### Java serialization
+
+If Storm encounters a type for which it doesn't have a serialization 
registered, it will use Java serialization if possible. If the object can't be 
serialized with Java serialization, then Storm will throw an error.
+
+Beware that Java serialization is extremely expensive, both in terms of CPU 
cost as well as the size of the serialized object. It is highly recommended 
that you register custom serializers when you put the topology in production. 
The Java serialization behavior is there so that it's easy to prototype new 
topologies.
+
+You can turn off the behavior to fall back on Java serialization by setting 
the `Config.TOPOLOGY_FALL_BACK_ON_JAVA_SERIALIZATION` config to false.
+
+### Component-specific serialization registrations
+
+Storm 0.7.0 lets you set component-specific configurations (read more about 
this at [Configuration](Configuration.html)). Of course, if one component 
defines a serialization that serialization will need to be available to other 
bolts -- otherwise they won't be able to receive messages from that component!
+
+When a topology is submitted, a single set of serializations is chosen to be 
used by all components in the topology for sending messages. This is done by 
merging the component-specific serializer registrations with the regular set of 
serialization registrations. If two components define serializers for the same 
class, one of the serializers is chosen arbitrarily.
+
+To force a serializer for a particular class if there's a conflict between two 
component-specific registrations, just define the serializer you want to use in 
the topology-specific configuration. The topology-specific configuration has 
precedence over component-specific configurations for serialization 
registrations.
\ No newline at end of file

Added: storm/branches/bobby-versioned-site/releases/0.9.6/Serializers.md
URL: 
http://svn.apache.org/viewvc/storm/branches/bobby-versioned-site/releases/0.9.6/Serializers.md?rev=1735299&view=auto
==============================================================================
--- storm/branches/bobby-versioned-site/releases/0.9.6/Serializers.md (added)
+++ storm/branches/bobby-versioned-site/releases/0.9.6/Serializers.md Wed Mar 
16 21:18:57 2016
@@ -0,0 +1,4 @@
+---
+layout: documentation
+---
+* [storm-json](https://github.com/rapportive-oss/storm-json): Simple JSON 
serializer for Storm
\ No newline at end of file

Added: 
storm/branches/bobby-versioned-site/releases/0.9.6/Setting-up-a-Storm-cluster.md
URL: 
http://svn.apache.org/viewvc/storm/branches/bobby-versioned-site/releases/0.9.6/Setting-up-a-Storm-cluster.md?rev=1735299&view=auto
==============================================================================
--- 
storm/branches/bobby-versioned-site/releases/0.9.6/Setting-up-a-Storm-cluster.md
 (added)
+++ 
storm/branches/bobby-versioned-site/releases/0.9.6/Setting-up-a-Storm-cluster.md
 Wed Mar 16 21:18:57 2016
@@ -0,0 +1,83 @@
+---
+layout: documentation
+---
+This page outlines the steps for getting a Storm cluster up and running. If 
you're on AWS, you should check out the 
[storm-deploy](https://github.com/nathanmarz/storm-deploy/wiki) project, which 
completely automates the provisioning, configuration, and installation of Storm 
clusters on EC2. It also sets up Ganglia for you so you can monitor CPU, disk, 
and network usage.
+
+If you run into difficulties with your Storm cluster, first check for a 
solution in the [Troubleshooting](Troubleshooting.html) page. Otherwise, 
email the mailing list.
+
+Here's a summary of the steps for setting up a Storm cluster:
+
+1. Set up a Zookeeper cluster
+2. Install dependencies on Nimbus and worker machines
+3. Download and extract a Storm release to Nimbus and worker machines
+4. Fill in mandatory configurations into storm.yaml
+5. Launch daemons under supervision using the "storm" script and a supervisor 
of your choice
+
+### Set up a Zookeeper cluster
+
+Storm uses Zookeeper for coordinating the cluster. Zookeeper **is not** used 
for message passing, so the load Storm places on Zookeeper is quite low. Single 
node Zookeeper clusters should be sufficient for most cases, but if you want 
failover or are deploying large Storm clusters you may want larger Zookeeper 
clusters. Instructions for deploying Zookeeper are 
[here](http://zookeeper.apache.org/doc/r3.3.3/zookeeperAdmin.html). 
+
+A few notes about Zookeeper deployment:
+
+1. It's critical that you run Zookeeper under supervision, since Zookeeper is 
fail-fast and will exit the process if it encounters any error case. See 
[here](http://zookeeper.apache.org/doc/r3.3.3/zookeeperAdmin.html#sc_supervision)
 for more details. 
+2. It's critical that you set up a cron to compact Zookeeper's data and 
transaction logs. The Zookeeper daemon does not do this on its own, and if you 
don't set up a cron, Zookeeper will quickly run out of disk space. See 
[here](http://zookeeper.apache.org/doc/r3.3.3/zookeeperAdmin.html#sc_maintenance)
 for more details.
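For example, a crontab entry along these lines (paths and retention count are placeholders) runs ZooKeeper's bundled `PurgeTxnLog` utility nightly:

```
# keep the 3 most recent snapshots; purge older snapshots and transaction logs
0 4 * * * java -cp /opt/zookeeper/zookeeper.jar:/opt/zookeeper/lib/*:/opt/zookeeper/conf \
    org.apache.zookeeper.server.PurgeTxnLog /var/zookeeper/data /var/zookeeper/data -n 3
```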
+
+### Install dependencies on Nimbus and worker machines
+
+Next you need to install Storm's dependencies on Nimbus and the worker 
machines. These are:
+
+1. Java 6
+2. Python 2.6.6
+
+These are the versions of the dependencies that have been tested with Storm. 
Storm may or may not work with different versions of Java and/or Python.
+
+
+### Download and extract a Storm release to Nimbus and worker machines
+
+Next, download a Storm release and extract the zip file somewhere on Nimbus 
and each of the worker machines. The Storm releases can be downloaded [from 
here](http://github.com/apache/incubator-storm/downloads).
+
+### Fill in mandatory configurations into storm.yaml
+
+The Storm release contains a file at `conf/storm.yaml` that configures the 
Storm daemons. You can see the default configuration values 
[here](https://github.com/apache/incubator-storm/blob/master/conf/defaults.yaml).
 storm.yaml overrides anything in defaults.yaml. There are a few configurations 
that are mandatory to get a working cluster:
+
+1) **storm.zookeeper.servers**: This is a list of the hosts in the Zookeeper 
cluster for your Storm cluster. It should look something like:
+
+```yaml
+storm.zookeeper.servers:
+  - "111.222.333.444"
+  - "555.666.777.888"
+```
+
+If the port that your Zookeeper cluster uses is different than the default, 
you should set **storm.zookeeper.port** as well.
+
+2) **storm.local.dir**: The Nimbus and Supervisor daemons require a directory 
on the local disk to store small amounts of state (like jars, confs, and things 
like that). You should create that directory on each machine, give it proper 
permissions, and then fill in the directory location using this config. For 
example:
+
+```yaml
+storm.local.dir: "/mnt/storm"
+```
+
+3) **nimbus.host**: The worker nodes need to know which machine is the master 
in order to download topology jars and confs. For example:
+
+```yaml
+nimbus.host: "111.222.333.44"
+```
+
+4) **supervisor.slots.ports**: For each worker machine, you configure how many 
workers run on that machine with this config. Each worker uses a single port 
for receiving messages, and this setting defines which ports are open for use. 
If you define five ports here, then Storm will allocate up to five workers to 
run on this machine. If you define three ports, Storm will only run up to 
three. By default, this setting is configured to run 4 workers on the ports 
6700, 6701, 6702, and 6703. For example:
+
+```yaml
+supervisor.slots.ports:
+    - 6700
+    - 6701
+    - 6702
+    - 6703
+```
+
+### Launch daemons under supervision using the "storm" script and a supervisor 
of your choice
+
+The last step is to launch all the Storm daemons. It is critical that you run 
each of these daemons under supervision. Storm is a __fail-fast__ system which 
means the processes will halt whenever an unexpected error is encountered. 
Storm is designed so that it can safely halt at any point and recover correctly 
when the process is restarted. This is why Storm keeps no state in-process -- 
if Nimbus or the Supervisors restart, the running topologies are unaffected. 
Here's how to run the Storm daemons:
+
+1. **Nimbus**: Run the command "bin/storm nimbus" under supervision on the 
master machine.
+2. **Supervisor**: Run the command "bin/storm supervisor" under supervision on 
each worker machine. The supervisor daemon is responsible for starting and 
stopping worker processes on that machine.
+3. **UI**: Run the Storm UI (a site you can access from the browser that gives 
diagnostics on the cluster and topologies) by running the command "bin/storm 
ui" under supervision. The UI can be accessed by navigating your web browser to 
http://{nimbus host}:8080. 
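For example, under [supervisord](http://supervisord.org/) the Nimbus and UI daemons could be kept alive with entries like these (program names and paths are placeholders):

```
[program:storm-nimbus]
command=/opt/storm/bin/storm nimbus
autorestart=true

[program:storm-ui]
command=/opt/storm/bin/storm ui
autorestart=true
```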
+
+As you can see, running the daemons is very straightforward. The daemons will 
log to the `logs/` directory under wherever you extracted the Storm release.
\ No newline at end of file

Added: 
storm/branches/bobby-versioned-site/releases/0.9.6/Setting-up-a-Storm-project-in-Eclipse.md
URL: 
http://svn.apache.org/viewvc/storm/branches/bobby-versioned-site/releases/0.9.6/Setting-up-a-Storm-project-in-Eclipse.md?rev=1735299&view=auto
==============================================================================
--- 
storm/branches/bobby-versioned-site/releases/0.9.6/Setting-up-a-Storm-project-in-Eclipse.md
 (added)
+++ 
storm/branches/bobby-versioned-site/releases/0.9.6/Setting-up-a-Storm-project-in-Eclipse.md
 Wed Mar 16 21:18:57 2016
@@ -0,0 +1 @@
+- fill me in
\ No newline at end of file

Added: 
storm/branches/bobby-versioned-site/releases/0.9.6/Setting-up-development-environment.md
URL: 
http://svn.apache.org/viewvc/storm/branches/bobby-versioned-site/releases/0.9.6/Setting-up-development-environment.md?rev=1735299&view=auto
==============================================================================
--- 
storm/branches/bobby-versioned-site/releases/0.9.6/Setting-up-development-environment.md
 (added)
+++ 
storm/branches/bobby-versioned-site/releases/0.9.6/Setting-up-development-environment.md
 Wed Mar 16 21:18:57 2016
@@ -0,0 +1,39 @@
+---
+layout: documentation
+---
+This page outlines what you need to do to get a Storm development environment 
set up. In summary, the steps are:
+
+1. Download a [Storm release](../downloads.html), unpack it, and put the 
unpacked `bin/` directory on your PATH
+2. To be able to start and stop topologies on a remote cluster, put the 
cluster information in `~/.storm/storm.yaml`
+
+More detail on each of these steps is below.
+
+### What is a development environment?
+
+Storm has two modes of operation: local mode and remote mode. In local mode, 
you can develop and test topologies completely in process on your local 
machine. In remote mode, you submit topologies for execution on a cluster of 
machines.
+
+A Storm development environment has everything installed so that you can 
develop and test Storm topologies in local mode, package topologies for 
execution on a remote cluster, and submit/kill topologies on a remote cluster.
+
+Let's quickly go over the relationship between your machine and a remote 
cluster. A Storm cluster is managed by a master node called "Nimbus". Your 
machine communicates with Nimbus to submit code (packaged as a jar) and 
topologies for execution on the cluster, and Nimbus will take care of 
distributing that code around the cluster and assigning workers to run your 
topology. Your machine uses a command line client called `storm` to communicate 
with Nimbus. The `storm` client is only used for remote mode; it is not used 
for developing and testing topologies in local mode.
+
+### Installing a Storm release locally
+
+If you want to be able to submit topologies to a remote cluster from your 
machine, you should install a Storm release locally. Installing a Storm release 
will give you the `storm` client that you can use to interact with remote 
clusters. To install Storm locally, download a release [from 
here](https://github.com/apache/incubator-storm/downloads) and unzip it 
somewhere on your computer. Then add the unpacked `bin/` directory onto your 
`PATH` and make sure the `bin/storm` script is executable.
+
+Installing a Storm release locally is only for interacting with remote 
clusters. For developing and testing topologies in local mode, it is 
recommended that you use Maven to include Storm as a dev dependency for your 
project. You can read more about using Maven for this purpose on 
[Maven](Maven.html). 
+
+### Starting and stopping topologies on a remote cluster
+
+The previous step installed the `storm` client on your machine, which is used 
to communicate with remote Storm clusters. Now all you have to do is tell the 
client which Storm cluster to talk to by putting the host address of the master 
in the `~/.storm/storm.yaml` file. It should look something like this:
+
+```
+nimbus.host: "123.45.678.890"
+```
+
+Alternatively, if you use the 
[storm-deploy](https://github.com/nathanmarz/storm-deploy) project to provision 
Storm clusters on AWS, it will automatically set up your `~/.storm/storm.yaml` 
file. You can manually attach to a Storm cluster (or switch between multiple 
clusters) using the "attach" command, like so:
+
+```
+lein run :deploy --attach --name mystormcluster
+```
+
+More information is available on the storm-deploy 
[wiki](https://github.com/nathanmarz/storm-deploy/wiki).
\ No newline at end of file

Added: 
storm/branches/bobby-versioned-site/releases/0.9.6/Spout-implementations.md
URL: 
http://svn.apache.org/viewvc/storm/branches/bobby-versioned-site/releases/0.9.6/Spout-implementations.md?rev=1735299&view=auto
==============================================================================
--- storm/branches/bobby-versioned-site/releases/0.9.6/Spout-implementations.md 
(added)
+++ storm/branches/bobby-versioned-site/releases/0.9.6/Spout-implementations.md 
Wed Mar 16 21:18:57 2016
@@ -0,0 +1,8 @@
+---
+layout: documentation
+---
+* [storm-kestrel](https://github.com/nathanmarz/storm-kestrel): Adapter to use 
Kestrel as a spout
+* [storm-amqp-spout](https://github.com/rapportive-oss/storm-amqp-spout): 
Adapter to use AMQP source as a spout
+* [storm-jms](https://github.com/ptgoetz/storm-jms): Adapter to use a JMS 
source as a spout
+* [storm-redis-pubsub](https://github.com/sorenmacbeth/storm-redis-pubsub): A 
spout that subscribes to a Redis pubsub stream
+* 
[storm-beanstalkd-spout](https://github.com/haitaoyao/storm-beanstalkd-spout): 
A spout that subscribes to a beanstalkd queue
\ No newline at end of file


