SAMZA-260; run hello-samza without internet

Project: http://git-wip-us.apache.org/repos/asf/incubator-samza/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-samza/commit/ef25b9e1
Tree: http://git-wip-us.apache.org/repos/asf/incubator-samza/tree/ef25b9e1
Diff: http://git-wip-us.apache.org/repos/asf/incubator-samza/diff/ef25b9e1

Branch: refs/heads/0.7.0
Commit: ef25b9e1632c67b949f39862276c0f09d547657d
Parents: 5162af8
Author: Yan Fang <[email protected]>
Authored: Tue May 13 14:05:02 2014 -0700
Committer: Martin Kleppmann <[email protected]>
Committed: Tue Jun 10 12:05:06 2014 +0100

----------------------------------------------------------------------
 docs/learn/tutorials/0.7.0/index.md             |  2 +
 .../0.7.0/run-hello-samza-without-internet.md   | 61 ++++++++++++++++++++
 docs/startup/hello-samza/0.7.0/index.md         |  2 +
 3 files changed, 65 insertions(+)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-samza/blob/ef25b9e1/docs/learn/tutorials/0.7.0/index.md
----------------------------------------------------------------------
diff --git a/docs/learn/tutorials/0.7.0/index.md b/docs/learn/tutorials/0.7.0/index.md
index cafc092..5822cce 100644
--- a/docs/learn/tutorials/0.7.0/index.md
+++ b/docs/learn/tutorials/0.7.0/index.md
@@ -9,6 +9,8 @@ title: Tutorials
 
 [Run Hello-samza in Multi-node YARN](run-in-multi-node-yarn.html)
 
+[Run Hello-samza without Internet](run-hello-samza-without-internet.html)
+
 <!-- TODO a bunch of tutorials
 [Log Walkthrough](log-walkthrough.html)
 <a href="configuring-kafka-system.html">Configuring a Kafka System</a><br/>

http://git-wip-us.apache.org/repos/asf/incubator-samza/blob/ef25b9e1/docs/learn/tutorials/0.7.0/run-hello-samza-without-internet.md
----------------------------------------------------------------------
diff --git a/docs/learn/tutorials/0.7.0/run-hello-samza-without-internet.md b/docs/learn/tutorials/0.7.0/run-hello-samza-without-internet.md
new file mode 100644
index 0000000..f7a0c1b
--- /dev/null
+++ b/docs/learn/tutorials/0.7.0/run-hello-samza-without-internet.md
@@ -0,0 +1,61 @@
+---
+layout: page
+title: Run Hello Samza without Internet
+---
+
+This tutorial helps you run [Hello Samza](../../../startup/hello-samza/0.7.0/) when you cannot connect to the internet.
+
+### Test Your Connection
+
+Check whether you can reach irc.wikimedia.org on port 6667. Sometimes a company firewall blocks this service.
+
+```
+telnet irc.wikimedia.org 6667
+```
+
+You should see something like this:
+
+```
+Trying 208.80.152.178...
+Connected to ekrem.wikimedia.org.
+Escape character is '^]'.
+NOTICE AUTH :*** Processing connection to irc.pmtpa.wikimedia.org
+NOTICE AUTH :*** Looking up your hostname...
+NOTICE AUTH :*** Checking Ident
+NOTICE AUTH :*** Found your hostname
+```
+
+Otherwise, you probably have a connection problem.
+
+### Use Local Data to Run Hello Samza
+
+We provide an alternative way to get the Wikipedia feed data. Instead of running
+
+```
+deploy/samza/bin/run-job.sh --config-factory=org.apache.samza.config.factories.PropertiesConfigFactory --config-path=file://$PWD/deploy/samza/config/wikipedia-feed.properties
+```
+
+you will run:
+```
+bin/produce-wikipedia-raw-data.sh
+```
+
+This script reads Wikipedia feed data from a local file and produces it to the Kafka broker. By default, it uses localhost:9092 as the Kafka broker and localhost:2181 as ZooKeeper. You can override them:
+
+```
+bin/produce-wikipedia-raw-data.sh -b yourKafkaBrokerAddress -z yourZookeeperAddress
+```
+
+Now you can go back to the Generate Wikipedia Statistics section in [Hello Samza](../../../startup/hello-samza/0.7.0/) and follow the remaining steps.
+
+### A Little Explanation
+
+The goal of 
+
+```
+deploy/samza/bin/run-job.sh --config-factory=org.apache.samza.config.factories.PropertiesConfigFactory --config-path=file://$PWD/deploy/samza/config/wikipedia-feed.properties
+```
+
+is to deploy a Samza job that listens to the Wikipedia API, receives the feed in real time, and produces it to the Kafka topic wikipedia-raw. The alternative in this tutorial reads a local Wikipedia feed in an infinite loop and produces the data to the same wikipedia-raw topic. The follow-up job, wikipedia-parser, reads from the Kafka topic wikipedia-raw, so as long as that topic contains correct data, we are fine. All Samza jobs are connected through Kafka and do not depend on each other.
+
+

http://git-wip-us.apache.org/repos/asf/incubator-samza/blob/ef25b9e1/docs/startup/hello-samza/0.7.0/index.md
----------------------------------------------------------------------
diff --git a/docs/startup/hello-samza/0.7.0/index.md b/docs/startup/hello-samza/0.7.0/index.md
index 6a88a30..11fc18b 100644
--- a/docs/startup/hello-samza/0.7.0/index.md
+++ b/docs/startup/hello-samza/0.7.0/index.md
@@ -46,6 +46,8 @@ The job will consume a feed of real-time edits from Wikipedia, and produce them
 
 Pretty neat, right? Now, check out the YARN UI again ([http://localhost:8088](http://localhost:8088)). This time around, you'll see your Samza job is running!
 
+If you cannot see any output from the Kafka consumer, you may have a connection problem. See [Run Hello Samza without Internet](../../../learn/tutorials/0.7.0/run-hello-samza-without-internet.html).
+
 ### Generate Wikipedia Statistics
 
 Let's calculate some statistics based on the messages in the wikipedia-raw topic. Start two more jobs:

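The connection test described in the new tutorial can also be scripted. The following is a minimal sketch, not part of this commit or of hello-samza: it assumes Python 3 is available, and the function names `can_reach` and `feed_command` are hypothetical. It opens a TCP connection with a timeout (the same check as the `telnet` command in the tutorial) and returns whichever of the tutorial's two commands applies.

```python
import socket


def can_reach(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        # create_connection resolves the host name and connects; failures
        # (DNS error, refused connection, timeout) are all OSError subclasses.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def feed_command(reachable: bool) -> str:
    """Pick the feed step to run; both commands are quoted from the tutorial."""
    if reachable:
        # Live feed: deploy the Samza job that reads from Wikipedia.
        return (
            "deploy/samza/bin/run-job.sh "
            "--config-factory=org.apache.samza.config.factories.PropertiesConfigFactory "
            "--config-path=file://$PWD/deploy/samza/config/wikipedia-feed.properties"
        )
    # No connection: replay the local data into Kafka instead.
    return "bin/produce-wikipedia-raw-data.sh"
```

For example, `feed_command(can_reach("irc.wikimedia.org", 6667))` returns the `run-job.sh` command when the IRC endpoint is reachable and the local-data script otherwise. The helper only chooses a command string; running it is left to the user.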