SAMZA-260; run hello-samza without internet
Project: http://git-wip-us.apache.org/repos/asf/incubator-samza/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-samza/commit/ef25b9e1
Tree: http://git-wip-us.apache.org/repos/asf/incubator-samza/tree/ef25b9e1
Diff: http://git-wip-us.apache.org/repos/asf/incubator-samza/diff/ef25b9e1

Branch: refs/heads/0.7.0
Commit: ef25b9e1632c67b949f39862276c0f09d547657d
Parents: 5162af8
Author: Yan Fang <[email protected]>
Authored: Tue May 13 14:05:02 2014 -0700
Committer: Martin Kleppmann <[email protected]>
Committed: Tue Jun 10 12:05:06 2014 +0100

----------------------------------------------------------------------
 docs/learn/tutorials/0.7.0/index.md             |  2 +
 .../0.7.0/run-hello-samza-without-internet.md   | 61 ++++++++++++++++++++
 docs/startup/hello-samza/0.7.0/index.md         |  2 +
 3 files changed, 65 insertions(+)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-samza/blob/ef25b9e1/docs/learn/tutorials/0.7.0/index.md
----------------------------------------------------------------------
diff --git a/docs/learn/tutorials/0.7.0/index.md b/docs/learn/tutorials/0.7.0/index.md
index cafc092..5822cce 100644
--- a/docs/learn/tutorials/0.7.0/index.md
+++ b/docs/learn/tutorials/0.7.0/index.md
@@ -9,6 +9,8 @@ title: Tutorials
 
 [Run Hello-samza in Multi-node YARN](run-in-multi-node-yarn.html)
 
+[Run Hello-samza without Internet](run-hello-samza-without-internet.html)
+
 <!-- TODO a bunch of tutorials
 [Log Walkthrough](log-walkthrough.html)
 <a href="configuring-kafka-system.html">Configuring a Kafka System</a><br/>

http://git-wip-us.apache.org/repos/asf/incubator-samza/blob/ef25b9e1/docs/learn/tutorials/0.7.0/run-hello-samza-without-internet.md
----------------------------------------------------------------------
diff --git a/docs/learn/tutorials/0.7.0/run-hello-samza-without-internet.md b/docs/learn/tutorials/0.7.0/run-hello-samza-without-internet.md
new file mode 100644
index 0000000..f7a0c1b
--- /dev/null
+++ b/docs/learn/tutorials/0.7.0/run-hello-samza-without-internet.md
@@ -0,0 +1,61 @@
+---
+layout: page
+title: Run Hello Samza without Internet
+---
+
+This tutorial helps you run [Hello Samza](../../../startup/hello-samza/0.7.0/) if you cannot connect to the internet.
+
+### Test Your Connection
+
+Check whether you can reach irc.wikimedia.org. A company firewall sometimes blocks this service.
+
+```
+telnet irc.wikimedia.org 6667
+```
+
+You should see something like this:
+
+```
+Trying 208.80.152.178...
+Connected to ekrem.wikimedia.org.
+Escape character is '^]'.
+NOTICE AUTH :*** Processing connection to irc.pmtpa.wikimedia.org
+NOTICE AUTH :*** Looking up your hostname...
+NOTICE AUTH :*** Checking Ident
+NOTICE AUTH :*** Found your hostname
+```
+
+If you do not, you probably have a connection problem.
+
+### Use Local Data to Run Hello Samza
+
+We provide an alternative way to get Wikipedia feed data. Instead of running
+
+```
+deploy/samza/bin/run-job.sh --config-factory=org.apache.samza.config.factories.PropertiesConfigFactory --config-path=file://$PWD/deploy/samza/config/wikipedia-feed.properties
+```
+
+you will run
+```
+bin/produce-wikipedia-raw-data.sh
+```
+
+This script reads Wikipedia feed data from a local file and produces it to the Kafka broker. By default, it uses localhost:9092 as the Kafka broker and localhost:2181 as ZooKeeper. You can override both:
+
+```
+bin/produce-wikipedia-raw-data.sh -b yourKafkaBrokerAddress -z yourZookeeperAddress
+```
+
+Now you can go back to the Generate Wikipedia Statistics section in [Hello Samza](../../../startup/hello-samza/0.7.0/) and follow the remaining steps.
+
+### A Little Explanation
+
+The goal of
+
+```
+deploy/samza/bin/run-job.sh --config-factory=org.apache.samza.config.factories.PropertiesConfigFactory --config-path=file://$PWD/deploy/samza/config/wikipedia-feed.properties
+```
+
+is to deploy a Samza job that listens to the Wikipedia API, receives the feed in real time, and produces it to the Kafka topic wikipedia-raw. The alternative in this tutorial reads a local Wikipedia feed in an infinite loop and produces the data to the same wikipedia-raw topic. The follow-up job, wikipedia-parser, reads from wikipedia-raw, so as long as that topic holds correct data, the rest of the pipeline works. Samza jobs are connected only through Kafka and do not depend on each other directly.
+
+

http://git-wip-us.apache.org/repos/asf/incubator-samza/blob/ef25b9e1/docs/startup/hello-samza/0.7.0/index.md
----------------------------------------------------------------------
diff --git a/docs/startup/hello-samza/0.7.0/index.md b/docs/startup/hello-samza/0.7.0/index.md
index 6a88a30..11fc18b 100644
--- a/docs/startup/hello-samza/0.7.0/index.md
+++ b/docs/startup/hello-samza/0.7.0/index.md
@@ -46,6 +46,8 @@ The job will consume a feed of real-time edits from Wikipedia, and produce them
 
 Pretty neat, right? Now, check out the YARN UI again ([http://localhost:8088](http://localhost:8088)). This time around, you'll see your Samza job is running!
 
+If you do not see any output from the Kafka consumer, you may have a connection problem. See [Run Hello Samza without Internet](../../../learn/tutorials/0.7.0/run-hello-samza-without-internet.html).
+
 ### Generate Wikipedia Statistics
 
 Let's calculate some statistics based on the messages in the wikipedia-raw topic. Start two more jobs:
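
The "A Little Explanation" section of the new tutorial describes reading a local Wikipedia feed in a loop and producing it to the wikipedia-raw topic. As a rough illustration of that idea only, and not the contents of bin/produce-wikipedia-raw-data.sh (which this commit does not show), the replay half could be sketched as below. The function name `replay_feed`, its arguments, and the input file name are all hypothetical, and the sketch writes messages to standard output rather than talking to Kafka.

```shell
#!/usr/bin/env bash
# Hypothetical sketch, not the script shipped with hello-samza.
#
# replay_feed FILE [PASSES]
#   FILE   - local file holding one raw feed message per line (assumed layout)
#   PASSES - how many times to replay the file; 0 (the default) loops forever
replay_feed() {
  local file="$1"
  local passes="${2:-0}"
  local completed=0
  local line

  # Fail early if the input file is missing or unreadable.
  if [ ! -r "$file" ]; then
    echo "replay_feed: cannot read '$file'" >&2
    return 1
  fi

  # Re-read the file from the top on every pass.
  while [ "$passes" -eq 0 ] || [ "$completed" -lt "$passes" ]; do
    # Emit each message on its own line, including a last line
    # that has no trailing newline.
    while IFS= read -r line || [ -n "$line" ]; do
      printf '%s\n' "$line"
    done < "$file"
    completed=$((completed + 1))
  done
}
```

With PASSES left at 0, the output could be piped into whatever producer writes to wikipedia-raw, so downstream jobs see a continuous stream. A finite PASSES value is convenient for checking the output by hand, for example `replay_feed sample-feed.txt 2`, where sample-feed.txt is a placeholder file name.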
