Author: mafr
Date: Sun Nov 18 20:45:23 2012
New Revision: 1410987
URL: http://svn.apache.org/viewvc?rev=1410987&view=rev
Log:
CRUNCH-116: Add a Getting Started document.
Compatibility notes contributed by Josh.
Added:
incubator/crunch/site/trunk/content/crunch/getting-started.mdtext
Modified:
incubator/crunch/site/trunk/lib/path.pm
Added: incubator/crunch/site/trunk/content/crunch/getting-started.mdtext
URL: http://svn.apache.org/viewvc/incubator/crunch/site/trunk/content/crunch/getting-started.mdtext?rev=1410987&view=auto
==============================================================================
--- incubator/crunch/site/trunk/content/crunch/getting-started.mdtext (added)
+++ incubator/crunch/site/trunk/content/crunch/getting-started.mdtext Sun Nov 18 20:45:23 2012
@@ -0,0 +1,100 @@
+Title: Getting Started
+Notice: Licensed to the Apache Software Foundation (ASF) under one
+ or more contributor license agreements. See the NOTICE file
+ distributed with this work for additional information
+ regarding copyright ownership. The ASF licenses this file
+ to you under the Apache License, Version 2.0 (the
+ "License"); you may not use this file except in compliance
+ with the License. You may obtain a copy of the License at
+ .
+ http://www.apache.org/licenses/LICENSE-2.0
+ .
+ Unless required by applicable law or agreed to in writing,
+ software distributed under the License is distributed on an
+ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ KIND, either express or implied. See the License for the
+ specific language governing permissions and limitations
+ under the License.
+
+Crunch is developed against Apache Hadoop version 1.0.3 and is also tested against
+Apache Hadoop 2.0.0-alpha. Crunch should work with any version of Hadoop
+after 1.0.3 or 2.0.0-alpha, and is also known to work with distributions from
+vendors like Cloudera, Hortonworks, and IBM. Crunch is _not_ compatible with
+versions of Hadoop prior to 1.0.x or 2.0.x, such as Apache Hadoop 0.20.x.
+
+The easiest way to get started with Crunch is to use its Maven archetype
+to generate a simple project. The archetype is available from Maven Central;
+just enter the following command, answer a few questions, and you're ready to
+go:
+
+<pre>
+$ <strong>mvn archetype:generate -Dfilter=org.apache.crunch:crunch-archetype</strong>
+[...]
+1: remote -> org.apache.crunch:crunch-archetype (Create a basic, self-contained job for Apache Crunch.)
+Choose a number or apply filter (format: [groupId:]artifactId, case sensitive contains): : <strong>1</strong>
+Define value for property 'groupId': : <strong>com.example</strong>
+Define value for property 'artifactId': : <strong>crunch-demo</strong>
+Define value for property 'version': 1.0-SNAPSHOT: : <strong>[HIT ENTER]</strong>
+Define value for property 'package': com.example: : <strong>[HIT ENTER]</strong>
+Confirm properties configuration:
+groupId: com.example
+artifactId: crunch-demo
+version: 1.0-SNAPSHOT
+package: com.example
+ Y: : <strong>[HIT ENTER]</strong>
+[...]
+$
+</pre>
+
+The generated Maven project contains an example application that counts
+word frequencies in text files:
+
+<pre>
+$ <strong>cd crunch-demo</strong>
+$ <strong>tree</strong>
+.
+|-- pom.xml
+`-- src
+    |-- main
+    |   |-- assembly
+    |   |   `-- <strong>hadoop-job.xml</strong>
+    |   `-- java
+    |       `-- com
+    |           `-- example
+    |               |-- StopWordFilter.java
+    |               |-- Tokenizer.java
+    |               `-- <strong>WordCount.java</strong>
+    `-- test
+        `-- java
+            `-- com
+                `-- example
+                    |-- StopWordFilterTest.java
+                    `-- TokenizerTest.java
+</pre>
+
+The `WordCount.java` file contains the main class that defines the Crunch-based
+application; this class is referenced from `pom.xml`.
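As their names suggest, `Tokenizer` and `StopWordFilter` presumably split lines into words and drop common words before `WordCount` counts what remains. The sketch below shows that computation in plain, in-memory Java so the data flow is easy to follow before reading the Crunch version. It is hypothetical: it does not use the Crunch API, and the class name `WordCountSketch`, the splitting rule, and the stop-word list are assumptions rather than the archetype's actual code.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical plain-Java sketch of the word-count logic; it does NOT use the Crunch API.
public class WordCountSketch {

    // Assumed stop-word list; the archetype's StopWordFilter may use a different one.
    static final Set<String> STOP_WORDS = Set.of("a", "and", "the", "to", "of");

    // Tokenizer role: lower-case the line and split it on runs of non-letter characters.
    static List<String> tokenize(String line) {
        return Arrays.stream(line.toLowerCase().split("[^\\p{L}]+"))
                .filter(word -> !word.isEmpty())
                .collect(Collectors.toList());
    }

    // StopWordFilter role: keep only words that are not in the stop-word list.
    static boolean keep(String word) {
        return !STOP_WORDS.contains(word);
    }

    // WordCount role: tokenize every line, drop stop words, count occurrences per word.
    static Map<String, Long> count(List<String> lines) {
        return lines.stream()
                .flatMap(line -> tokenize(line).stream())
                .filter(WordCountSketch::keep)
                .collect(Collectors.groupingBy(word -> word, TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        List<String> lines = List.of("The quick brown fox", "and the lazy dog", "the fox");
        // Prints one "word<TAB>count" pair per line, sorted by word.
        count(lines).forEach((word, n) -> System.out.println(word + "\t" + n));
    }
}
```

In the generated project these same steps run as a Crunch pipeline on Hadoop rather than inside a single JVM.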
+
+Build the code:
+
+<pre>
+$ <strong>mvn package</strong>
+</pre>
+
+Your packaged application is created in the `target` directory. The build
+process uses Maven's assembly plugin with some configuration in
+`hadoop-job.xml` to create a special JAR file (suffix `-job.jar`).
+Depending on your Hadoop configuration, you can run it locally or on a
+cluster using Hadoop's launcher script:
+
+<pre>
+$ <strong>hadoop jar target/crunch-demo-1.0-SNAPSHOT-job.jar &lt;in&gt; &lt;out&gt;</strong>
+</pre>
+
+The `<in>` parameter references a text file or a directory containing text
+files, while `<out>` is the directory Crunch writes the final results to.
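When `<in>` names a directory, the job reads the files inside it. The following hypothetical sketch illustrates that expansion against the local filesystem only; the class name `InputPathSketch` is illustrative, and both the non-recursive listing and the rule of skipping names that begin with `_` or `.` reflect our understanding of Hadoop's default `FileInputFormat` behaviour, which you should check against your Hadoop version. In a real run Hadoop, not your code, performs this expansion.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical local-filesystem sketch of how an <in> argument expands into input files.
public class InputPathSketch {

    // A non-directory path is returned unchanged; a directory yields the visible
    // regular files directly inside it (no recursion), sorted by name.
    static List<Path> resolveInputs(Path in) throws IOException {
        if (!Files.isDirectory(in)) {
            return List.of(in);
        }
        try (Stream<Path> children = Files.list(in)) {
            return children
                    .filter(Files::isRegularFile)
                    .filter(p -> !isHidden(p))
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    // Assumption: names starting with "_" or "." (e.g. _SUCCESS markers, .crc files) are skipped.
    static boolean isHidden(Path p) {
        String name = p.getFileName().toString();
        return name.startsWith("_") || name.startsWith(".");
    }

    public static void main(String[] args) throws IOException {
        Path in = Paths.get(args.length > 0 ? args[0] : ".");
        resolveInputs(in).forEach(System.out::println);
    }
}
```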
+
+Crunch also lets you run applications from within an IDE, either as standalone
+Java applications or from unit tests. All required dependencies are on Maven's
+classpath, so you can run the `WordCount` class directly without any additional
+setup.
Modified: incubator/crunch/site/trunk/lib/path.pm
URL: http://svn.apache.org/viewvc/incubator/crunch/site/trunk/lib/path.pm?rev=1410987&r1=1410986&r2=1410987&view=diff
==============================================================================
--- incubator/crunch/site/trunk/lib/path.pm (original)
+++ incubator/crunch/site/trunk/lib/path.pm Sun Nov 18 20:45:23 2012
@@ -5,6 +5,8 @@ our @nav = (
{ title => "Apache Crunch" },
{ title => "Overview",
href => "/crunch/index.html"},
+ { title => "Getting Started",
+ href => "/crunch/getting-started.html"},
{ title => "Download",
href => "/crunch/download.html"},
{ title => "API",