mafr Sun, 18 Nov 2012 12:45:47 -0800

Author: mafr
Date: Sun Nov 18 20:45:23 2012
New Revision: 1410987

URL: http://svn.apache.org/viewvc?rev=1410987&view=rev
Log:
CRUNCH-116: Add a Getting Started document.
Compatibility notes contributed by Josh.


Added:
    incubator/crunch/site/trunk/content/crunch/getting-started.mdtext
Modified:
    incubator/crunch/site/trunk/lib/path.pm

Added: incubator/crunch/site/trunk/content/crunch/getting-started.mdtext
URL: http://svn.apache.org/viewvc/incubator/crunch/site/trunk/content/crunch/getting-started.mdtext?rev=1410987&view=auto
==============================================================================
--- incubator/crunch/site/trunk/content/crunch/getting-started.mdtext (added)
+++ incubator/crunch/site/trunk/content/crunch/getting-started.mdtext Sun Nov 18 20:45:23 2012
@@ -0,0 +1,100 @@
+Title:    Getting Started
+Notice:   Licensed to the Apache Software Foundation (ASF) under one
+          or more contributor license agreements.  See the NOTICE file
+          distributed with this work for additional information
+          regarding copyright ownership.  The ASF licenses this file
+          to you under the Apache License, Version 2.0 (the
+          "License"); you may not use this file except in compliance
+          with the License.  You may obtain a copy of the License at
+          .
+            http://www.apache.org/licenses/LICENSE-2.0
+          .
+          Unless required by applicable law or agreed to in writing,
+          software distributed under the License is distributed on an
+          "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+          KIND, either express or implied.  See the License for the
+          specific language governing permissions and limitations
+          under the License.
+
+Crunch is developed against Apache Hadoop version 1.0.3 and is also tested against
+Apache Hadoop 2.0.0-alpha. Crunch should work with any version of Hadoop
+after 1.0.3 or 2.0.0-alpha, and is also known to work with distributions from
+vendors like Cloudera, Hortonworks, and IBM. Crunch is _not_ compatible with
+versions of Hadoop prior to 1.0.x or 2.0.x, such as Apache Hadoop 0.20.x.
+
+The easiest way to get started with Crunch is to use its Maven archetype
+to generate a simple project. The archetype is available from Maven Central;
+just enter the following command, answer a few questions, and you're ready to
+go:
+
+<pre>
+$ <strong>mvn archetype:generate -Dfilter=org.apache.crunch:crunch-archetype</strong>
+[...]
+1: remote -> org.apache.crunch:crunch-archetype (Create a basic, self-contained job for Apache Crunch.)
+Choose a number or apply filter (format: [groupId:]artifactId, case sensitive contains): : <strong>1</strong>
+Define value for property 'groupId': : <strong>com.example</strong>
+Define value for property 'artifactId': : <strong>crunch-demo</strong>
+Define value for property 'version':  1.0-SNAPSHOT: : <strong>[HIT ENTER]</strong>
+Define value for property 'package':  com.example: : <strong>[HIT ENTER]</strong>
+Confirm properties configuration:
+groupId: com.example
+artifactId: crunch-demo
+version: 1.0-SNAPSHOT
+package: com.example
+ Y: : <strong>[HIT ENTER]</strong>
+[...]
+$
+</pre>
+
+The generated Maven project contains an example application that counts
+word frequencies in text files:
+
+<pre>
+$ <strong>cd crunch-demo</strong>
+$ <strong>tree</strong>
+.
+|-- pom.xml
+`-- src
+    |-- main
+    |   |-- assembly
+    |   |   `-- <strong>hadoop-job.xml</strong>
+    |   `-- java
+    |       `-- com
+    |           `-- example
+    |               |-- StopWordFilter.java
+    |               |-- Tokenizer.java
+    |               `-- <strong>WordCount.java</strong>
+    `-- test
+        `-- java
+            `-- com
+                `-- example
+                    |-- StopWordFilterTest.java
+                    `-- TokenizerTest.java
+</pre>
+ 
+The `WordCount.java` file contains the main class that defines a Crunch-based
+application which is referenced from `pom.xml`.
+
+Build the code:
+
+<pre>
+$ <strong>mvn package</strong>
+</pre>
+
+Your packaged application is created in the `target` directory. The build
+process uses Maven's assembly plugin with some configuration in
+`hadoop-job.xml` to create a special JAR file (suffix `-job.jar`).
+Depending on your Hadoop configuration, you can run it locally or on a
+cluster using Hadoop's launcher script:
+
+<pre>
+$ <strong>hadoop jar target/hadoop-job-demo-1.0-SNAPSHOT-job.jar &lt;in&gt; &lt;out&gt;</strong>
+</pre>
+
+The `<in>` parameter references a text file or a directory containing text
+files, while `<out>` is a directory where Crunch writes the final results to.
+
+Crunch also lets you run applications from within an IDE, either as standalone
+Java applications or from unit tests. All required dependencies are on Maven's
+classpath so you can run the `WordCount` class directly without any additional
+setup.

Modified: incubator/crunch/site/trunk/lib/path.pm
URL: http://svn.apache.org/viewvc/incubator/crunch/site/trunk/lib/path.pm?rev=1410987&r1=1410986&r2=1410987&view=diff
==============================================================================
--- incubator/crunch/site/trunk/lib/path.pm (original)
+++ incubator/crunch/site/trunk/lib/path.pm Sun Nov 18 20:45:23 2012
@@ -5,6 +5,8 @@ our @nav = (
     { title => "Apache Crunch" },
        { title => "Overview",
          href => "/crunch/index.html"},
+       { title => "Getting Started",
+         href => "/crunch/getting-started.html"},
        { title => "Download",
          href => "/crunch/download.html"},
        { title => "API",

svn commit: r1410987 - in /incubator/crunch/site/trunk: content/crunch/getting-started.mdtext lib/path.pm
