http://git-wip-us.apache.org/repos/asf/mahout/blob/99a5358f/README.md
----------------------------------------------------------------------
diff --git a/README.md b/README.md
deleted file mode 100644
index 0c7ca5d..0000000
--- a/README.md
+++ /dev/null
@@ -1,267 +0,0 @@
-<!--
-Licensed to the Apache Software Foundation (ASF) under one or more
-contributor license agreements.  See the NOTICE file distributed with
-this work for additional information regarding copyright ownership.
-The ASF licenses this file to You under the Apache License, Version 2.0
-(the "License"); you may not use this file except in compliance with
-the License.  You may obtain a copy of the License at
-
-    http://www.apache.org/licenses/LICENSE-2.0
-
-Unless required by applicable law or agreed to in writing, software
-distributed under the License is distributed on an "AS IS" BASIS,
-WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-See the License for the specific language governing permissions and
-limitations under the License.
--->
-
-Welcome to Apache Mahout!
-===========
-The Apache Mahout™ project's goal is to build an environment for quickly creating scalable, performant machine learning applications.
-
-For additional information about Mahout, visit the [Mahout Home Page](http://mahout.apache.org/).
-
-#### Setting up your Environment
-Whether you are using Mahout's shell, running command-line jobs, or using it as a library to build your own apps, you will need to set up several environment variables. Edit your environment in `~/.bash_profile` for Mac or `~/.bashrc` for many Linux distributions. Add the following:
-```
-export MAHOUT_HOME=/path/to/mahout
-export MAHOUT_LOCAL=true # for running standalone on your dev machine,
-# unset MAHOUT_LOCAL for running on a cluster
-```
-You will need a `$JAVA_HOME`, and if you are running on Spark, you will also need `$SPARK_HOME`.
-
-#### Using Mahout as a Library
-Running any application that uses Mahout requires installing a binary or source version and setting the environment.
-To compile from source:
-* `mvn -DskipTests clean install`
-* To run tests do `mvn test`
-* To set up your IDE, do `mvn eclipse:eclipse` or `mvn idea:idea`
-
-To use Maven, add the appropriate settings to your pom.xml or build.sbt following the templates below.
-
-To use the Samsara environment you'll need to include both the engine-neutral math-scala dependency:
-```
-<dependency>
-    <groupId>org.apache.mahout</groupId>
-    <artifactId>mahout-math-scala_2.10</artifactId>
-    <version>${mahout.version}</version>
-</dependency>
-```
-and a dependency for back end engine translation, e.g.:
-```
-<dependency>
-    <groupId>org.apache.mahout</groupId>
-    <artifactId>mahout-spark_2.10</artifactId>
-    <version>${mahout.version}</version>
-</dependency>
-```
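For sbt users, an equivalent setting is sketched below (assumptions: `mahoutVersion` is a placeholder you define; `%%` appends the Scala binary version, 2.10 here, to match the `_2.10` artifacts above):

```scala
// build.sbt (a sketch, not from the original README)
val mahoutVersion = "0.13.0" // substitute the Mahout release you installed

libraryDependencies ++= Seq(
  "org.apache.mahout" %% "mahout-math-scala" % mahoutVersion, // engine-neutral Samsara math
  "org.apache.mahout" %% "mahout-spark"      % mahoutVersion  // Spark back-end translation
)
```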
-#### Building From Source
-
-###### Prerequisites:
-
-Linux environment (preferably Ubuntu 16.04.x). Note: Currently only the JVM-only build will work on a Mac.
-gcc > 4.x
-NVIDIA card (installed with OpenCL drivers alongside the usual GPU drivers)
-
-###### Downloads
-
-Install Java 1.7+ in an easily accessible directory (for this example, ~/java/):
-http://www.oracle.com/technetwork/java/javase/downloads/jdk8-downloads-2133151.html
-
-Create a directory ~/apache/.
-
-Download Apache Maven 3.3.9 and un-tar/gunzip to ~/apache/apache-maven-3.3.9/:
-https://maven.apache.org/download.cgi
-
-Download and un-tar/gunzip Hadoop 2.4.1 to ~/apache/hadoop-2.4.1/:
-https://archive.apache.org/dist/hadoop/common/hadoop-2.4.1/
-
-Download and un-tar/gunzip spark-1.6.3-bin-hadoop2.4 to ~/apache/:
-http://spark.apache.org/downloads.html
-Choose release: Spark-1.6.3 (Nov 07 2016)
-Choose package type: Pre-built for Hadoop 2.4
-
-Install ViennaCL 1.7.0+.
-If running Ubuntu 16.04+:
-
-```
-sudo apt-get install libviennacl-dev
-```
-
-Otherwise, if your distribution's package manager does not have a viennacl-dev package >1.7.0, clone it directly into a directory that will be on the include path when Mahout is compiled (here, /usr/local/):
-
-```
-mkdir ~/tmp
-cd ~/tmp && git clone https://github.com/viennacl/viennacl-dev.git
-cd viennacl-dev
-cp -r viennacl/ /usr/local/
-cp -r CL/ /usr/local/
-```
-
-Ensure that the OpenCL 1.2+ drivers are installed (packaged with most consumer-grade NVIDIA drivers); support on higher-end cards is untested.
-
-Clone the mahout repository into `~/apache`.
-
-```
-git clone https://github.com/apache/mahout.git
-```
-
-###### Configuration
-
-When building Mahout for a Spark backend, we need four system environment variables set:
-```
-    export MAHOUT_HOME=/home/<user>/apache/mahout
-    export HADOOP_HOME=/home/<user>/apache/hadoop-2.4.1
-    export SPARK_HOME=/home/<user>/apache/spark-1.6.3-bin-hadoop2.4
-    export JAVA_HOME=/home/<user>/java/jdk-1.8.121
-```
-
-Mahout on Spark regularly uses one more environment variable: the IP of the Spark cluster's master node (usually the node one is logged into).
-
-To use 4 local cores (the Spark master need not be running):
-```
-export MASTER=local[4]
-```
-To use all available local cores (again, the Spark master need not be running):
-```
-export MASTER=local[*]
-```
-To point to a cluster with Spark running:
-```
-export MASTER=spark://master.ip.address:7077
-```
-
-We then add these to the PATH:
-
-```
-    PATH=$PATH:$MAHOUT_HOME/bin:$HADOOP_HOME/bin:$SPARK_HOME/bin:$JAVA_HOME/bin
-```
-
-These should be added to your ~/.bashrc file.
-
-###### Building Mahout with Apache Maven
-
-Currently Mahout has three builds. From the $MAHOUT_HOME directory we may issue the commands to build each using mvn profiles.
-
-JVM only:
-```
-mvn clean install -DskipTests
-```
-
-JVM with native OpenMP level 2 and level 3 matrix/vector multiplication:
-```
-mvn clean install -Pviennacl-omp -Phadoop2 -DskipTests
-```
-JVM with native OpenMP and OpenCL for level 2 and level 3 matrix/vector multiplication (GPU errors fall back to OpenMP; currently only a single GPU per node is supported):
-```
-mvn clean install -Pviennacl -Phadoop2 -DskipTests
-```
-
-#### Testing the Mahout Environment
-
-Mahout provides an extension to the spark-shell, which is good for getting to know the language, testing partition loads, prototyping algorithms, etc.
-
-To launch the shell in local mode with 2 threads, simply do the following:
-```
-$ MASTER=local[2] mahout spark-shell
-```
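Once the shell is up, a quick smoke test is to distribute a small in-core matrix and collect it back; this exercises the Samsara imports and the implicit distributed context (`sdc`) that the welcome screen below reports. A minimal sketch (the values are arbitrary):

```scala
// Inside the Mahout spark-shell, where the Samsara imports and the
// implicit SparkDistributedContext are already in scope.
val drmA = drmParallelize(dense((1, 2), (3, 4)), numPartitions = 2) // distribute a 2x2 matrix
val inCore = drmA.collect                                           // bring it back in-core
println(inCore)
```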
-After a very verbose startup, a Mahout welcome screen will appear:
-
-```
-Loading /home/andy/sandbox/apache-mahout-distribution-0.13.0/bin/load-shell.scala...
-import org.apache.mahout.math._
-import org.apache.mahout.math.scalabindings._
-import org.apache.mahout.math.drm._
-import org.apache.mahout.math.scalabindings.RLikeOps._
-import org.apache.mahout.math.drm.RLikeDrmOps._
-import org.apache.mahout.sparkbindings._
-sdc: org.apache.mahout.sparkbindings.SparkDistributedContext = org.apache.mahout.sparkbindings.SparkDistributedContext@3ca1f0a4
-
-                _                 _
- _ __ ___   __ _| |__   ___  _   _| |_
-| '_ ` _ \ / _` | '_ \ / _ \| | | | __|
-| | | | | | (_| | | | | (_) | |_| | |_
-|_| |_| |_|\__,_|_| |_|\___/ \__,_|\__|  version 0.13.0
-
-
-That file does not exist
-
-
-scala>
-```
-At the scala> prompt, enter:
-```
-scala> :load /home/<user>/apache/mahout/examples/bin/SparseSparseDrmTimer.mscala
-```
-This will load a matrix multiplication timer function definition. To run the matrix timer:
-```
-scala> timeSparseDRMMMul(1000,1000,1000,1,.02,1234L)
-{...}
-res3: Long = 16321
-```
-We can see that the JVM-only version is rather slow; hence our motivation for GPU and native multithreading support.
-
-To get an idea of what's going on under the hood of the timer, we may examine the .mscala (Mahout Scala) code, which is both fully functional Scala and the Mahout R-like DSL for tensor algebra:
-```
-def timeSparseDRMMMul(m: Int, n: Int, s: Int, para: Int, pctDense: Double = .20, seed: Long = 1234L): Long = {
-  val drmA = drmParallelizeEmpty(m , s, para).mapBlock(){
-    case (keys, block: Matrix) =>
-      val R = scala.util.Random
-      R.setSeed(seed)
-      val blockB = new SparseRowMatrix(block.nrow, block.ncol)
-      blockB := { x => if (R.nextDouble < pctDense) R.nextDouble else x }
-      (keys -> blockB)
-  }
-
-  val drmB = drmParallelizeEmpty(s , n, para).mapBlock(){
-    case (keys, block: Matrix) =>
-      val R = scala.util.Random
-      R.setSeed(seed + 1)
-      val blockB = new SparseRowMatrix(block.nrow, block.ncol)
-      blockB := { x => if (R.nextDouble < pctDense) R.nextDouble else x }
-      (keys -> blockB)
-  }
-
-  var time = System.currentTimeMillis()
-
-  val drmC = drmA %*% drmB
-
-  // trigger computation
-  drmC.numRows()
-
-  time = System.currentTimeMillis() - time
-
-  time
-}
-```
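The building blocks the timer uses, `%*%`, `mapBlock`, and the elementwise `:=` assignment, are ordinary Samsara operators and work the same way on in-core matrices. A minimal sketch of the same R-like operations without the DRM machinery (matrix values are arbitrary):

```scala
// In-core R-like DSL, using the same imports the shell provides.
val a = dense((1.0, 2.0), (3.0, 4.0)) // 2x2 dense matrix
val ata = a.t %*% a                   // transpose times self, like drmA %*% drmB above
val b = a.cloned
b := { x => x * 2 }                   // elementwise functional assignment, as in the timer
println(ata)
```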
-For more information please see the following references:
-
-http://mahout.apache.org/users/environment/in-core-reference.html
-
-http://mahout.apache.org/users/environment/out-of-core-reference.html
-
-http://mahout.apache.org/users/sparkbindings/play-with-shell.html
-
-http://mahout.apache.org/users/environment/classify-a-doc-from-the-shell.html
-
-Note that due to an intermittent out-of-memory bug in a Flink test we have disabled it from the binary releases. To use Flink please uncomment the line in the root pom.xml in the `<modules>` block so it reads `<module>flink</module>`.
-
-#### Examples
-For examples of how to use Mahout, see the examples directory located in `examples/bin`.
-
-For information on how to contribute, visit the [How to Contribute Page](https://mahout.apache.org/developers/how-to-contribute.html)
-
-#### Legal
-Please see the `NOTICE.txt` included in this directory for more information.
-
-[](https://travis-ci.org/apache/mahout)
-<!--
-[](https://coveralls.io/github/apache/mahout?branch=master)
--->

http://git-wip-us.apache.org/repos/asf/mahout/blob/99a5358f/bin/compute-classpath.sh
----------------------------------------------------------------------
diff --git a/bin/compute-classpath.sh b/bin/compute-classpath.sh
deleted file mode 100755
index 79898e4..0000000
--- a/bin/compute-classpath.sh
+++ /dev/null
@@ -1,186 +0,0 @@
-#!/usr/bin/env bash
-
-#
-# Licensed to the Apache Software Foundation (ASF) under one or more
-# contributor license agreements.  See the NOTICE file distributed with
-# this work for additional information regarding copyright ownership.
-# The ASF licenses this file to You under the Apache License, Version 2.0
-# (the "License"); you may not use this file except in compliance with
-# the License.  You may obtain a copy of the License at
-#
-#    http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-#
-
-# This script computes Spark's classpath and prints it to stdout; it's used by both the "run"
-# script and the ExecutorRunner in standalone cluster mode.
-
-# Figure out where Spark is installed
-#FWDIR="$(cd "`dirname "$0"`"/..; pwd)"
-FWDIR="$SPARK_HOME"
-
-#. "$FWDIR"/bin/load-spark-env.sh # not executable by default in $SPARK_HOME/bin
-
-"$MAHOUT_HOME"/bin/mahout-load-spark-env.sh
-
-# Compute the Scala version. Note: Mahout has not been tested with Scala 2.11.
-# Setting SPARK_SCALA_VERSION if not already set.
-
-if [ -z "$SPARK_SCALA_VERSION" ]; then
-
-  ASSEMBLY_DIR2="$FWDIR/assembly/target/scala-2.11"
-  ASSEMBLY_DIR1="$FWDIR/assembly/target/scala-2.10"
-
-  if [[ -d "$ASSEMBLY_DIR2" && -d "$ASSEMBLY_DIR1" ]]; then
-    echo -e "Presence of builds for both Scala versions (Scala 2.10 and Scala 2.11) detected." 1>&2
-    echo -e 'Either clean one of them or export SPARK_SCALA_VERSION=2.11 in spark-env.sh.' 1>&2
-    exit 1
-  fi
-
-  if [ -d "$ASSEMBLY_DIR2" ]; then
-    export SPARK_SCALA_VERSION="2.11"
-  else
-    export SPARK_SCALA_VERSION="2.10"
-  fi
-fi
-
-function appendToClasspath(){
-  if [ -n "$1" ]; then
-    if [ -n "$CLASSPATH" ]; then
-      CLASSPATH="$CLASSPATH:$1"
-    else
-      CLASSPATH="$1"
-    fi
-  fi
-}
-
-appendToClasspath "$SPARK_CLASSPATH"
-appendToClasspath "$SPARK_SUBMIT_CLASSPATH"
-
-# Build up classpath
-if [ -n "$SPARK_CONF_DIR" ]; then
-  appendToClasspath "$SPARK_CONF_DIR"
-else
-  appendToClasspath "$FWDIR/conf"
-fi
-
-ASSEMBLY_DIR="$FWDIR/assembly/target/scala-$SPARK_SCALA_VERSION"
-
-if [ -n "$JAVA_HOME" ]; then
-  JAR_CMD="$JAVA_HOME/bin/jar"
-else
-  JAR_CMD="jar"
-fi
-
-# A developer option to prepend more recently compiled Spark classes
-if [ -n "$SPARK_PREPEND_CLASSES" ]; then
-  echo "NOTE: SPARK_PREPEND_CLASSES is set, placing locally compiled Spark"\
-    "classes ahead of assembly." 1>&2
-  # Spark classes
-  appendToClasspath "$FWDIR/core/target/scala-$SPARK_SCALA_VERSION/classes"
-  appendToClasspath "$FWDIR/repl/target/scala-$SPARK_SCALA_VERSION/classes"
-  appendToClasspath "$FWDIR/mllib/target/scala-$SPARK_SCALA_VERSION/classes"
-  appendToClasspath "$FWDIR/bagel/target/scala-$SPARK_SCALA_VERSION/classes"
-  appendToClasspath "$FWDIR/graphx/target/scala-$SPARK_SCALA_VERSION/classes"
-  appendToClasspath "$FWDIR/streaming/target/scala-$SPARK_SCALA_VERSION/classes"
-  appendToClasspath "$FWDIR/tools/target/scala-$SPARK_SCALA_VERSION/classes"
-  appendToClasspath "$FWDIR/sql/catalyst/target/scala-$SPARK_SCALA_VERSION/classes"
-  appendToClasspath "$FWDIR/sql/core/target/scala-$SPARK_SCALA_VERSION/classes"
-  appendToClasspath "$FWDIR/sql/hive/target/scala-$SPARK_SCALA_VERSION/classes"
-  appendToClasspath "$FWDIR/sql/hive-thriftserver/target/scala-$SPARK_SCALA_VERSION/classes"
-  appendToClasspath "$FWDIR/yarn/stable/target/scala-$SPARK_SCALA_VERSION/classes"
-  # Jars for shaded deps in their original form (copied here during build)
-  appendToClasspath "$FWDIR/core/target/jars/*"
-fi
-
-# Use spark-assembly jar from either RELEASE or assembly directory
-if [ -f "$FWDIR/RELEASE" ]; then
-  assembly_folder="$FWDIR"/lib
-else
-  assembly_folder="$ASSEMBLY_DIR"
-fi
-
-num_jars=0
-
-for f in "${assembly_folder}"/spark-assembly*hadoop*.jar; do
-  if [[ ! -e "$f" ]]; then
-    echo "Failed to find Spark assembly in $assembly_folder" 1>&2
-    echo "You need to build Spark before running this program." 1>&2
-    exit 1
-  fi
-  ASSEMBLY_JAR="$f"
-  num_jars=$((num_jars+1))
-done
-
-if [ "$num_jars" -gt "1" ]; then
-  echo "Found multiple Spark assembly jars in $assembly_folder:" 1>&2
-  ls "${assembly_folder}"/spark-assembly*hadoop*.jar 1>&2
-  echo "Please remove all but one jar." 1>&2
-  exit 1
-fi
-
-# Only able to make this check if 'jar' command is available
-if [ $(command -v "$JAR_CMD") ] ; then
-  # Verify that versions of java used to build the jars and run Spark are compatible
-  jar_error_check=$("$JAR_CMD" -tf "$ASSEMBLY_JAR" nonexistent/class/path 2>&1)
-  if [[ "$jar_error_check" =~ "invalid CEN header" ]]; then
-    echo "Loading Spark jar with '$JAR_CMD' failed. " 1>&2
-    echo "This is likely because Spark was compiled with Java 7 and run " 1>&2
-    echo "with Java 6. (see SPARK-1703). Please use Java 7 to run Spark " 1>&2
-    echo "or build Spark with Java 6." 1>&2
-    exit 1
-  fi
-fi
-
-appendToClasspath "$ASSEMBLY_JAR"
-
-# When Hive support is needed, Datanucleus jars must be included on the classpath.
-# Datanucleus jars do not work if only included in the uber jar as plugin.xml metadata is lost.
-# Both sbt and maven will populate "lib_managed/jars/" with the datanucleus jars when Spark is
-# built with Hive, so first check if the datanucleus jars exist, and then ensure the current Spark
-# assembly is built for Hive, before actually populating the CLASSPATH with the jars.
-# Note that this check order is faster (by up to half a second) in the case where Hive is not used.
-if [ -f "$FWDIR/RELEASE" ]; then - datanucleus_dir="$FWDIR"/lib -else - datanucleus_dir="$FWDIR"/lib_managed/jars -fi - -datanucleus_jars="$(find "$datanucleus_dir" 2>/dev/null | grep "datanucleus-.*\\.jar$")" -datanucleus_jars="$(echo "$datanucleus_jars" | tr "\n" : | sed s/:$//g)" - -if [ -n "$datanucleus_jars" ]; then - appendToClasspath "$datanucleus_jars" -fi - -# Add test classes if we're running from SBT or Maven with SPARK_TESTING set to 1 -if [[ $SPARK_TESTING == 1 ]]; then - appendToClasspath "$FWDIR/core/target/scala-$SPARK_SCALA_VERSION/test-classes" - appendToClasspath "$FWDIR/repl/target/scala-$SPARK_SCALA_VERSION/test-classes" - appendToClasspath "$FWDIR/mllib/target/scala-$SPARK_SCALA_VERSION/test-classes" - appendToClasspath "$FWDIR/bagel/target/scala-$SPARK_SCALA_VERSION/test-classes" - appendToClasspath "$FWDIR/graphx/target/scala-$SPARK_SCALA_VERSION/test-classes" - appendToClasspath "$FWDIR/streaming/target/scala-$SPARK_SCALA_VERSION/test-classes" - appendToClasspath "$FWDIR/sql/catalyst/target/scala-$SPARK_SCALA_VERSION/test-classes" - appendToClasspath "$FWDIR/sql/core/target/scala-$SPARK_SCALA_VERSION/test-classes" - appendToClasspath "$FWDIR/sql/hive/target/scala-$SPARK_SCALA_VERSION/test-classes" -fi - -# Add hadoop conf dir if given -- otherwise FileSystem.*, etc fail ! -# Note, this assumes that there is either a HADOOP_CONF_DIR or YARN_CONF_DIR which hosts -# the configurtion files. -appendToClasspath "$HADOOP_CONF_DIR" -appendToClasspath "$YARN_CONF_DIR" - -# To allow for distributions to append needed libraries to the classpath (e.g. when -# using the "hadoop-provided" profile to build Spark), check SPARK_DIST_CLASSPATH and -# append it to tbe final classpath. -appendToClasspath "$SPARK_DIST_CLASSPATH" - -echo "$CLASSPATH" http://git-wip-us.apache.org/repos/asf/mahout/blob/99a5358f/bin/load-shell.scala ---------------------------------------------------------------------- diff --git a/bin/load-shell.scala b/bin/load-shell.scala deleted file mode 100644 index 7468b76..0000000 --- a/bin/load-shell.scala +++ /dev/null @@ -1,34 +0,0 @@ -/* - * Licensed to the Apache Software Foundation (ASF) under one or more - * contributor license agreements. See the NOTICE file distributed with - * this work for additional information regarding copyright ownership. - * The ASF licenses this file to You under the Apache License, Version 2.0 - * (the "License"); you may not use this file except in compliance with - * the License. You may obtain a copy of the License at - * - * http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, software - * distributed under the License is distributed on an "AS IS" BASIS, - * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. - * See the License for the specific language governing permissions and - * limitations under the License. 
- */
-
-import org.apache.mahout.math._
-import org.apache.mahout.math.scalabindings._
-import org.apache.mahout.math.drm._
-import org.apache.mahout.math.scalabindings.RLikeOps._
-import org.apache.mahout.math.drm.RLikeDrmOps._
-import org.apache.mahout.sparkbindings._
-
-implicit val sdc: org.apache.mahout.sparkbindings.SparkDistributedContext = sc2sdc(sc)
-
-println("""
-                _                 _
- _ __ ___   __ _| |__   ___  _   _| |_
-| '_ ` _ \ / _` | '_ \ / _ \| | | | __|
-| | | | | | (_| | | | | (_) | |_| | |_
-|_| |_| |_|\__,_|_| |_|\___/ \__,_|\__|  version 0.13.0
-
-""")
\ No newline at end of file
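The `sc2sdc(sc)` implicit above is what lets shell users run Samsara expressions without passing a context around explicitly. In a standalone application the same context is created directly; the sketch below assumes the `mahoutSparkContext` helper from the sparkbindings package (check the exact signature against your Mahout version):

```scala
import org.apache.mahout.math._
import org.apache.mahout.math.scalabindings._
import org.apache.mahout.math.drm._
import org.apache.mahout.math.scalabindings.RLikeOps._
import org.apache.mahout.math.drm.RLikeDrmOps._
import org.apache.mahout.sparkbindings._

object MyMahoutApp extends App {
  // Wraps a SparkContext as a Mahout distributed context.
  implicit val sdc = mahoutSparkContext(masterUrl = "local[2]", appName = "MyMahoutApp")

  val drmA = drmParallelize(dense((1, 2), (3, 4)), numPartitions = 2)
  println((drmA.t %*% drmA).collect) // A'A, computed on the Spark back end
}
```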
http://git-wip-us.apache.org/repos/asf/mahout/blob/99a5358f/bin/mahout
----------------------------------------------------------------------
diff --git a/bin/mahout b/bin/mahout
deleted file mode 100755
index 3017c9e..0000000
--- a/bin/mahout
+++ /dev/null
@@ -1,395 +0,0 @@
-#!/bin/bash
-#
-# The Mahout command script
-#
-# Environment Variables
-#
-#   MAHOUT_JAVA_HOME  The java implementation to use.  Overrides JAVA_HOME.
-#
-#   MAHOUT_HEAPSIZE   The maximum amount of heap to use, in MB.
-#                     Default is 4000.
-#
-#   HADOOP_CONF_DIR   The location of a hadoop config directory
-#
-#   MAHOUT_OPTS       Extra Java runtime options.
-#
-#   MAHOUT_CONF_DIR   The location of the program short-name to class name
-#                     mappings and the default properties files
-#                     defaults to "$MAHOUT_HOME/src/conf"
-#
-#   MAHOUT_LOCAL      set to anything other than an empty string to force
-#                     mahout to run locally even if
-#                     HADOOP_CONF_DIR and HADOOP_HOME are set
-#
-#   MAHOUT_CORE       set to anything other than an empty string to force
-#                     mahout to run in developer 'core' mode, just as if the
-#                     -core option was presented on the command-line
-# Command-line Options
-#
-#   -core             -core is used to switch into 'developer mode' when
-#                     running mahout locally. If specified, the classes
-#                     from the 'target/classes' directories in each project
-#                     are used. Otherwise classes will be retrieved from
-#                     jars in the binary release collection or *-job.jar files
-#                     found in build directories. When running on hadoop
-#                     the job files will always be used.
-
-#
-#/**
-# * Licensed to the Apache Software Foundation (ASF) under one or more
-# * contributor license agreements.  See the NOTICE file distributed with
-# * this work for additional information regarding copyright ownership.
-# * The ASF licenses this file to You under the Apache License, Version 2.0
-# * (the "License"); you may not use this file except in compliance with
-# * the License.  You may obtain a copy of the License at
-# *
-# *    http://www.apache.org/licenses/LICENSE-2.0
-# *
-# * Unless required by applicable law or agreed to in writing, software
-# * distributed under the License is distributed on an "AS IS" BASIS,
-# * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# * See the License for the specific language governing permissions and
-# * limitations under the License.
-# */
-
-cygwin=false
-case "`uname`" in
-CYGWIN*) cygwin=true;;
-esac
-
-# resolve links - $0 may be a softlink
-THIS="$0"
-while [ -h "$THIS" ]; do
-  ls=`ls -ld "$THIS"`
-  link=`expr "$ls" : '.*-> \(.*\)$'`
-  if expr "$link" : '.*/.*' > /dev/null; then
-    THIS="$link"
-  else
-    THIS=`dirname "$THIS"`/"$link"
-  fi
-done
-
-IS_CORE=0
-if [ "$1" == "-core" ] ; then
-  IS_CORE=1
-  shift
-fi
-
-if [ "$1" == "-spark" ]; then
-  SPARK=1
-  shift
-fi
-
-if [ "$1" == "spark-shell" ]; then
-  SPARK=1
-fi
-
-if [ "$1" == "spark-itemsimilarity" ]; then
-  SPARK=1
-fi
-
-if [ "$1" == "spark-rowsimilarity" ]; then
-  SPARK=1
-fi
-
-if [ "$1" == "spark-trainnb" ]; then
-  SPARK=1
-fi
-
-if [ "$1" == "spark-testnb" ]; then
-  SPARK=1
-fi
-
-if [ "$MAHOUT_CORE" != "" ]; then
-  IS_CORE=1
-fi
-
-if [ "$1" == "h2o-node" ]; then
-  H2O=1
-fi
-
-# some directories
-THIS_DIR=`dirname "$THIS"`
-MAHOUT_HOME=`cd "$THIS_DIR/.." ; pwd`
-
-# some Java parameters
-if [ "$MAHOUT_JAVA_HOME" != "" ]; then
-  #echo "run java in $MAHOUT_JAVA_HOME"
-  JAVA_HOME=$MAHOUT_JAVA_HOME
-fi
-
-if [ "$JAVA_HOME" = "" ]; then
-  echo "Error: JAVA_HOME is not set."
-  exit 1
-fi
-
-JAVA=$JAVA_HOME/bin/java
-JAVA_HEAP_MAX=-Xmx4g
-
-# check envvars which might override default args
-if [ "$MAHOUT_HEAPSIZE" != "" ]; then
-  #echo "run with heapsize $MAHOUT_HEAPSIZE"
-  JAVA_HEAP_MAX="-Xmx""$MAHOUT_HEAPSIZE""m"
-  #echo $JAVA_HEAP_MAX
-fi
-
-if [ "x$MAHOUT_CONF_DIR" = "x" ]; then
-  if [ -d $MAHOUT_HOME/src/conf ]; then
-    MAHOUT_CONF_DIR=$MAHOUT_HOME/src/conf
-  else
-    if [ -d $MAHOUT_HOME/conf ]; then
-      MAHOUT_CONF_DIR=$MAHOUT_HOME/conf
-    else
-      echo No MAHOUT_CONF_DIR found
-    fi
-  fi
-fi
-
-# CLASSPATH initially contains $MAHOUT_CONF_DIR, or defaults to $MAHOUT_HOME/src/conf
-CLASSPATH=${CLASSPATH}:$MAHOUT_CONF_DIR
-
-if [ "$MAHOUT_LOCAL" != "" ]; then
-  echo "MAHOUT_LOCAL is set, so we don't add HADOOP_CONF_DIR to classpath."
-elif [ -n "$HADOOP_CONF_DIR" ] ; then
-  echo "MAHOUT_LOCAL is not set; adding HADOOP_CONF_DIR to classpath."
-  CLASSPATH=${CLASSPATH}:$HADOOP_CONF_DIR
-fi
-
-CLASSPATH=${CLASSPATH}:$JAVA_HOME/lib/tools.jar
-
-# so that filenames w/ spaces are handled correctly in loops below
-IFS=
-
-if [ $IS_CORE == 0 ]
-then
-  # add release dependencies to CLASSPATH
-  for f in $MAHOUT_HOME/mahout-*.jar; do
-    CLASSPATH=${CLASSPATH}:$f;
-  done
-
-  if [ "$SPARK" != "1" ]; then
-
-    # add dev targets if they exist
-    for f in $MAHOUT_HOME/examples/target/mahout-examples-*-job.jar $MAHOUT_HOME/mahout-examples-*-job.jar ; do
-      CLASSPATH=${CLASSPATH}:$f;
-    done
-  fi
-
-  # add scala dev target
-  for f in $MAHOUT_HOME/math-scala/target/mahout-math-scala_*.jar ; do
-    CLASSPATH=${CLASSPATH}:$f;
-  done
-
-  if [ "$H2O" == "1" ]; then
-    for f in $MAHOUT_HOME/hdfs/target/mahout-hdfs-*.jar; do
-      CLASSPATH=${CLASSPATH}:$f;
-    done
-
-    for f in $MAHOUT_HOME/h2o/target/mahout-h2o*.jar; do
-      CLASSPATH=${CLASSPATH}:$f;
-    done
-
-  fi
-
-  # add jars for running from the command line if we requested shell or spark CLI driver
-  if [ "$SPARK" == "1" ]; then
-
-    for f in $MAHOUT_HOME/hdfs/target/mahout-hdfs-*.jar ; do
-      CLASSPATH=${CLASSPATH}:$f;
-    done
-
-    for f in $MAHOUT_HOME/math/target/mahout-math-*.jar ; do
-      CLASSPATH=${CLASSPATH}:$f;
-    done
-
-    for f in $MAHOUT_HOME/spark/target/mahout-spark_*.jar ; do
-      CLASSPATH=${CLASSPATH}:$f;
-    done
-
-    for f in $MAHOUT_HOME/spark-shell/target/mahout-spark-shell_*.jar ; do
-      CLASSPATH=${CLASSPATH}:$f;
-    done
-
-    # viennacl jars - may or may not be available depending on build profile
-    for f in $MAHOUT_HOME/viennacl/target/mahout-native-viennacl_*.jar ; do
-      CLASSPATH=${CLASSPATH}:$f;
-    done
-
-    # viennacl-omp jars - may or may not be available depending on build profile
-    for f in $MAHOUT_HOME/viennacl-omp/target/mahout-native-viennacl-omp_*.jar ; do
-      CLASSPATH=${CLASSPATH}:$f;
-    done
-
-    SPARK_CP_BIN="${MAHOUT_HOME}/bin/compute-classpath.sh"
-    if [ -x "${SPARK_CP_BIN}" ]; then
-      SPARK_CLASSPATH=$("${SPARK_CP_BIN}" 2>/dev/null)
-      CLASSPATH="${CLASSPATH}:${SPARK_CLASSPATH}"
-    else
-      echo "Cannot find Spark classpath. Is 'SPARK_HOME' set?"
-      exit -1
-    fi
-
-    SPARK_ASSEMBLY_BIN="${MAHOUT_HOME}/bin/mahout-spark-class.sh"
-    if [ -x "${SPARK_ASSEMBLY_BIN}" ]; then
-      SPARK_ASSEMBLY_CLASSPATH=$("${SPARK_ASSEMBLY_BIN}" 2>/dev/null)
-      CLASSPATH="${CLASSPATH}:${SPARK_ASSEMBLY_CLASSPATH}"
-    else
-      echo "Cannot find Spark assembly classpath. Is 'SPARK_HOME' set?"
-      exit -1
-    fi
-  fi
-
-  # add vcl jars at any point.
-  # viennacl jars - may or may not be available depending on build profile
-  for f in $MAHOUT_HOME/viennacl/target/mahout-native-viennacl_*.jar ; do
-    CLASSPATH=${CLASSPATH}:$f;
-  done
-
-  # viennacl-omp jars - may or may not be available depending on build profile
-  for f in $MAHOUT_HOME/viennacl-omp/target/mahout-native-viennacl-omp_*.jar ; do
-    CLASSPATH=${CLASSPATH}:$f;
-  done
-
-  # add release dependencies to CLASSPATH
-  for f in $MAHOUT_HOME/lib/*.jar; do
-    CLASSPATH=${CLASSPATH}:$f;
-  done
-else
-  CLASSPATH=${CLASSPATH}:$MAHOUT_HOME/math/target/classes
-  CLASSPATH=${CLASSPATH}:$MAHOUT_HOME/hdfs/target/classes
-  CLASSPATH=${CLASSPATH}:$MAHOUT_HOME/mr/target/classes
-  CLASSPATH=${CLASSPATH}:$MAHOUT_HOME/integration/target/classes
-  CLASSPATH=${CLASSPATH}:$MAHOUT_HOME/examples/target/classes
-  CLASSPATH=${CLASSPATH}:$MAHOUT_HOME/math-scala/target/classes
-  CLASSPATH=${CLASSPATH}:$MAHOUT_HOME/spark/target/classes
-  CLASSPATH=${CLASSPATH}:$MAHOUT_HOME/spark-shell/target/classes
-  CLASSPATH=${CLASSPATH}:$MAHOUT_HOME/h2o/target/classes
-fi
-
-# add development dependencies to CLASSPATH
-if [ "$SPARK" != "1" ]; then
-  for f in $MAHOUT_HOME/examples/target/dependency/*.jar; do
-    CLASSPATH=${CLASSPATH}:$f;
-  done
-fi
-
-# cygwin path translation
-if $cygwin; then
-  CLASSPATH=`cygpath -p -w "$CLASSPATH"`
-fi
-
-# restore ordinary behaviour
-unset IFS
-JARS=$(echo "$MAHOUT_HOME"/*.jar | tr ' ' ',')
-case "$1" in
-  (spark-shell)
-    save_stty=$(stty -g 2>/dev/null);
-    $SPARK_HOME/bin/spark-shell --jars "$JARS" -i $MAHOUT_HOME/bin/load-shell.scala --conf spark.kryo.referenceTracking=false --conf spark.kryo.registrator=org.apache.mahout.sparkbindings.io.MahoutKryoRegistrator --conf spark.kryoserializer.buffer=32k --conf spark.kryoserializer.buffer.max=600m --conf spark.serializer=org.apache.spark.serializer.KryoSerializer "$@"
-    stty sane; stty $save_stty
-    ;;
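The `--conf` flags above configure Kryo serialization for Mahout's matrix types. When submitting a standalone application instead of using the shell, the same settings can be placed on a `SparkConf`; a sketch mirroring the flags in the script:

```scala
import org.apache.spark.SparkConf

// The values below are copied from the spark-shell invocation above.
val conf = new SparkConf()
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  .set("spark.kryo.registrator", "org.apache.mahout.sparkbindings.io.MahoutKryoRegistrator")
  .set("spark.kryo.referenceTracking", "false")
  .set("spark.kryoserializer.buffer", "32k")
  .set("spark.kryoserializer.buffer.max", "600m")
```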
-x "$HADOOP_BINARY" ] || [ "$MAHOUT_LOCAL" != "" ] ; then - if [ ! -x "$HADOOP_BINARY" ] ; then - echo "hadoop binary is not in PATH,HADOOP_HOME/bin,HADOOP_PREFIX/bin, running locally" - elif [ "$MAHOUT_LOCAL" != "" ] ; then - echo "MAHOUT_LOCAL is set, running locally" - fi - CLASSPATH="${CLASSPATH}:${MAHOUT_HOME}/lib/hadoop/*" - case $1 in - (classpath) - echo $CLASSPATH - ;; - (*) - exec "$JAVA" $JAVA_HEAP_MAX $MAHOUT_OPTS -classpath "$CLASSPATH" $CLASS "$@" - esac - else - echo "Running on hadoop, using $HADOOP_BINARY and HADOOP_CONF_DIR=$HADOOP_CONF_DIR" - - if [ "$MAHOUT_JOB" = "" ] ; then - echo "ERROR: Could not find mahout-examples-*.job in $MAHOUT_HOME or $MAHOUT_HOME/examples/target, please run 'mvn install' to create the .job file" - exit 1 - else - case "$1" in - (hadoop) - shift - export HADOOP_CLASSPATH=$MAHOUT_CONF_DIR:${HADOOP_CLASSPATH}:$CLASSPATH - exec "$HADOOP_BINARY" "$@" - ;; - (classpath) - echo $CLASSPATH - ;; - (*) - echo "MAHOUT-JOB: $MAHOUT_JOB" - export HADOOP_CLASSPATH=$MAHOUT_CONF_DIR:${HADOOP_CLASSPATH} - exec "$HADOOP_BINARY" jar $MAHOUT_JOB $CLASS "$@" - esac - fi - fi - ;; -esac - http://git-wip-us.apache.org/repos/asf/mahout/blob/99a5358f/bin/mahout-load-spark-env.sh ---------------------------------------------------------------------- diff --git a/bin/mahout-load-spark-env.sh b/bin/mahout-load-spark-env.sh deleted file mode 100755 index 533eecf..0000000 --- a/bin/mahout-load-spark-env.sh +++ /dev/null @@ -1,40 +0,0 @@ -#!/usr/bin/env bash - -# -# Licensed to the Apache Software Foundation (ASF) under one or more -# contributor license agreements. See the NOTICE file distributed with -# this work for additional information regarding copyright ownership. -# The ASF licenses this file to You under the Apache License, Version 2.0 -# (the "License"); you may not use this file except in compliance with -# the License. You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. -# - -# This script loads spark-env.sh if it exists, and ensures it is only loaded once. -# spark-env.sh is loaded from SPARK_CONF_DIR if set, or within the current directory's -# conf/ subdirectory. -FWDIR="$SPARK_HOME" - -if [ -z "$SPARK_ENV_LOADED" ]; then - export SPARK_ENV_LOADED=1 - - # Returns the parent of the directory this script lives in. - parent_dir="$(cd "`dirname "$0"`"/..; pwd)" - - user_conf_dir="${SPARK_CONF_DIR:-"$parent_dir"/conf}" - - if [ -f "${user_conf_dir}/spark-env.sh" ]; then - # Promote all variable declarations to environment (exported) variables - set -a - . "${user_conf_dir}/spark-env.sh" - set +a - fi -fi - http://git-wip-us.apache.org/repos/asf/mahout/blob/99a5358f/bin/mahout-spark-class.sh ---------------------------------------------------------------------- diff --git a/bin/mahout-spark-class.sh b/bin/mahout-spark-class.sh deleted file mode 100755 index ef88829..0000000 --- a/bin/mahout-spark-class.sh +++ /dev/null @@ -1,80 +0,0 @@ -#!/usr/bin/env bash - -# -# Licensed to the Apache Software Foundation (ASF) under one or more -# contributor license agreements. See the NOTICE file distributed with -# this work for additional information regarding copyright ownership. 
http://git-wip-us.apache.org/repos/asf/mahout/blob/99a5358f/bin/mahout-spark-class.sh
----------------------------------------------------------------------
diff --git a/bin/mahout-spark-class.sh b/bin/mahout-spark-class.sh
deleted file mode 100755
index ef88829..0000000
--- a/bin/mahout-spark-class.sh
+++ /dev/null
@@ -1,80 +0,0 @@
-#!/usr/bin/env bash
-
-#
-# Licensed to the Apache Software Foundation (ASF) under one or more
-# contributor license agreements.  See the NOTICE file distributed with
-# this work for additional information regarding copyright ownership.
-# The ASF licenses this file to You under the Apache License, Version 2.0
-# (the "License"); you may not use this file except in compliance with
-# the License.  You may obtain a copy of the License at
-#
-#    http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-#
-
-# Figure out where Spark is installed
-#export SPARK_HOME="$(cd "`dirname "$0"`"/..; pwd)"
-
-#"$SPARK_HOME"/bin/load-spark-env.sh # not executable by default in $SPARK_HOME/bin
-"$MAHOUT_HOME"/bin/mahout-load-spark-env.sh
-
-# Find the java binary
-if [ -n "${JAVA_HOME}" ]; then
-  RUNNER="${JAVA_HOME}/bin/java"
-else
-  if [ `command -v java` ]; then
-    RUNNER="java"
-  else
-    echo "JAVA_HOME is not set" >&2
-    exit 1
-  fi
-fi
-
-# Find assembly jar
-SPARK_ASSEMBLY_JAR=
-if [ -f "$SPARK_HOME/RELEASE" ]; then
-  ASSEMBLY_DIR="$SPARK_HOME/lib"
-else
-  ASSEMBLY_DIR="$SPARK_HOME/assembly/target/scala-$SPARK_SCALA_VERSION"
-fi
-
-num_jars="$(ls -1 "$ASSEMBLY_DIR" | grep "^spark-assembly.*hadoop.*\.jar$" | wc -l)"
-if [ "$num_jars" -eq "0" -a -z "$SPARK_ASSEMBLY_JAR" ]; then
-  echo "Failed to find Spark assembly in $ASSEMBLY_DIR." 1>&2
-  echo "You need to build Spark before running this program." 1>&2
-  exit 1
-fi
-ASSEMBLY_JARS="$(ls -1 "$ASSEMBLY_DIR" | grep "^spark-assembly.*hadoop.*\.jar$" || true)"
-if [ "$num_jars" -gt "1" ]; then
-  echo "Found multiple Spark assembly jars in $ASSEMBLY_DIR:" 1>&2
-  echo "$ASSEMBLY_JARS" 1>&2
-  echo "Please remove all but one jar." 1>&2
-  exit 1
-fi
-
-SPARK_ASSEMBLY_JAR="${ASSEMBLY_DIR}/${ASSEMBLY_JARS}"
-
-LAUNCH_CLASSPATH="$SPARK_ASSEMBLY_JAR"
-
-# Add the launcher build dir to the classpath if requested.
-if [ -n "$SPARK_PREPEND_CLASSES" ]; then
-  LAUNCH_CLASSPATH="$SPARK_HOME/launcher/target/scala-$SPARK_SCALA_VERSION/classes:$LAUNCH_CLASSPATH"
-fi
-
-export _SPARK_ASSEMBLY="$SPARK_ASSEMBLY_JAR"
-
-echo $LAUNCH_CLASSPATH
-
-# The launcher library will print arguments separated by a NULL character, to allow arguments with
-# characters that would be otherwise interpreted by the shell. Read that in a while loop, populating
-# an array that will be used to exec the final command.
-#CMD=()
-#while IFS= read -d '' -r ARG; do
-#  CMD+=("$ARG")
-#done < <("$RUNNER" -cp "$LAUNCH_CLASSPATH" org.apache.spark.launcher.Main "$@")
-#exec "${CMD[@]}"

http://git-wip-us.apache.org/repos/asf/mahout/blob/99a5358f/bin/mahout.cmd
----------------------------------------------------------------------
diff --git a/bin/mahout.cmd b/bin/mahout.cmd
deleted file mode 100644
index 86bae79..0000000
--- a/bin/mahout.cmd
+++ /dev/null
@@ -1,397 +0,0 @@
-@echo off
-
-echo "===============DEPRECATION WARNING==============="
-echo "This script is no longer supported for new drivers as of Mahout 0.10.0"
-echo "Mahout's bash script is supported and if someone wants to contribute a fix for this"
-echo "it would be appreciated."
-
-@rem
-@rem The Mahout command script
-@rem
-@rem Environment Variables
-@rem
-@rem   MAHOUT_JAVA_HOME  The java implementation to use.  Overrides JAVA_HOME.
-@rem
-@rem   MAHOUT_HEAPSIZE   The maximum amount of heap to use, in MB.
-@rem                     Default is 1000.
-@rem
-@rem   HADOOP_CONF_DIR   The location of a hadoop config directory
-@rem
-@rem   MAHOUT_OPTS       Extra Java runtime options.
-@rem
-@rem   MAHOUT_CONF_DIR   The location of the program short-name to class name
-@rem                     mappings and the default properties files
-@rem                     defaults to "%MAHOUT_HOME%\src\conf"
-@rem
-@rem   MAHOUT_LOCAL      set to anything other than an empty string to force
-@rem                     mahout to run locally even if
-@rem                     HADOOP_CONF_DIR and HADOOP_HOME are set
-@rem
-@rem   MAHOUT_CORE       set to anything other than an empty string to force
-@rem                     mahout to run in developer 'core' mode, just as if the
-@rem                     -core option was presented on the command-line
-@rem Command-line Options
-@rem
-@rem   -core             -core is used to switch into 'developer mode' when
-@rem                     running mahout locally. If specified, the classes
-@rem                     from the 'target/classes' directories in each project
-@rem                     are used. Otherwise classes will be retrieved from
-@rem                     jars in the binary release collection or *-job.jar files
-@rem                     found in build directories. When running on hadoop
-@rem                     the job files will always be used.
-
-@rem
-@rem /*
-@rem * Licensed to the Apache Software Foundation (ASF) under one or more
-@rem * contributor license agreements.  See the NOTICE file distributed with
-@rem * this work for additional information regarding copyright ownership.
-@rem * The ASF licenses this file to You under the Apache License, Version 2.0
-@rem * (the "License"); you may not use this file except in compliance with
-@rem * the License.  You may obtain a copy of the License at
-@rem *
-@rem *    http://www.apache.org/licenses/LICENSE-2.0
-@rem *
-@rem * Unless required by applicable law or agreed to in writing, software
-@rem * distributed under the License is distributed on an "AS IS" BASIS,
-@rem * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-@rem * See the License for the specific language governing permissions and
-@rem * limitations under the License.
-@rem */
-
-setlocal enabledelayedexpansion
-
-@rem disable "developer mode"
-set IS_CORE=0
-if [%1] == [-core] (
-  set IS_CORE=1
-  shift
-)
-
-if not [%MAHOUT_CORE%] == [] (
-set IS_CORE=1
-)
-
-if [%MAHOUT_HOME%] == [] set MAHOUT_HOME=%~dp0..
-
-echo "Mahout home set %MAHOUT_HOME%"
-
-@rem some Java parameters
-if not [%MAHOUT_JAVA_HOME%] == [] (
-@rem echo run java in %MAHOUT_JAVA_HOME%
-set JAVA_HOME=%MAHOUT_JAVA_HOME%
-)
-
-if [%JAVA_HOME%] == [] (
-  echo Error: JAVA_HOME is not set.
-  exit /B 1
-)
-
-set JAVA=%JAVA_HOME%\bin\java
-set JAVA_HEAP_MAX=-Xmx3g
-
-@rem check envvars which might override default args
-if not [%MAHOUT_HEAPSIZE%] == [] (
-@rem echo run with heapsize %MAHOUT_HEAPSIZE%
-set JAVA_HEAP_MAX=-Xmx%MAHOUT_HEAPSIZE%m
-@rem echo %JAVA_HEAP_MAX%
-)
-
-if [%MAHOUT_CONF_DIR%] == [] (
-set MAHOUT_CONF_DIR=%MAHOUT_HOME%\conf
-)
-
-:main
-@rem MAHOUT_CLASSPATH initially contains %MAHOUT_CONF_DIR%, or defaults to %MAHOUT_HOME%\src\conf
-set CLASSPATH=%CLASSPATH%;%MAHOUT_CONF_DIR%
-
-if not [%MAHOUT_LOCAL%] == [] (
-echo "MAHOUT_LOCAL is set, so we do not add HADOOP_CONF_DIR to classpath."
-) else (
-if not [%HADOOP_CONF_DIR%] == [] (
-echo "MAHOUT_LOCAL is not set; adding HADOOP_CONF_DIR to classpath."
-set CLASSPATH=%CLASSPATH%;%HADOOP_CONF_DIR%
-)
-)
-
-set CLASSPATH=%CLASSPATH%;%JAVA_HOME%\lib\tools.jar
-
-if %IS_CORE% == 0 (
-@rem add release dependencies to CLASSPATH
-for %%f in (%MAHOUT_HOME%\mahout-*.jar) do (
-set CLASSPATH=!CLASSPATH!;%%f
-)
-@rem add dev targets if they exist
-for %%f in (%MAHOUT_HOME%\examples\target\mahout-examples-*-job.jar) do (
-set CLASSPATH=!CLASSPATH!;%%f
-)
-for %%f in (%MAHOUT_HOME%\mahout-examples-*-job.jar) do (
-set CLASSPATH=!CLASSPATH!;%%f
-)
-@rem add release dependencies to CLASSPATH
-for %%f in (%MAHOUT_HOME%\lib\*.jar) do (
-set CLASSPATH=!CLASSPATH!;%%f
-)
-) else (
-set CLASSPATH=!CLASSPATH!;%MAHOUT_HOME%\math\target\classes
-set CLASSPATH=!CLASSPATH!;%MAHOUT_HOME%\core\target\classes
-set CLASSPATH=!CLASSPATH!;%MAHOUT_HOME%\integration\target\classes
-set CLASSPATH=!CLASSPATH!;%MAHOUT_HOME%\examples\target\classes
-@rem set CLASSPATH=%CLASSPATH%;%MAHOUT_HOME%\core\src\main\resources
-)
-
-@rem add development dependencies to CLASSPATH
-for %%f in (%MAHOUT_HOME%\examples\target\dependency\*.jar) do (
-set CLASSPATH=!CLASSPATH!;%%f
-)
-
-@rem default log directory & file
-if [%MAHOUT_LOG_DIR%] == [] (
-set MAHOUT_LOG_DIR=%MAHOUT_HOME%\logs
-)
-if [%MAHOUT_LOGFILE%] == [] (
-set MAHOUT_LOGFILE=mahout.log
-)
-
-set MAHOUT_OPTS=%MAHOUT_OPTS% -Dhadoop.log.dir=%MAHOUT_LOG_DIR%
-set MAHOUT_OPTS=%MAHOUT_OPTS% -Dhadoop.log.file=%MAHOUT_LOGFILE%
-
-if not [%JAVA_LIBRARY_PATH%] == [] (
-set MAHOUT_OPTS=%MAHOUT_OPTS% -Djava.library.path=%JAVA_LIBRARY_PATH%
-)
-
-set CLASS=org.apache.mahout.driver.MahoutDriver
-
-for %%f in (%MAHOUT_HOME%\examples\target\mahout-examples-*-job.jar) do (
-set MAHOUT_JOB=%%f
-)
-
-@rem run it
-
-if not [%MAHOUT_LOCAL%] == [] (
-  echo "MAHOUT_LOCAL is set, running locally"
-  %JAVA% %JAVA_HEAP_MAX% %MAHOUT_OPTS% -classpath %MAHOUT_CLASSPATH% %CLASS% %*
-) else (
-  if [%MAHOUT_JOB%] == [] (
-    echo "ERROR: Could not find mahout-examples-*.job in %MAHOUT_HOME% or %MAHOUT_HOME%/examples/target, please run 'mvn install' to create the .job file"
-    exit /B 1
-  ) else (
-    set HADOOP_CLASSPATH=%MAHOUT_CLASSPATH%
-    if /i [%1] == [hadoop] (
-shift
-set HADOOP_CLASSPATH=%MAHOUT_CONF_DIR%;%HADOOP_CLASSPATH%
-      call %HADOOP_HOME%\bin\%*
-    ) else (
-if /i [%1] == [classpath] (
-echo %CLASSPATH%
-) else (
-echo MAHOUT_JOB: %MAHOUT_JOB%
-set HADOOP_CLASSPATH=%MAHOUT_CONF_DIR%;%HADOOP_CLASSPATH%
-set HADOOP_CLIENT_OPTS=%JAVA_HEAP_MAX%
-call %HADOOP_HOME%\bin\hadoop jar %MAHOUT_JOB% %CLASS% %*
-)
-
-    )
-  )
-)
-@echo off
-
-@rem
-@rem The Mahout command script
-@rem
-@rem Environment Variables
-@rem
-@rem   MAHOUT_JAVA_HOME  The java implementation to use.  Overrides JAVA_HOME.
-@rem
-@rem   MAHOUT_HEAPSIZE   The maximum amount of heap to use, in MB.
-@rem                     Default is 1000.
-@rem
-@rem   HADOOP_CONF_DIR   The location of a hadoop config directory
-@rem
-@rem   MAHOUT_OPTS       Extra Java runtime options.
-@rem
-@rem   MAHOUT_CONF_DIR   The location of the program short-name to class name
-@rem                     mappings and the default properties files
-@rem                     defaults to "%MAHOUT_HOME%\src\conf"
-@rem
-@rem   MAHOUT_LOCAL      set to anything other than an empty string to force
-@rem                     mahout to run locally even if
-@rem                     HADOOP_CONF_DIR and HADOOP_HOME are set
-@rem
-@rem   MAHOUT_CORE       set to anything other than an empty string to force
-@rem                     mahout to run in developer 'core' mode, just as if the
-@rem                     -core option was presented on the command-line
-@rem Command-line Options
-@rem
-@rem   -core             -core is used to switch into 'developer mode' when
-@rem                     running mahout locally. If specified, the classes
-@rem                     from the 'target/classes' directories in each project
-@rem                     are used. Otherwise classes will be retrieved from
-@rem                     jars in the binary release collection or *-job.jar files
-@rem                     found in build directories. When running on hadoop
-@rem                     the job files will always be used.
-
-@rem
-@rem /*
-@rem * Licensed to the Apache Software Foundation (ASF) under one or more
-@rem * contributor license agreements.  See the NOTICE file distributed with
-@rem * this work for additional information regarding copyright ownership.
-@rem * The ASF licenses this file to You under the Apache License, Version 2.0
-@rem * (the "License"); you may not use this file except in compliance with
-@rem * the License.  You may obtain a copy of the License at
-@rem *
-@rem *    http://www.apache.org/licenses/LICENSE-2.0
-@rem *
-@rem * Unless required by applicable law or agreed to in writing, software
-@rem * distributed under the License is distributed on an "AS IS" BASIS,
-@rem * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-@rem * See the License for the specific language governing permissions and
-@rem * limitations under the License.
-@rem */
-
-setlocal enabledelayedexpansion
-
-@rem disable "developer mode"
-set IS_CORE=0
-if [%1] == [-core] (
-  set IS_CORE=1
-  shift
-)
-
-if not [%MAHOUT_CORE%] == [] (
-set IS_CORE=1
-)
-
-if [%MAHOUT_HOME%] == [] set MAHOUT_HOME=%~dp0..
-
-echo "Mahout home set %MAHOUT_HOME%"
-
-@rem some Java parameters
-if not [%MAHOUT_JAVA_HOME%] == [] (
-@rem echo run java in %MAHOUT_JAVA_HOME%
-set JAVA_HOME=%MAHOUT_JAVA_HOME%
-)
-
-if [%JAVA_HOME%] == [] (
-  echo Error: JAVA_HOME is not set.
-  exit /B 1
-)
-
-set JAVA=%JAVA_HOME%\bin\java
-set JAVA_HEAP_MAX=-Xmx3g
-
-@rem check envvars which might override default args
-if not [%MAHOUT_HEAPSIZE%] == [] (
-@rem echo run with heapsize %MAHOUT_HEAPSIZE%
-set JAVA_HEAP_MAX=-Xmx%MAHOUT_HEAPSIZE%m
-@rem echo %JAVA_HEAP_MAX%
-)
-
-if [%MAHOUT_CONF_DIR%] == [] (
-set MAHOUT_CONF_DIR=%MAHOUT_HOME%\conf
-)
-
-:main
-@rem MAHOUT_CLASSPATH initially contains %MAHOUT_CONF_DIR%, or defaults to %MAHOUT_HOME%\src\conf
-set CLASSPATH=%CLASSPATH%;%MAHOUT_CONF_DIR%
-
-if not [%MAHOUT_LOCAL%] == [] (
-echo "MAHOUT_LOCAL is set, so we do not add HADOOP_CONF_DIR to classpath."
-) else (
-if not [%HADOOP_CONF_DIR%] == [] (
-echo "MAHOUT_LOCAL is not set; adding HADOOP_CONF_DIR to classpath."
-set CLASSPATH=%CLASSPATH%;%HADOOP_CONF_DIR%
-)
-)
-
-set CLASSPATH=%CLASSPATH%;%JAVA_HOME%\lib\tools.jar
-
-if %IS_CORE% == 0 (
-@rem add release dependencies to CLASSPATH
-for %%f in (%MAHOUT_HOME%\mahout-*.jar) do (
-set CLASSPATH=!CLASSPATH!;%%f
-)
-@rem add dev targets if they exist
-for %%f in (%MAHOUT_HOME%\examples\target\mahout-examples-*-job.jar) do (
-set CLASSPATH=!CLASSPATH!;%%f
-)
-for %%f in (%MAHOUT_HOME%\mahout-examples-*-job.jar) do (
-set CLASSPATH=!CLASSPATH!;%%f
-)
-@rem add release dependencies to CLASSPATH
-for %%f in (%MAHOUT_HOME%\lib\*.jar) do (
-set CLASSPATH=!CLASSPATH!;%%f
-)
-) else (
-set CLASSPATH=!CLASSPATH!;%MAHOUT_HOME%\math\target\classes
-set CLASSPATH=!CLASSPATH!;%MAHOUT_HOME%\core\target\classes
-set CLASSPATH=!CLASSPATH!;%MAHOUT_HOME%\integration\target\classes
-set CLASSPATH=!CLASSPATH!;%MAHOUT_HOME%\examples\target\classes
-@rem set CLASSPATH=%CLASSPATH%;%MAHOUT_HOME%\core\src\main\resources
-)
-
-@rem add development dependencies to CLASSPATH
-for %%f in (%MAHOUT_HOME%\examples\target\dependency\*.jar) do (
-set CLASSPATH=!CLASSPATH!;%%f
-)
-
-@rem default log directory & file
-if [%MAHOUT_LOG_DIR%] == [] (
-set MAHOUT_LOG_DIR=%MAHOUT_HOME%\logs
-)
-if [%MAHOUT_LOGFILE%] == [] (
-set MAHOUT_LOGFILE=mahout.log
-)
-
-set MAHOUT_OPTS=%MAHOUT_OPTS% -Dhadoop.log.dir=%MAHOUT_LOG_DIR%
-set MAHOUT_OPTS=%MAHOUT_OPTS% -Dhadoop.log.file=%MAHOUT_LOGFILE%
-set MAHOUT_OPTS=%MAHOUT_OPTS% -Dmapred.min.split.size=512MB
-set MAHOUT_OPTS=%MAHOUT_OPTS% -Dmapred.map.child.java.opts=-Xmx4096m
-set MAHOUT_OPTS=%MAHOUT_OPTS% -Dmapred.reduce.child.java.opts=-Xmx4096m
-set MAHOUT_OPTS=%MAHOUT_OPTS% -Dmapred.output.compress=true
-set MAHOUT_OPTS=%MAHOUT_OPTS% -Dmapred.compress.map.output=true
-set MAHOUT_OPTS=%MAHOUT_OPTS% -Dmapred.map.tasks=1
-set MAHOUT_OPTS=%MAHOUT_OPTS% -Dmapred.reduce.tasks=1
-set MAHOUT_OPTS=%MAHOUT_OPTS% -Dio.sort.factor=30
-set MAHOUT_OPTS=%MAHOUT_OPTS% -Dio.sort.mb=1024
-set MAHOUT_OPTS=%MAHOUT_OPTS% -Dio.file.buffer.size=32786
-set HADOOP_OPTS=%HADOOP_OPTS% -Djava.library.path=%HADOOP_HOME%\bin
-
-if not [%JAVA_LIBRARY_PATH%] == [] (
-set MAHOUT_OPTS=%MAHOUT_OPTS% -Djava.library.path=%JAVA_LIBRARY_PATH%
-)
-
-set CLASS=org.apache.mahout.driver.MahoutDriver
-
-for %%f in (%MAHOUT_HOME%\examples\target\mahout-examples-*-job.jar) do (
-set MAHOUT_JOB=%%f
-)
-
-@rem run it
-
-if not [%MAHOUT_LOCAL%] == [] (
-  echo "MAHOUT_LOCAL is set, running locally"
-  %JAVA% %JAVA_HEAP_MAX% %MAHOUT_OPTS% -classpath %MAHOUT_CLASSPATH% %CLASS% %*
-) else (
-  if [%MAHOUT_JOB%] == [] (
-    echo "ERROR: Could not find mahout-examples-*.job in %MAHOUT_HOME% or %MAHOUT_HOME%/examples/target, please run 'mvn install' to create the .job file"
-    exit /B 1
-  ) else (
-    set HADOOP_CLASSPATH=%MAHOUT_CLASSPATH%
-    if /i [%1] == [hadoop] (
-shift
-set HADOOP_CLASSPATH=%MAHOUT_CONF_DIR%;%HADOOP_CLASSPATH%
-      call %HADOOP_HOME%\bin\%*
-    ) else (
-if /i [%1] == [classpath] (
-echo %CLASSPATH%
-) else (
-echo MAHOUT_JOB: %MAHOUT_JOB%
-set HADOOP_CLASSPATH=%MAHOUT_CONF_DIR%;%HADOOP_CLASSPATH%
-set HADOOP_CLIENT_OPTS=%JAVA_HEAP_MAX%
-call %HADOOP_HOME%\bin\hadoop jar %MAHOUT_JOB% %CLASS% %*
-)
-
-    )
-  )
-)
