Repository: mahout Updated Branches: refs/heads/master e59101243 -> fe77fc19f
[WEBSITE] Move BuildingMahout.md Project: http://git-wip-us.apache.org/repos/asf/mahout/repo Commit: http://git-wip-us.apache.org/repos/asf/mahout/commit/fe77fc19 Tree: http://git-wip-us.apache.org/repos/asf/mahout/tree/fe77fc19 Diff: http://git-wip-us.apache.org/repos/asf/mahout/diff/fe77fc19 Branch: refs/heads/master Commit: fe77fc19fc0c4d0c05c55a30a473acc71e30f1de Parents: e591012 Author: Trevor a.k.a @rawkintrevo <[email protected]> Authored: Wed Nov 29 13:25:14 2017 -0600 Committer: Trevor a.k.a @rawkintrevo <[email protected]> Committed: Wed Nov 29 13:25:14 2017 -0600 ---------------------------------------------------------------------- website/build_site.sh | 5 + website/oldsite/developers/buildingmahout.md | 187 ++++++++++++++++++---- 2 files changed, 164 insertions(+), 28 deletions(-) ---------------------------------------------------------------------- http://git-wip-us.apache.org/repos/asf/mahout/blob/fe77fc19/website/build_site.sh ---------------------------------------------------------------------- diff --git a/website/build_site.sh b/website/build_site.sh old mode 100755 new mode 100644 index 0a66962..d8502d8 --- a/website/build_site.sh +++ b/website/build_site.sh @@ -19,6 +19,7 @@ export PATH=${GEM_HOME}/bin:$PATH (cd docs && bundle) (cd docs && bundle exec jekyll build --destination $WORKDIR/docs/latest) + # Set env for docs MAHOUT_VERSION=0.13.0 DISTFILE=apache-mahout-distribution-$MAHOUT_VERSION.tar.gz @@ -37,4 +38,8 @@ rm -rf * cp -a $WORKDIR/* . git add . 
git commit -m "Automatic Site Publish by Buildbot" +<<<<<<< HEAD +git push origin asf-site +======= git push origin asf-site +>>>>>>> e591012439c04e98d669ef9732fde865a9ef76fa http://git-wip-us.apache.org/repos/asf/mahout/blob/fe77fc19/website/oldsite/developers/buildingmahout.md ---------------------------------------------------------------------- diff --git a/website/oldsite/developers/buildingmahout.md b/website/oldsite/developers/buildingmahout.md index 8e1e7f0..40b509b 100644 --- a/website/oldsite/developers/buildingmahout.md +++ b/website/oldsite/developers/buildingmahout.md @@ -1,16 +1,17 @@ --- layout: default -title: BuildingMahout -theme: - name: retro-mahout +title: Building Mahout +theme: + name: mahout2 --- -# Building Mahout from source + +# Building Mahout from Source ## Prerequisites * Java JDK 1.7 -* Apache Maven 3.3.3 +* Apache Maven 3.3.9 ## Getting the source code @@ -23,40 +24,170 @@ or git clone https://github.com/apache/mahout.git -##Hadoop version -Mahout code depends on hadoop-client artifact, with the default version 2.4.1. To build Mahout against to a -different hadoop version, hadoop.version property should be set accordingly and passed to the build command. -Hadoop1 clients would additionally require hadoop1 profile to be activated. +## Building From Source + +###### Prerequisites: + +Linux Environment (preferably Ubuntu 16.04.x) Note: Currently only the JVM-only build will work on a Mac. +gcc > 4.x +NVIDIA Card (installed with OpenCL drivers alongside usual GPU drivers) + +###### Downloads + +Install java 1.7+ in an easily accessible directory (for this example, ~/java/) +http://www.oracle.com/technetwork/java/javase/downloads/jdk8-downloads-2133151.html + +Create a directory ~/apache/ . + +Download apache Maven 3.3.9 and un-tar/gunzip to ~/apache/apache-maven-3.3.9/ . +https://maven.apache.org/download.cgi + +Download and un-tar/gunzip Hadoop 2.4.1 to ~/apache/hadoop-2.4.1/ . 
+https://archive.apache.org/dist/hadoop/common/hadoop-2.4.1/ + +Download and un-tar/gunzip spark-1.6.3-bin-hadoop2.4 to ~/apache/ . +http://spark.apache.org/downloads.html +Choose release: Spark-1.6.3 (Nov 07 2016) +Choose package type: Pre-Built for Hadoop 2.4 + +Install ViennaCL 1.7.0+ +If running Ubuntu 16.04+ + +``` +sudo apt-get install libviennacl-dev +``` + +Otherwise if your distribution's package manager does not have a viennniacl-dev package >1.7.0, clone it directly into the directory which will be included in when being compiled by Mahout: + +``` +mkdir ~/tmp +cd ~/tmp && git clone https://github.com/viennacl/viennacl-dev.git +cp -r viennacl/ /usr/local/ +cp -r CL/ /usr/local/ +``` + +Ensure that the OpenCL 1.2+ drivers are installed (packed with most consumer grade NVIDIA drivers). Not sure about higher end cards. + +Clone mahout repository into `~/apache`. + +``` +git clone https://github.com/apache/mahout.git +``` + +###### Configuration + +When building mahout for a spark backend, we need four System Environment variables set: +``` + export MAHOUT_HOME=/home/<user>/apache/mahout + export HADOOP_HOME=/home/<user>/apache/hadoop-2.4.1 + export SPARK_HOME=/home/<user>/apache/spark-1.6.3-bin-hadoop2.4 + export JAVA_HOME=/home/<user>/java/jdk-1.8.121 +``` + +Mahout on Spark regularly uses one more env variable, the IP of the Spark cluster's master node (usually the node which one would be logged into). + +To use 4 local cores (Spark master need not be running) +``` +export MASTER=local[4] +``` +To use all available local cores (again, Spark master need not be running) +``` +export MASTER=local[*] +``` +To point to a cluster with spark running: +``` +export MASTER=spark://master.ip.address:7077 +``` + +We then add these to the path: + +``` + PATH=$PATH$:MAHOUT_HOME/bin:$HADOOP_HOME/bin:$SPARK_HOME/bin:$JAVA_HOME/bin +``` + +These should be added to the your ~/.bashrc file. 
+ + +###### Building Mahout with Apache Maven + +From the $MAHOUT_HOME directory we may issue the commands to build each using mvn profiles. + +JVM only: +``` +mvn clean install -DskipTests +``` + +JVM with native OpenMP level 2 and level 3 matrix/vector Multiplication +``` +mvn clean install -Pviennacl-omp -Phadoop2 -DskipTests +``` +JVM with native OpenMP and OpenCL for Level 2 and level 3 matrix/vector Multiplication. (GPU errors fall back to OpenMP, currently only a single GPU/node is supported). +``` +mvn clean install -Pviennacl -Phadoop2 -DskipTests +``` + +### Changing Scala Version + +To change the Scala version used it is possible to use profiles, however the resulting artifacts seem to have trouble being resolved with SBT. + +```bash +mvn clean install -Pscala-2.11 +``` + +Maven is able to resolve the resulting artifacts effectively, this will also work if the goal is simply to use the Mahout-Shell. However if the goal is to build with SBT, the following tool should be used + +```bash +cd $MAHOUT_HOME/buildtools +./change-scala-version.sh 2.11 +``` + +Now go back to `$MAHOUT_HOME` and execute + +```bash +mvn clean install -Pscala-2.11 +``` + +**NOTE:** you still need to pass the `-Pscala-2.11` profile, as this determines and propegates the minor scala version (e.g. 2.11.8) + + +### The Distribution Profile -The build lifecycle is illustrated below. +The distribution profile, among other things, will produce the same artifact for multiple Scala and Spark versions. 
-## Compiling +Specifically, in addition to creating all of the -Compile Mahout using standard maven commands +Default Targets: +- Spark 1.6 Bindings, Scala-2.10 +- Mahout-Math Scala-2.10 +- ViennaCL Scala-2.10* +- ViennaCL-OMP Scala-2.10* +- H2O Scala-2.10 - # With hadoop-2.4.1 dependency - mvn clean compile +It will also create: +- Spark 2.0 Bindings, Scala-2.11 +- Spark 2.1 Bindings, Scala-2.11 +- Mahout-Math Scala-2.11 +- ViennaCL Scala-2.11* +- ViennaCL-OMP Scala-2.11* +- H2O Scala-2.11 - # With hadoop-1.2.1 dependency - mvn -Phadoop1 -Dhadoop.version=1.2.1 clean compile +Note: * ViennaCLs are only created if the `viennacl` or `viennacl-omp` profiles are activated. -##Packaging +By default, this phase will execute the `package` lifecycle goal on all built "extra" varients. -Mahout has an extensive test suite which takes some time to run. If you just want to build Mahout, skip the tests like this +E.g. if you were to run - # With hadoop-2.4.1 dependency - mvn -DskipTests=true clean package +```bash +mvn clean install -Pdistribution +``` - # With hadoop-1.2.1 dependency - mvn -Phadoop1 -Dhadoop.version=1.2.1 -DskipTests=true clean package +You will `install` all of the "Default Targets" but only `package` the "Also created". +If you wish to `install` all of the above, you can set the `lifecycle.target` switch as follows: -In order to add mahout artifact to your local repository, run +```bash +mvn clean install -Pdistribution -Dlifecycle.target=install +``` - # With hadoop-2.4.1 dependency - mvn clean install - # With hadoop-1.2.1 dependency - mvn -Phadoop1 -Dhadoop.version=1.2.1 clean install - \ No newline at end of file
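
Note on the Configuration section added in this diff: the `PATH` line as committed reads `$PATH$:MAHOUT_HOME/bin`, which the shell expands to the old path followed by a literal `$:MAHOUT_HOME/bin`, so the Mahout `bin` directory would not actually be added. A minimal sketch of what the `~/.bashrc` fragment was probably meant to look like follows. The directory layout under `$HOME` mirrors the paths in the diff, and the use of `$HOME` in place of `/home/<user>` and the `local[*]` default for `MASTER` are assumptions made for illustration, not part of the commit.

```shell
# Sketch of a ~/.bashrc fragment for the Spark-backed build described in the diff.
# All install locations below are assumptions copied from the documented layout.
export MAHOUT_HOME="$HOME/apache/mahout"
export HADOOP_HOME="$HOME/apache/hadoop-2.4.1"
export SPARK_HOME="$HOME/apache/spark-1.6.3-bin-hadoop2.4"
export JAVA_HOME="$HOME/java/jdk-1.8.121"

# Keep an existing MASTER if one is set; otherwise use all local cores (assumed default).
export MASTER="${MASTER:-local[*]}"

# Append each bin directory; note the '$' precedes every variable name.
export PATH="$PATH:$MAHOUT_HOME/bin:$HADOOP_HOME/bin:$SPARK_HOME/bin:$JAVA_HOME/bin"
```

After sourcing the file, `echo "$PATH"` should show the four `bin` directories at the end of the search path.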
