http://git-wip-us.apache.org/repos/asf/incubator-predictionio/blob/e1e71280/docs/manual/obsolete/tutorials/enginebuilders/local-helloworld.html.md
----------------------------------------------------------------------
diff --git a/docs/manual/obsolete/tutorials/enginebuilders/local-helloworld.html.md b/docs/manual/obsolete/tutorials/enginebuilders/local-helloworld.html.md
deleted file mode 100644
index 75a1f8e..0000000
--- a/docs/manual/obsolete/tutorials/enginebuilders/local-helloworld.html.md
+++ /dev/null
@@ -1,546 +0,0 @@
---
title: Building the "HelloWorld" Engine
---

# Building the "HelloWorld" Engine

This is a step-by-step guide on building your first predictive engine on PredictionIO. The engine will use historical temperature data to predict the temperature of a certain day in a week.

> You need to build PredictionIO from source in order to build your own engine. Please follow the instructions to build from source [here](/install/install-sourcecode.html).

Completed source code can also be found at `$PIO_HOME/examples/scala-local-helloworld` and `$PIO_HOME/examples/java-local-helloworld`, where `$PIO_HOME` is the root directory of the PredictionIO source code tree.

## Data Set

This engine reads historical daily temperatures as its training data set. A very simple data set is prepared for you.

First, create a directory somewhere and copy the data set over. Replace `path/to/data.csv` with the path where you store the training data.

```console
$ cp $PIO_HOME/examples/data/helloworld/data1.csv path/to/data.csv
```

## 1. Create a new Engine

```console
$ $PIO_HOME/bin/pio new HelloWorld
$ cd HelloWorld
```

A new engine project directory `HelloWorld` is created. You should see the following files inside this new project directory:

```
build.sbt
engine.json
params/
project/
src/
```

<div class="tabs">
  <div data-tab="Scala" data-lang="scala">
You can find the Scala engine template in <code>src/main/scala/Engine.scala</code>. Please follow the instructions below to edit this file.
  </div>
  <div data-tab="Java" data-lang="java">

<strong>NOTE:</strong>
The template is created for Scala code. For Java, you need to do the following under the <code>HelloWorld</code> directory:

```bash
$ rm -rf src/main/scala
$ mkdir -p src/main/java
```

  </div>
</div>

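Before moving on to the engine's data types, it helps to know the shape of the file you just copied: each line of the CSV is a `day,temperature` pair, which the `DataSource` implemented below parses into a day string and a temperature value. The rows here are illustrative only, not the actual contents of `data1.csv`:

```
Mon,75.0
Mon,76.0
Tue,80.5
```
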
## 2. Define Data Types

### Define Training Data

<div class="tabs">
  <div data-tab="Scala" data-lang="scala">

Edit <code>src/main/scala/Engine.scala</code>:

```scala
class MyTrainingData(
  val temperatures: List[(String, Double)]
) extends Serializable
```
  </div>
  <div data-tab="Java" data-lang="java">

Create a new file <code>src/main/java/MyTrainingData.java</code>:

```java
package myorg;

import java.io.Serializable;
import java.util.List;

public class MyTrainingData implements Serializable {
  List<DayTemperature> temperatures;

  public MyTrainingData(List<DayTemperature> temperatures) {
    this.temperatures = temperatures;
  }

  public static class DayTemperature implements Serializable {
    String day;
    Double temperature;

    public DayTemperature(String day, Double temperature) {
      this.day = day;
      this.temperature = temperature;
    }
  }
}
```
  </div>
</div>

### Define Query

<div class="tabs">
  <div data-tab="Scala" data-lang="scala">

Edit <code>src/main/scala/Engine.scala</code>:

```scala
class MyQuery(
  val day: String
) extends Serializable
```
  </div>
  <div data-tab="Java" data-lang="java">

Create a new file <code>src/main/java/MyQuery.java</code>:

```java
package myorg;

import java.io.Serializable;

public class MyQuery implements Serializable {
  String day;

  public MyQuery(String day) {
    this.day = day;
  }
}
```
  </div>
</div>

### Define Model
<div class="tabs">
  <div data-tab="Scala" data-lang="scala">

Edit <code>src/main/scala/Engine.scala</code>:

```scala
import scala.collection.immutable.HashMap

class MyModel(
  val temperatures: HashMap[String, Double]
) extends Serializable {
  override def toString = temperatures.toString
}
```
  </div>
  <div data-tab="Java" data-lang="java">

Create a new file <code>src/main/java/MyModel.java</code>:

```java
package myorg;

import java.io.Serializable;
import java.util.Map;

public class MyModel implements Serializable {
  Map<String, Double> temperatures;

  public MyModel(Map<String, Double> temperatures) {
    this.temperatures = temperatures;
  }

  @Override
  public String toString() {
    return temperatures.toString();
  }
}
```
  </div>
</div>

### Define Predicted Result
<div class="tabs">
  <div data-tab="Scala" data-lang="scala">

Edit <code>src/main/scala/Engine.scala</code>:

```scala
class MyPredictedResult(
  val temperature: Double
) extends Serializable
```
  </div>
  <div data-tab="Java" data-lang="java">

Create a new file <code>src/main/java/MyPredictedResult.java</code>:

```java
package myorg;

import java.io.Serializable;

public class MyPredictedResult implements Serializable {
  Double temperature;

  public MyPredictedResult(Double temperature) {
    this.temperature = temperature;
  }
}
```
  </div>
</div>

## 3. Implement the Data Source

<div class="tabs">
  <div data-tab="Scala" data-lang="scala">

Edit <code>src/main/scala/Engine.scala</code>:

```scala
import scala.io.Source

class MyDataSource extends LDataSource[EmptyDataSourceParams, EmptyDataParams,
    MyTrainingData, MyQuery, EmptyActualResult] {

  override def readTraining(): MyTrainingData = {
    val lines = Source.fromFile("path/to/data.csv").getLines()
      .toList.map { line =>
        val data = line.split(",")
        (data(0), data(1).toDouble)
      }
    new MyTrainingData(lines)
  }

}
```
  </div>
  <div data-tab="Java" data-lang="java">

Create a new file <code>src/main/java/MyDataSource.java</code>:

```java
package myorg;

import org.apache.predictionio.controller.java.*;

import java.util.List;
import java.util.ArrayList;
import java.io.FileReader;
import java.io.BufferedReader;

public class MyDataSource extends LJavaDataSource<
    EmptyDataSourceParams, EmptyDataParams, MyTrainingData, MyQuery, EmptyActualResult> {

  @Override
  public MyTrainingData readTraining() {
    List<MyTrainingData.DayTemperature> temperatures =
      new ArrayList<MyTrainingData.DayTemperature>();

    try {
      BufferedReader reader =
        new BufferedReader(new FileReader("path/to/data.csv"));
      String line;
      while ((line = reader.readLine()) != null) {
        String[] tokens = line.split(",");
        temperatures.add(
          new MyTrainingData.DayTemperature(tokens[0],
            Double.parseDouble(tokens[1])));
      }
      reader.close();
    } catch (Exception e) {
      System.exit(1);
    }

    return new MyTrainingData(temperatures);
  }
}
```
  </div>
</div>

**NOTE**: You need to update `path/to/data.csv` in this code with the correct path that stores the training data.

## 4. Implement an Algorithm

<div class="tabs">
  <div data-tab="Scala" data-lang="scala">

Edit <code>src/main/scala/Engine.scala</code>:

```scala
class MyAlgorithm extends LAlgorithm[EmptyAlgorithmParams, MyTrainingData,
    MyModel, MyQuery, MyPredictedResult] {

  override
  def train(pd: MyTrainingData): MyModel = {
    // calculate the average value of each day
    val average = pd.temperatures
      .groupBy(_._1) // group by day
      .mapValues { list =>
        val tempList = list.map(_._2) // get the temperatures
        tempList.sum / tempList.size
      }

    // trait Map is not serializable, use concrete class HashMap
    new MyModel(HashMap[String, Double]() ++ average)
  }

  override
  def predict(model: MyModel, query: MyQuery): MyPredictedResult = {
    val temp = model.temperatures(query.day)
    new MyPredictedResult(temp)
  }
}
```
  </div>
  <div data-tab="Java" data-lang="java">
Create a new file <code>src/main/java/MyAlgorithm.java</code>:

```java
package myorg;

import org.apache.predictionio.controller.java.*;

import java.util.Map;
import java.util.HashMap;

public class MyAlgorithm extends LJavaAlgorithm<
    EmptyAlgorithmParams, MyTrainingData, MyModel, MyQuery, MyPredictedResult> {

  @Override
  public MyModel train(MyTrainingData data) {
    Map<String, Double> sumMap = new HashMap<String, Double>();
    Map<String, Integer> countMap = new HashMap<String, Integer>();

    // calculate the sum and count for each day
    for (MyTrainingData.DayTemperature temp : data.temperatures) {
      Double sum = sumMap.get(temp.day);
      Integer count = countMap.get(temp.day);
      if (sum == null) {
        sumMap.put(temp.day, temp.temperature);
        countMap.put(temp.day, 1);
      } else {
        sumMap.put(temp.day, sum + temp.temperature);
        countMap.put(temp.day, count + 1);
      }
    }

    // calculate the average
    Map<String, Double> averageMap = new HashMap<String, Double>();
    for (Map.Entry<String, Double> entry : sumMap.entrySet()) {
      String day = entry.getKey();
      Double average = entry.getValue() / countMap.get(day);
      averageMap.put(day, average);
    }

    return new MyModel(averageMap);
  }

  @Override
  public MyPredictedResult predict(MyModel model, MyQuery query) {
    Double temp = model.temperatures.get(query.day);
    return new MyPredictedResult(temp);
  }
}
```
  </div>
</div>

## 5. Implement EngineFactory

<div class="tabs">
  <div data-tab="Scala" data-lang="scala">

Edit <code>src/main/scala/Engine.scala</code>:

```scala
object MyEngineFactory extends IEngineFactory {
  override
  def apply() = {
    /* SimpleEngine only requires one DataSource and one Algorithm */
    new SimpleEngine(
      classOf[MyDataSource],
      classOf[MyAlgorithm]
    )
  }
}
```
  </div>
  <div data-tab="Java" data-lang="java">
Create a new file <code>src/main/java/MyEngineFactory.java</code>:

```java
package myorg;

import org.apache.predictionio.controller.java.*;

public class MyEngineFactory implements IJavaEngineFactory {
  public JavaSimpleEngine<MyTrainingData, EmptyDataParams, MyQuery, MyPredictedResult,
      EmptyActualResult> apply() {

    return new JavaSimpleEngineBuilder<MyTrainingData, EmptyDataParams,
        MyQuery, MyPredictedResult, EmptyActualResult> ()
      .dataSourceClass(MyDataSource.class)
      .preparatorClass() // Use default Preparator
      .addAlgorithmClass("", MyAlgorithm.class)
      .servingClass() // Use default Serving
      .build();
  }
}
```
  </div>
</div>

## 6. Define engine.json

You should see an `engine.json` file created as follows:

```json
{
  "id": "helloworld",
  "version": "0.0.1-SNAPSHOT",
  "name": "helloworld",
  "engineFactory": "myorg.MyEngineFactory"
}
```

If you have followed this Hello World Engine tutorial without modifying any of the class and package names (`myorg`), you don't need to update this file.

## 7. Define Parameters

You can safely delete the file `params/datasource.json` because this Hello World Engine doesn't take any parameters.

```
$ rm params/datasource.json
```

# Deploying the "HelloWorld" Engine Instance

After the new engine is built, it is time to deploy an engine instance of it.

## 1. Register engine:

```bash
$ $PIO_HOME/bin/pio register
```

This command will compile the engine source code and build the necessary binary.

## 2. Train:

```bash
$ $PIO_HOME/bin/pio train
```

Example output:

```
2014-09-18 15:44:57,568 INFO spark.SparkContext - Job finished: collect at Workflow.scala:677, took 0.138356 s
2014-09-18 15:44:57,757 INFO workflow.CoreWorkflow$ - Saved engine instance with ID: zdoo7SGAT2GVX8dMJFzT5w
```

This command produces an Engine Instance, which can be deployed.

## 3. Deploy:

```bash
$ $PIO_HOME/bin/pio deploy
```

You should see the following if the engine instance is deployed successfully:

```
[INFO] [10/13/2014 18:11:09.721] [pio-server-akka.actor.default-dispatcher-4] [akka://pio-server/user/IO-HTTP/listener-0] Bound to localhost/127.0.0.1:8000
[INFO] [10/13/2014 18:11:09.724] [pio-server-akka.actor.default-dispatcher-7] [akka://pio-server/user/master] Bind successful. Ready to serve.
```

Do not kill the deployed Engine Instance. You can retrieve predictions by sending HTTP requests to the engine instance.

Open another terminal to execute the following.

Retrieve the temperature prediction for Monday:

```bash
$ curl -H "Content-Type: application/json" -d '{ "day": "Mon" }' http://localhost:8000/queries.json
```

You should see the following output:

```json
{"temperature":75.5}
```

You can send another query to retrieve another prediction. For example, retrieve the temperature prediction for Tuesday:

```bash
$ curl -H "Content-Type: application/json" -d '{ "day": "Tue" }' http://localhost:8000/queries.json
```

You should see the following output:

```json
{"temperature":80.5}
```

# Re-training The Engine

Let's say you have collected more historical temperature data and want to re-train the engine with the updated data. You can simply execute `pio train` and `pio deploy` again.

Another temperature data set is prepared for you. Run the following to update your data with this new data set. Replace `path/to/data.csv` with the path used in the steps above.

```bash
$ cp $PIO_HOME/examples/data/helloworld/data2.csv path/to/data.csv
```

In another terminal, go to the `HelloWorld` engine directory. Execute `pio train` and `pio deploy` again to deploy the latest instance trained with the new data. This automatically kills the previously running engine instance.

```bash
$ $PIO_HOME/bin/pio train
$ $PIO_HOME/bin/pio deploy
```

Retrieve the temperature prediction for Monday again:

```bash
$ curl -H "Content-Type: application/json" -d '{ "day": "Mon" }' http://localhost:8000/queries.json
```

You should see the following output:

```json
{"temperature":76.66666666666667}
```

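curl is handy for smoke testing, but you can of course send the same JSON query from code. A minimal Java client using only the standard library might look like the following; the class name is hypothetical and this is not part of the tutorial sources:

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class QueryClient {
  public static void main(String[] args) throws Exception {
    URL url = new URL("http://localhost:8000/queries.json");
    HttpURLConnection conn = (HttpURLConnection) url.openConnection();
    conn.setRequestMethod("POST");
    conn.setRequestProperty("Content-Type", "application/json");
    conn.setDoOutput(true);

    // Same payload as the curl examples above.
    byte[] body = "{ \"day\": \"Mon\" }".getBytes(StandardCharsets.UTF_8);
    try (OutputStream os = conn.getOutputStream()) {
      os.write(body);
    }

    try (BufferedReader in = new BufferedReader(
        new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
      String line;
      while ((line = in.readLine()) != null) {
        System.out.println(line); // e.g. {"temperature":76.66666666666667}
      }
    }
  }
}
```
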
Check out the [Java Parallel Helloworld tutorial](parallel-helloworld.html) if you are interested in how things are done on the parallel side.

http://git-wip-us.apache.org/repos/asf/incubator-predictionio/blob/e1e71280/docs/manual/obsolete/tutorials/enginebuilders/parallel-helloworld.html.md
----------------------------------------------------------------------
diff --git a/docs/manual/obsolete/tutorials/enginebuilders/parallel-helloworld.html.md b/docs/manual/obsolete/tutorials/enginebuilders/parallel-helloworld.html.md
deleted file mode 100644
index cd30703..0000000
--- a/docs/manual/obsolete/tutorials/enginebuilders/parallel-helloworld.html.md
+++ /dev/null
@@ -1,379 +0,0 @@
---
title: Building the Java Parallel "HelloWorld" Engine
---

# Building the Java Parallel "HelloWorld" Engine

This is similar to the [HelloWorld](local-helloworld.html) engine tutorial. The engine will use historical temperature data to predict the temperature of a certain day in a week. We assume you have gone through the tutorial for the local version.

> You need to build PredictionIO from source in order to build your own engine. Please follow the instructions to build from source [here](/install/install-sourcecode.html).

Completed source code can also be found at `$PIO_HOME/examples/java-parallel-helloworld`, where `$PIO_HOME` is the root directory of the PredictionIO source code tree.

## 1. Define Data Types

### Define Training Data

Training data in this case is of type `JavaPairRDD<String, Float>`, where the `String` holds the day and the `Float` holds the temperature:

<div class="tabs">
<div data-tab="Java" data-lang="java">
```java
import org.apache.spark.api.java.JavaPairRDD;

JavaPairRDD<String, Float> readings;
```
  </div>
</div>

### Define Prepared Data

We convert the temperatures from degrees Fahrenheit to degrees Celsius, so Prepared Data also has type `JavaPairRDD<String, Float>`.

### Define Query

This is the same as the local counterpart.

<div class="tabs">
  <div data-tab="Java" data-lang="java">
```java
public class Query implements Serializable {
  String day;

  public Query(String day) {
    this.day = day;
  }
}
```
  </div>
</div>

### Define Model

Our Model is of the same type as the Training Data, i.e. `JavaPairRDD<String, Float>`. To have it output its contents instead of an object address when we call `toString()`, we override it:

<div class="tabs">
  <div data-tab="Java" data-lang="java">
```java
public JavaPairRDD<String, Float> temperatures;

@Override
public String toString() {
  boolean longList = temperatures.count() > LIST_THRESHOLD;
  List<Tuple2<String, Float>> readings =
    temperatures.take(longList ? LIST_THRESHOLD : (int) temperatures.count());
  StringBuilder builder = new StringBuilder();
  builder.append("(");
  boolean first = true;
  for (Tuple2<String, Float> reading : readings) {
    if (!first) {
      builder.append(", ");
    } else {
      first = false;
    }
    builder.append(reading);
  }
  if (longList) {
    builder.append(", ...");
  }
  builder.append(")");
  return builder.toString();
}
```
  </div>
</div>

### Define Prediction Result

The prediction result is simply a `Float`.

## 2. Implement the Data Source

Only Scala- and Spark-related imports are shown; refer to the source code for the whole thing.

<div class="tabs">
  <div data-tab="Java" data-lang="java">
```java
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.PairFunction;

import scala.Tuple2;
import scala.Tuple3;

public class DataSource extends PJavaDataSource<
    EmptyParams, Object, JavaPairRDD<String, Float>, Query, Object> {
  @Override
  public Iterable<Tuple3<Object, JavaPairRDD<String, Float>, JavaPairRDD<Query, Object>>>
      read(JavaSparkContext jsc) {
    JavaPairRDD<String, Float> readings = jsc.textFile("path/to/data.csv")
      .mapToPair(new PairFunction<String, String, Float>() {
        @Override
        public Tuple2 call(String line) {
          String[] tokens = line.split("[\t,]");
          Tuple2 reading = null;
          try {
            reading = new Tuple2(
              tokens[0],
              Float.parseFloat(tokens[1]));
          } catch (Exception e) {
            logger.error("Can't parse reading file. Caught Exception: " + e.getMessage());
            System.exit(1);
          }
          return reading;
        }
      });

    List<Tuple3<Object, JavaPairRDD<String, Float>, JavaPairRDD<Query, Object>>> data =
      new ArrayList<>();

    data.add(new Tuple3(
      null,
      readings,
      jsc.parallelizePairs(new ArrayList<Tuple2<Query, Object>>())
    ));

    return data;
  }
}
```
  </div>
</div>

## 3. Implement the Preparator

As mentioned above, we convert the scale from Fahrenheit to Celsius:

<div class="tabs">
  <div data-tab="Java" data-lang="java">
```java
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.Function;

public class Preparator extends
    PJavaPreparator<EmptyParams, JavaPairRDD<String, Float>, JavaPairRDD<String, Float>> {

  @Override
  public JavaPairRDD<String, Float> prepare(JavaSparkContext jsc,
      JavaPairRDD<String, Float> data) {
    return data.mapValues(new Function<Float, Float>() {
      @Override
      public Float call(Float temperature) {
        // let's convert it to degrees Celsius
        return (temperature - 32.0f) / 9 * 5;
      }
    });
  }
}
```
  </div>
</div>

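As a quick sanity check of this conversion, the Monday average from the local tutorial, 75.5°F, maps to (75.5 − 32) × 5 / 9 ≈ 24.17°C, which matches the prediction returned in the deployment section below. A standalone check (the class name is hypothetical, not part of the tutorial sources):

```java
public class ConversionCheck {
  public static void main(String[] args) {
    float fahrenheit = 75.5f;
    // same formula as Preparator.prepare() above
    float celsius = (fahrenheit - 32.0f) / 9 * 5;
    System.out.println(celsius); // ≈ 24.17; the deployed engine below returns 24.166668
  }
}
```
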
## 4. Implement an Algorithm

We need to implement `train()`, `batchPredict()` and `predict()`. We create a `ReadingAndCount` class for the sake of aggregation (taking the average).

<div class="tabs">
<div data-tab="Java" data-lang="java">
```java
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.function.Function;
import org.apache.spark.api.java.function.Function2;
import org.apache.spark.api.java.function.PairFunction;

import scala.Tuple2;

public class Algorithm extends PJavaAlgorithm<
    EmptyParams, JavaPairRDD<String, Float>, Model, Query, Float> {

  final static Logger logger = LoggerFactory.getLogger(Algorithm.class);

  public static class ReadingAndCount implements Serializable {
    public float reading;
    public int count;

    public ReadingAndCount(float reading, int count) {
      this.reading = reading;
      this.count = count;
    }

    public ReadingAndCount(float reading) {
      this(reading, 1);
    }

    @Override
    public String toString() {
      return "(reading = " + reading + ", count = " + count + ")";
    }
  }

  @Override
  public Model train(JavaPairRDD<String, Float> data) {
    // take averages just like the local helloworld program
    JavaPairRDD<String, Float> averages = data.mapValues(
      new Function<Float, ReadingAndCount>() {
        @Override
        public ReadingAndCount call(Float reading) {
          return new ReadingAndCount(reading);
        }
      }).reduceByKey(
      new Function2<ReadingAndCount, ReadingAndCount, ReadingAndCount>() {
        @Override
        public ReadingAndCount call(ReadingAndCount rac1, ReadingAndCount rac2) {
          return new ReadingAndCount(rac1.reading + rac2.reading, rac1.count + rac2.count);
        }
      }).mapValues(
      new Function<ReadingAndCount, Float>() {
        @Override
        public Float call(ReadingAndCount rac) {
          return rac.reading / rac.count;
        }
      });
    return new Model(averages);
  }

  @Override
  public JavaPairRDD<Object, Float> batchPredict(Model model,
      JavaPairRDD<Object, Query> indexedQueries) {
    return model.temperatures.join(indexedQueries.mapToPair(
      new PairFunction<Tuple2<Object, Query>, String, Object>() {
        @Override // reverse the query tuples, then join
        public Tuple2 call(Tuple2<Object, Query> tuple) {
          return new Tuple2(tuple._2.day, tuple._1);
        }
      })).mapToPair(
      new PairFunction<Tuple2<String, Tuple2<Float, Object>>, Object, Float>() {
        @Override // map result back to predictions, dropping the day
        public Tuple2 call(Tuple2<String, Tuple2<Float, Object>> tuple) {
          return new Tuple2(tuple._2._2, tuple._2._1);
        }
      });
  }

  @Override
  public Float predict(Model model, Query query) {
    final String day = query.day;
    List<Float> reading = model.temperatures.lookup(day);
    return reading.get(0);
  }
}
```
  </div>
</div>

## 5. Implement the Serving

Since there is only one algorithm, there is only one prediction, so we simply extract it. Note that we are using `LJavaServing` even though the other stages are done in a parallel manner.

<div class="tabs">
  <div data-tab="Java" data-lang="java">
```java
public class Serving extends LJavaServing<EmptyParams, Query, Float> {
  @Override
  public Float serve(Query query, Iterable<Float> predictions) {
    return predictions.iterator().next();
  }
}
```
  </div>
</div>

# Deploying the "HelloWorld" Engine Instance

After the new engine is built, it is time to deploy an engine instance of it.

Prepare the training data:

```bash
$ cp $PIO_HOME/examples/data/helloworld/data1.csv path/to/data.csv
```

Register the engine:

```bash
$ $PIO_HOME/bin/pio register
```

Train:

```bash
$ $PIO_HOME/bin/pio train
```

Example output:

```
2014-10-06 16:43:01,820 INFO spark.SparkContext - Job finished: count at Workflow.scala:527, took 0.016301 s
2014-10-06 16:43:01,820 INFO workflow.CoreWorkflow$ - DP 0 has 0 rows
2014-10-06 16:43:01,821 INFO workflow.CoreWorkflow$ - Metrics is null. Stop here
2014-10-06 16:43:01,933 INFO workflow.CoreWorkflow$ - Saved engine instance with ID: KzBHWQTsR9afg3_2mb5GfQ
```

Deploy:

```bash
$ $PIO_HOME/bin/pio deploy
```
Retrieve a prediction:

```bash
$ curl -H "Content-Type: application/json" -d '{ "day": "Mon" }' http://localhost:8000/queries.json
```
Output:

```json
24.166668
```

Retrieve another prediction:

```bash
$ curl -H "Content-Type: application/json" -d '{ "day": "Tue" }' http://localhost:8000/queries.json
```

Output:

```json
26.944447
```

## Re-training

Re-train with new data:

```bash
$ cp $PIO_HOME/examples/data/helloworld/data2.csv path/to/data.csv
```

```bash
$ $PIO_HOME/bin/pio train
$ $PIO_HOME/bin/pio deploy
```

Retrieve a prediction:

```bash
$ curl -H "Content-Type: application/json" -d '{ "day": "Mon" }' http://localhost:8000/queries.json
```

Output:

```json
24.814814
```

http://git-wip-us.apache.org/repos/asf/incubator-predictionio/blob/e1e71280/docs/manual/obsolete/tutorials/enginebuilders/stepbystep/combiningalgorithms.html.md
----------------------------------------------------------------------
diff --git a/docs/manual/obsolete/tutorials/enginebuilders/stepbystep/combiningalgorithms.html.md b/docs/manual/obsolete/tutorials/enginebuilders/stepbystep/combiningalgorithms.html.md
deleted file mode 100644
index 4a09a3a..0000000
--- a/docs/manual/obsolete/tutorials/enginebuilders/stepbystep/combiningalgorithms.html.md
+++ /dev/null
@@ -1,451 +0,0 @@
---
title: Serving - Combining Algorithms
---

# Serving - Combining Multiple Algorithms

At this point you already have a sense of implementing, deploying, and evaluating a recommendation system with collaborative filtering techniques. However, this technique suffers from a cold-start problem where new items have no user action history. In this tutorial, we introduce a feature-based recommendation technique to remedy this problem by constructing a user profile for each user. In addition, the PredictionIO infrastructure allows you to combine multiple recommendation systems in a single engine: for history-rich items, the engine can use results from the collaborative filtering algorithm, and for history-absent items, it returns predictions from the feature-based recommendation algorithm. Moreover, we can ensemble multiple predictions, too.

This tutorial guides you through incorporating a feature-based algorithm into the existing CF-based recommendation engine introduced in tutorials 1, 2 and 3.

All code can be found in the [tutorial4/](https://github.com/PredictionIO/PredictionIO/tree/master/examples/java-local-tutorial/src/main/java/recommendations/tutorial4) directory.

## Overview
In the previous tutorial, we covered `DataSource` and `Algorithm` as crucial parts of an engine.
A complete engine workflow looks like the following figure:

```
            DataSource.read
            (TrainingData)
                  v
          Preparator.prepare
            (PreparedData)
                  v
    +-------------+-------------+
    v             v             v
Algo1.train   Algo2.train   Algo3.train
 (Model1)      (Model2)      (Model3)
    v             v             v
Algo1.predict Algo2.predict Algo3.predict <- (Query)
(Prediction1) (Prediction2) (Prediction3)
    v             v             v
    +-------------+-------------+
                  v
            Serving.serve
            (Prediction)
```

`Preparator` is the class which preprocesses the training data that will be used by multiple algorithms. For example, it can be an NLP processor which generates useful n-grams, or it can apply some business logic.

An engine is designed to support multiple algorithms. They need to take the same `PreparedData` as input for model construction, but each algorithm can have its own model class. An algorithm takes a common `Query` as input and returns a `Prediction` as output.

Finally, the serving layer `Serving` combines the results from multiple algorithms, and possibly applies some final business logic before returning.

This tutorial implements a simple `Preparator` for feature generation, a feature-based algorithm, and a serving layer which ensembles multiple predictions.

## DataSource
We have to amend the [`DataSource`](https://github.com/PredictionIO/PredictionIO/tree/master/examples/java-local-tutorial/src/main/java/recommendations/tutorial4/DataSource.java) to take into account more information from MovieLens, as well as adding some fake data for demonstration. We use the genres of a movie as its feature vector. This part is similar to earlier tutorials.

```
$ cd $PIO_HOME/examples/java-local-tutorial
$ ../../bin/pio run org.apache.predictionio.examples.java.recommendations.tutorial4.Runner4a -- -- data/ml-100k/
```
where `$PIO_HOME` is the root directory of the PredictionIO code tree.

You should see

```
2014-09-30 17:16:45,515 INFO spark.SparkContext - Job finished: collect at Workflow.scala:388, took 0.020437 s
2014-09-30 17:16:45,516 INFO workflow.CoreWorkflow$ - Data Set 0
2014-09-30 17:16:45,516 INFO workflow.CoreWorkflow$ - Params: Empty
2014-09-30 17:16:45,517 INFO workflow.CoreWorkflow$ - TrainingData:
2014-09-30 17:16:45,517 INFO workflow.CoreWorkflow$ - [TrainingData: rating.size=100003 genres.size=19 itemInfo.size=1685 userInfo.size=946]
2014-09-30 17:16:45,517 INFO workflow.CoreWorkflow$ - TestingData: (count=0)
2014-09-30 17:16:45,518 INFO workflow.CoreWorkflow$ - Data source complete
2014-09-30 17:16:45,518 INFO workflow.CoreWorkflow$ - Preparator is null. Stop here
```

## Preparator
Now that we have read the raw data from the `DataSource`, we can *preprocess* the raw data into a more usable form. In this tutorial, we generate a feature vector for each movie based on its genres.

We need to implement two classes: `Preparator` and `PreparedData`. `Preparator` is a class implementing a method `prepare` which transforms `TrainingData` into `PreparedData`; `PreparedData` is the output and the object being passed to `Algorithms` for training. `PreparedData` can be anything; very often it is equivalent to `TrainingData`, or a subclass of it. Here, [`PreparedData`](https://github.com/PredictionIO/PredictionIO/tree/master/examples/java-local-tutorial/src/main/java/recommendations/tutorial4/PreparedData.java) is a subclass of `TrainingData` which adds a map from items (movies) to feature vectors. The merit of using a subclass is that it keeps the original `TrainingData` easily accessible.

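A minimal sketch of what this subclass adds is shown below; the two extra fields correspond to the `itemFeatures.size` and `featureCount` values printed in the workflow output later in this section, though the actual `PreparedData.java` may differ in its details:

```java
import java.util.Map;
import org.apache.commons.math3.linear.RealVector;

// Sketch only -- see PreparedData.java in the tutorial sources for the real thing.
public class PreparedData extends TrainingData {
  public final Map<Integer, RealVector> itemFeatures; // movie ID -> binary genre vector
  public final int featureCount;                      // number of genres (19 for MovieLens 100K)

  public PreparedData(TrainingData td,
      Map<Integer, RealVector> itemFeatures, int featureCount) {
    super(td.ratings); // assumes a constructor like tutorial 1's TrainingData
    this.itemFeatures = itemFeatures;
    this.featureCount = featureCount;
  }
}
```
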
The [`Preparator`](https://github.com/PredictionIO/PredictionIO/tree/master/examples/java-local-tutorial/src/main/java/recommendations/tutorial4/Preparator.java) class simply examines the item info and extracts a feature vector from it.

After implementing these two classes, you can add them to the workflow and check that things are really working. Add the preparator class to the engine builder, as shown in [`Runner4b.java`](https://github.com/PredictionIO/PredictionIO/tree/master/examples/java-local-tutorial/src/main/java/recommendations/tutorial4/Runner4b.java):

```java
return new JavaEngineBuilder<
    TrainingData, EmptyParams, PreparedData, Query, Float, Object> ()
  .dataSourceClass(DataSource.class)
  .preparatorClass(Preparator.class) // Add the new preparator
  .build();
```

And you can test it out with

```bash
$ cd $PIO_HOME/examples/java-local-tutorial
$ ../../bin/pio run org.apache.predictionio.examples.java.recommendations.tutorial4.Runner4b -- -- data/ml-100k/
```

You should see

```
2014-09-30 17:28:37,335 INFO spark.SparkContext - Job finished: collect at WorkflowUtils.scala:179, took 0.209634 s
2014-09-30 17:28:37,335 INFO workflow.CoreWorkflow$ - Prepared Data Set 0
2014-09-30 17:28:37,336 INFO workflow.CoreWorkflow$ - Params: Empty
2014-09-30 17:28:37,336 INFO workflow.CoreWorkflow$ - PreparedData: [TrainingData: rating.size=100003 genres.size=19 itemInfo.size=1685 userInfo.size=946 itemFeatures.size=1685 featureCount=19]
2014-09-30 17:28:37,336 INFO workflow.CoreWorkflow$ - Preparator complete
2014-09-30 17:28:37,337 INFO workflow.CoreWorkflow$ - Algo model construction
2014-09-30 17:28:37,337 INFO workflow.CoreWorkflow$ - AlgoList has zero length. Stop here
```

## Feature-Based Algorithm
This algorithm creates a feature profile for every user using the feature vectors in `PreparedData`. More specifically, if a user has rated *Toy Story* 5 stars but *The Crucible* 1 star, the user profile would reflect that this user likes comedy and animation but dislikes drama.

The MovieLens rating is an integer ranging from 1 to 5. We incorporate it into the algorithm with the following parameters:

```java
public class FeatureBasedAlgorithmParams implements JavaParams {
  public final double min;
  public final double max;
  public final double drift;
  public final double scale;
  ...
}
```

We only consider ratings from `min` to `max`, and we normalize each rating with this function: `f(rating) = (rating - drift) * scale`. As each movie is associated with a binary feature vector, the user feature vector is essentially a rating-weighted sum of all the movies (s)he has rated. After that, we normalize all user feature vectors by the L-infinity norm; this ensures that each user feature is bounded by [-1, 1]. In layman's terms, -1 indicates that the user hates that feature, whilst 1 suggests the opposite.

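To make the normalization concrete: with the parameters used later in this tutorial (`min` = 1, `max` = 5, `drift` = 3, `scale` = 0.5), a 5-star rating contributes `(5 - 3) * 0.5 = 1.0` of the movie's feature vector to the user profile, a 1-star rating contributes `(1 - 3) * 0.5 = -1.0`, and a neutral 3-star rating contributes nothing.
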
The following is a snippet of the [actual code](https://github.com/PredictionIO/PredictionIO/tree/master/examples/java-local-tutorial/src/main/java/recommendations/tutorial4/FeatureBasedAlgorithm.java). `data` is an instance of `PreparedData` that is passed as an argument to the `train` function.

```java
for (Integer uid : data.userInfo.keySet()) {
  userFeatures.put(uid, new ArrayRealVector(data.featureCount));
  userActions.put(uid, 0);
}

for (TrainingData.Rating rating : data.ratings) {
  final int uid = rating.uid;
  final int iid = rating.iid;
  final double rate = rating.rating;

  // Skip ratings outside the range.
  if (!(params.min <= rate && rate <= params.max)) continue;

  final double actualRate = (rate - params.drift) * params.scale;
  final RealVector userFeature = userFeatures.get(uid);
  final RealVector itemFeature = data.itemFeatures.get(iid);
  userFeature.combineToSelf(1, actualRate, itemFeature);
}

// Normalize userFeatures by the L-inf norm
for (Integer uid : userFeatures.keySet()) {
  final RealVector feature = userFeatures.get(uid);
  feature.mapDivideToSelf(feature.getLInfNorm());
}
```

[Runner4c.java](https://github.com/PredictionIO/PredictionIO/tree/master/examples/java-local-tutorial/src/main/java/recommendations/tutorial4/Runner4c.java) illustrates the engine factory up to this point. We use the default serving class as we only have one algorithm. (We will demonstrate how to combine prediction results from multiple algorithms later in this tutorial.) We are able to define [an end-to-end engine](https://github.com/PredictionIO/PredictionIO/tree/master/examples/java-local-tutorial/src/main/java/recommendations/tutorial4/SingleEngineFactory.java).

```bash
$ cd $PIO_HOME/examples/java-local-tutorial
$ ../../bin/pio run org.apache.predictionio.examples.java.recommendations.tutorial4.Runner4c -- -- data/ml-100k/
```

## Deployment

We can deploy this feature-based engine just like tutorial 1. We have an [engine JSON](https://github.com/PredictionIO/PredictionIO/tree/master/examples/java-local-tutorial/src/main/java/recommendations/tutorial4/single-algo-engine.json), and we register it:

```bash
$ cd $PIO_HOME/examples/java-local-tutorial
$ ../../bin/pio register --engine-json src/main/java/recommendations/tutorial4/single-algo-engine.json
```

The script automatically recompiles updated code. You will need to re-run this script if you have updated any code in your engine.

### Specify Engine Parameters
We use the following JSON files for deployment.

1. [datasource.json](https://github.com/PredictionIO/PredictionIO/tree/master/examples/java-local-tutorial/src/main/java/recommendations/tutorial4/single-jsons/datasource.json):

   ```json
   {
     "dir" : "data/ml-100k/",
     "addFakeData": true
   }
   ```

2. [algorithms.json](https://github.com/PredictionIO/PredictionIO/tree/master/examples/java-local-tutorial/src/main/java/recommendations/tutorial4/single-jsons/algorithms.json):

   ```json
   [
     {
       "name": "featurebased",
       "params": {
         "min": 1.0,
         "max": 5.0,
         "drift": 3.0,
         "scale": 0.5
       }
     }
   ]
   ```

   Recall that we support multiple algorithms. This JSON file is actually a list of *name-params* pairs, where *name* is the identifier of an algorithm defined in the EngineFactory, and the *params* value corresponds to the algorithm's parameters.

### Start training
The following command kick-starts the training, which will return an ID when the training is completed.

```bash
$ cd $PIO_HOME/examples/java-local-tutorial
$ ../../bin/pio train \
  --engine-json src/main/java/recommendations/tutorial4/single-algo-engine.json \
  --params-path src/main/java/recommendations/tutorial4/single-jsons
```

### Deploy server
Once the training has completed, you can deploy a server:

```bash
$ cd $PIO_HOME/examples/java-local-tutorial
$ ../../bin/pio deploy --engine-json src/main/java/recommendations/tutorial4/single-algo-engine.json
```

### Try a few things

Fake user -1 (see [DataSource.FakeData](https://github.com/PredictionIO/PredictionIO/tree/master/examples/java-local-tutorial/src/main/java/recommendations/tutorial4/DataSource.java)) loves action movies. If we pass in item 27 (Bad Boys), we should get a high rating (i.e. 1). You can use our script `bin/cjson` to send the JSON request. The first parameter is the JSON request, and the second parameter is the server address.

```bash
$ cd $PIO_HOME/examples/java-local-tutorial
$ ../../bin/cjson '{"uid": -1, "iid": 27}' http://localhost:8000/queries.json
```

Fake item -2 is a cold item (i.e. it has no ratings). But from the data, we know that it is a movie categorized under the "Action" genre; hence, it should also get a high rating from fake user -1.

```bash
$ cd $PIO_HOME/examples/java-local-tutorial
$ ../../bin/cjson '{"uid": -1, "iid": -2}' http://localhost:8000/queries.json
```

However, there is nothing we can do with a cold user. Fake user -3 has no rating history, so we know nothing about him. If we request any rating for fake user -3, we will get a NaN.
TODO: @Donald "NaN is not a valid double value as per JSON specification. To override this behavior, use GsonBuilder.serializeSpecialFloatingPointValues() method."

```bash
$ cd $PIO_HOME/examples/java-local-tutorial
$ ../../bin/cjson '{"uid": -3, "iid": 1}' http://localhost:8000/queries.json
```

## Multiple Algorithms
We have two algorithms available: one is a collaborative filtering algorithm and the other is a feature-based algorithm. PredictionIO allows you to create an engine that ensembles the predictions of multiple algorithms. You may use the feature-based algorithm for cold-start items (as CF-based algorithms cannot handle items with no ratings), and use both algorithms for the others.

### Combining Algorithms' Output

[Serving](https://github.com/PredictionIO/PredictionIO/tree/master/examples/java-local-tutorial/src/main/java/recommendations/tutorial4/Serving.java) is the last step of the pipeline. It takes the prediction results from all algorithms, combines them, and returns the result. In the current case, we take an average of all valid (i.e. not NaN) predictions. In the extreme case where all algorithms return NaN, we also return NaN. Engine builders need to implement the `serve` method. We demonstrate with our case:

```java
public Float serve(Query query, Iterable<Float> predictions) {
  float sum = 0.0f;
  int count = 0;

  for (Float v: predictions) {
    if (!v.isNaN()) {
      sum += v;
      count += 1;
    }
  }
  return (count == 0) ? Float.NaN : sum / count;
}
```

### Complete Engine Factory

[EngineFactory.java](https://github.com/PredictionIO/PredictionIO/tree/master/examples/java-local-tutorial/src/main/java/recommendations/tutorial4/EngineFactory.java) demonstrates how to specify multiple algorithms in the same engine. When we add algorithms to the builder instance, we also need to specify a String which serves as the identifier.
For example, we use "featurebased" for the feature-based algorithm, and "collaborative" for the collaborative filtering algorithm.

```java
public class EngineFactory implements IJavaEngineFactory {
  public JavaEngine<TrainingData, EmptyParams, PreparedData, Query, Float, Object> apply() {
    return new JavaEngineBuilder<
        TrainingData, EmptyParams, PreparedData, Query, Float, Object> ()
      .dataSourceClass(DataSource.class)
      .preparatorClass(Preparator.class)
      .addAlgorithmClass("featurebased", FeatureBasedAlgorithm.class)
      .addAlgorithmClass("collaborative", CollaborativeFilteringAlgorithm.class)
      .servingClass(Serving.class)
      .build();
  }
}
```

Similar to the earlier example, we need to write [a JSON](https://github.com/PredictionIO/PredictionIO/tree/master/examples/java-local-tutorial/src/main/java/recommendations/tutorial4/multiple-algo-engine.json) for the engine, and register it with PredictionIO. Here's the content:

```json
{
  "id": "org.apache.predictionio.examples.java.recommendations.tutorial4.EngineFactory",
  "version": "0.8.2",
  "name": "FeatureBased Recommendations Engine",
  "engineFactory": "org.apache.predictionio.examples.java.recommendations.tutorial4.EngineFactory"
}
```

The following script registers the engine. It is important to note that the script also copies all related files (JARs, resources) of this engine to permanent storage; if you have updated the engine code or added new dependencies, you need to rerun this command.

```bash
$ cd $PIO_HOME/examples/java-local-tutorial
$ ../../bin/pio register --engine-json src/main/java/recommendations/tutorial4/multiple-algo-engine.json
```

Now, we can specify the engine instance by passing the set of parameters to the engine. Our engine can support multiple algorithms and, in addition, multiple instances of the same algorithm. We illustrate with [algorithms.json](https://github.com/PredictionIO/PredictionIO/tree/master/examples/java-local-tutorial/src/main/java/recommendations/tutorial4/jsons/algorithms.json):

```json
[
  {
    "name": "featurebased",
    "params": {
      "min": 1.0,
      "max": 5.0,
      "drift": 3.0,
      "scale": 0.5
    }
  },
  {
    "name": "featurebased",
    "params": {
      "min": 4.0,
      "max": 5.0,
      "drift": 3.0,
      "scale": 0.5
    }
  },
  {
    "name": "collaborative",
    "params": {
      "threshold": 0.2
    }
  }
]
```

This JSON contains three sets of algorithm parameters. The first two correspond to the feature-based algorithm, and the third corresponds to the collaborative filtering algorithm. The first allows all 5 ratings, and the second allows only ratings higher than or equal to 4. This gives a bit more weight to the high-rating features. Once [all parameter files are specified](https://github.com/PredictionIO/PredictionIO/tree/master/examples/java-local-tutorial/src/main/java/recommendations/tutorial4/jsons/), we can start the training phase and start the API server:

```bash
$ cd $PIO_HOME/examples/java-local-tutorial
$ ../../bin/pio train \
  --engine-json src/main/java/recommendations/tutorial4/multiple-algo-engine.json \
  --params-path src/main/java/recommendations/tutorial4/jsons
...
2014-09-30 21:59:36,256 INFO spark.SparkContext - Job finished: collect at Workflow.scala:695, took 3.961802 s
2014-09-30 21:59:36,529 INFO workflow.CoreWorkflow$ - Saved engine instance with ID: Bp60PPk4SHqyeDzDPHbv-Q

$ ../../bin/pio deploy --engine-json src/main/java/recommendations/tutorial4/multiple-algo-engine.json
```

By default, the server starts on port 8000. Open it with your browser and you will see all the meta information about this engine instance.

You can submit various queries to the server and see what you get.

http://git-wip-us.apache.org/repos/asf/incubator-predictionio/blob/e1e71280/docs/manual/obsolete/tutorials/enginebuilders/stepbystep/dataalgorithm.html.md
----------------------------------------------------------------------
diff --git a/docs/manual/obsolete/tutorials/enginebuilders/stepbystep/dataalgorithm.html.md b/docs/manual/obsolete/tutorials/enginebuilders/stepbystep/dataalgorithm.html.md
deleted file mode 100644
index 1e109ed..0000000
--- a/docs/manual/obsolete/tutorials/enginebuilders/stepbystep/dataalgorithm.html.md
+++ /dev/null
@@ -1,417 +0,0 @@
---
title: Data and Algorithm Implementation
---

# Create an Engine with Data and Algorithm

## Step 1. Define the data class types

For this Item Recommendation Engine, the data class types are defined as follows:

- *Training Data (TD)*: List of user IDs, item IDs and ratings, as defined in `TrainingData.java`.

  ```java
  public class TrainingData implements Serializable {
    public List<Rating> ratings;

    public TrainingData(List<Rating> ratings) {
      this.ratings = ratings;
    }

    public static class Rating implements Serializable {
      public int uid; // user ID
      public int iid; // item ID
      public float rating;

      public Rating(int uid, int iid, float rating) {
        this.uid = uid;
        this.iid = iid;
        this.rating = rating;
      }
    }
  }
  ```

- *Input Query (Q)*: User ID and item ID, as defined in `Query.java`.

  ```java
  public class Query implements Serializable {
    public int uid; // user ID
    public int iid; // item ID

    public Query(int uid, int iid) {
      this.uid = uid;
      this.iid = iid;
    }
  }
  ```

- *Prediction output (P)*: Predicted preference value. The primitive class `Float` can be used.

- *Prepared Data (PD)*: Because the algorithm can directly use the `TrainingData`, the same `TrainingData` is used and there is no need to define *Prepared Data* separately.

- *Model (M)*: Because it is the data type returned by the *Algorithm* component, we will define this when we implement the algorithm.

- *Actual (A)*: In this tutorial, we are not going to do evaluation, which will be explained in later tutorials. We can simply use the `Object` type for it.

As you can see, if the data is a simple field, you may use a primitive class type such as `Integer` or `Float`. If your data contains multiple fields, you may define your own class (such as `Query` in this tutorial). The requirement is that the data class must implement the `Serializable` interface.

## Step 2. Implement DataSource

The *DataSource* component is responsible for reading data from the source (e.g. a database or text file) and preparing the *Training Data (TD)*.

In this tutorial, the *DataSource* component needs one parameter, which specifies the path of the file containing the rating data.

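For reference, each line of the MovieLens 100K `u.data` file referenced later in this tutorial is a tab-delimited record of the form `user id`, `item id`, `rating`, `timestamp`, for example:

```
196	242	3	881250949
```

Only the first three fields end up in `TrainingData`.
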
Note that each controller component (*DataSource, Preparator, Algorithm, Serving and Metrics*) is restricted to having an empty constructor or a constructor which takes exactly one argument, which must implement the `org.apache.predictionio.controller.java.JavaParams` interface.

We can define the DataSource parameter class as follows (in `DataSourceParams.java`):

```java
public class DataSourceParams implements JavaParams {
  public String filePath; // file path

  public DataSourceParams(String path) {
    this.filePath = path;
  }
}
```

The *DataSource* component must extend `org.apache.predictionio.controller.java.LJavaDataSource`:

```java
public abstract class LJavaDataSource<DSP extends Params,DP,TD,Q,A>
```

`LJavaDataSource` stands for *Local Java DataSource*, meaning that it is a Java *DataSource* component which can run on a single machine. It requires the following type parameters:

- `DSP`: *DataSource Parameters* class, which is the `DataSourceParams` class we just defined above.
- `DP`: *Data Parameters* class. It is used to describe the generated *Training Data* and the Test Data (*Query and Actual*), which is used by *Metrics* during evaluation. Because we are not going to demonstrate evaluation in this first tutorial, the `Object` type can be used.
- `TD`: *Training Data* class, which is the `TrainingData` class defined in step 1.
- `Q`: Input *Query* class, which is the `Query` class defined in step 1.
- `A`: *Actual* result class, which is the `Object` class defined in step 1.

```java
public class DataSource extends LJavaDataSource<
    DataSourceParams, Object, TrainingData, Query, Object> {
  // ...
}
```

The only method you need to implement is `LJavaDataSource`'s `read()`:

```java
public abstract Iterable<scala.Tuple3<DP,TD,Iterable<scala.Tuple2<Q,A>>>> read()
```

The `read()` method should read data from the source (e.g. a database or text file) and return the *Training Data* (`TD`) and *Test Data* (`Iterable<scala.Tuple2<Q,A>>`), with the *Data Parameters* (`DP`) associated with this *Training and Test Data Set*.

Note that the `read()` method's return type is `Iterable` because it could return one or more *Training and Test Data Sets*. For example, we may want to evaluate the engine with multiple iterations of random training and test splits. In that case, each set corresponds to one split.

Because we are only going to demonstrate deploying an *Engine* in this first tutorial, `read()` will return only one set of *Training Data*, and the *Test Data* will simply be an empty list.

You can find the implementation of `read()` in `DataSource.java`. It reads a comma- or tab-delimited rating file and returns `TrainingData`.

## Step 3. Implement Algorithm

In this tutorial, a simple item-based collaborative filtering algorithm is implemented for demonstration purposes. This algorithm computes a similarity score between each pair of items and returns a *Model* which consists of the item similarity scores and the users' rating history. These will be used to compute the predicted rating value of an item by a user.

This algorithm takes a threshold as a parameter and discards any item pairs with a similarity score lower than this threshold.

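The details live in `Algorithm.java` and are not reproduced here. As a rough, illustrative sketch of how such a model is typically used at prediction time (the actual implementation may differ), the predicted rating can be computed as a similarity-weighted average of the user's past ratings, using the `Model` fields defined below:

```java
// Illustrative sketch only -- the real predict() is in Algorithm.java.
// itemSimilarity maps an item to its similarity vector over all items;
// userHistory maps a user to his rating vector over all items.
public Float predict(Model model, Query query) {
  RealVector similarities = model.itemSimilarity.get(query.iid);
  RealVector ratings = model.userHistory.get(query.uid);
  if (similarities == null || ratings == null) return Float.NaN; // cold item or user

  double score = 0.0;
  double weight = 0.0;
  for (int j = 0; j < ratings.getDimension(); j++) {
    double r = ratings.getEntry(j);
    if (r > 0) { // only consider items the user has rated
      score += similarities.getEntry(j) * r;
      weight += Math.abs(similarities.getEntry(j));
    }
  }
  return (weight == 0.0) ? Float.NaN : (float) (score / weight);
}
```
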
The algorithm parameters class is defined in `AlgoParams.java`:

```java
public class AlgoParams implements JavaParams {
  public double threshold;

  public AlgoParams(double threshold) {
    this.threshold = threshold;
  }
}
```

The *Model* generated by the algorithm is defined in `Model.java`:

```java
public class Model implements Serializable {
  public Map<Integer, RealVector> itemSimilarity;
  public Map<Integer, RealVector> userHistory;

  public Model(Map<Integer, RealVector> itemSimilarity,
      Map<Integer, RealVector> userHistory) {
    this.itemSimilarity = itemSimilarity;
    this.userHistory = userHistory;
  }
}
```

The *Algorithm* component must extend `org.apache.predictionio.controller.java.LJavaAlgorithm`:

```java
public abstract class LJavaAlgorithm<AP extends Params,PD,M,Q,P>
```
Similar to `LJavaDataSource`, `LJavaAlgorithm` stands for *Local Java Algorithm*, meaning that it is a Java *Algorithm* component which can run on a single machine. It requires the following type parameters:

- `AP`: *Algorithm Parameters* class, which is the `AlgoParams` class we just defined above.
- `PD`: *Prepared Data* class, which is the same as the `TrainingData` class, as described in step 1.
- `M`: *Model* class, which is the `Model` class defined above.
- `Q`: Input *Query* class, which is the `Query` class defined in step 1.
- `P`: *Prediction* output class, which is the `Float` class defined in step 1.

```java
public class Algorithm extends
    LJavaAlgorithm<AlgoParams, TrainingData, Model, Query, Float> {
  // ...
}
```

You need to implement two methods of `LJavaAlgorithm`:

- the `train` method:

  ```java
  public abstract M train(PD pd)
  ```

  The `train` method produces a *Model* of type `M` from *Prepared Data* of type `PD`.

- the `predict` method:

  ```java
  public abstract P predict(M model, Q query)
  ```

  The `predict` method produces a *Prediction* of type `P` from a *Query* of type `Q` and a trained *Model* of type `M`.

You can find the implementation of these methods in `Algorithm.java`.

## Step 4. Implement Engine Factory

The PredictionIO framework requires an *Engine Factory* which returns an *Engine* with the controller components defined.

The *Engine Factory* must implement the `org.apache.predictionio.controller.java.IJavaEngineFactory` interface and its `apply()` method (as shown in `EngineFactory.java`):

```java
public class EngineFactory implements IJavaEngineFactory {
  public JavaSimpleEngine<TrainingData, Object, Query, Float, Object> apply() {
    return new JavaSimpleEngineBuilder<
        TrainingData, Object, Query, Float, Object> ()
      .dataSourceClass(DataSource.class)
      .preparatorClass() // Use default Preparator
      .addAlgorithmClass("MyRecommendationAlgo", Algorithm.class)
      .servingClass() // Use default Serving
      .build();
  }
}
```

To build an *Engine*, we need to define the class of each component. A `JavaEngineBuilder` is used for this purpose. In this tutorial, because the *Prepared Data* is the same as the *Training Data*, we can use `JavaSimpleEngineBuilder`.

As you can see, we specify the `DataSource` and `Algorithm` classes we just implemented in the steps above.

To deploy an engine, we also need a serving layer. For a `JavaSimpleEngine` with a single algorithm, we can use the default *Serving* component by simply calling the method `servingClass()` without specifying any class name. Building a custom *Serving* component will be explained in later tutorials.

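To give a sense of what the default *Serving* does for a single-algorithm engine: it simply passes the lone prediction through. A hand-written equivalent (a sketch only; the class name is hypothetical, mirroring the `LJavaServing` examples elsewhere in these tutorials) would be:

```java
import org.apache.predictionio.controller.java.*;

public class PassThroughServing extends LJavaServing<EmptyParams, Query, Float> {
  @Override
  public Float serve(Query query, Iterable<Float> predictions) {
    // With a single algorithm there is exactly one prediction to return.
    return predictions.iterator().next();
  }
}
```
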
Note that an *Engine* can contain different algorithms. This will be demonstrated in later tutorials.

The `addAlgorithmClass()` method requires the name of the algorithm ("MyRecommendationAlgo" in this case), which will be used later when we specify the parameters for this algorithm.

## Step 5. Compile and Register Engine

We have implemented all the necessary building blocks of this Item Recommendation Engine. Next, we need to register the engine with PredictionIO.

An engine manifest `engine.json` is needed to describe the engine:

```json
{
  "id": "org.apache.predictionio.examples.java.recommendations.tutorial1.EngineFactory",
  "version": "0.8.2",
  "name": "Simple Recommendations Engine",
  "engineFactory": "org.apache.predictionio.examples.java.recommendations.tutorial1.EngineFactory"
}
```

The `engineFactory` is the class name of the `EngineFactory` class created above. The `id` and `version` will be referenced later when we run the engine.

Execute the following command to compile and register the engine:

```bash
$ cd $PIO_HOME/examples/java-local-tutorial
$ ../../bin/pio register --engine-json src/main/java/recommendations/tutorial1/engine.json
```

The `register` command takes the engine JSON file (with the `--engine-json` parameter). Note that you need to register the engine again whenever you have modified the code, so that it is recompiled.

## Step 6. Specify Parameters for the Engine

Our `DataSource` and `Algorithm` classes require parameters, which can be specified with JSON files.

In this tutorial, the `DataSourceParams` has one parameter, the file path of the ratings file. The JSON is defined as follows (`params/datasource.json` under `tutorial1/`):

```json
{ "filePath": "data/ml-100k/u.data" }
```

Note that the key name (`filePath`) must be the same as the corresponding field name defined in the `DataSourceParams` class.

For `Algorithms`, we need to define a JSON array (`params/algorithms.json`):

```json
[
  {
    "name": "MyRecommendationAlgo",
    "params": {
      "threshold": 0.2
    }
  }
]
```

The key `name` is the name of the algorithm, which should match the one defined in the `EngineFactory` class in the step above; it specifies which algorithm of the *Engine* we want to deploy. The `params` value defines the parameters for this algorithm.

Note that even if your algorithm takes no parameters, you still need to put an empty JSON object `{}`. For example:

```json
[
  {
    "name": "MyAnotherRecommendationAlgo",
    "params": {}
  }
]
```

## Step 7. Train Engine and Deploy Server

Now we have everything in place. Let's run it!

We use `../../bin/pio train` to train the *Engine*, which builds and saves the algorithm *Model* for serving real-time requests.

Execute the following commands:

```
$ cd $PIO_HOME/examples/java-local-tutorial
$ ../../bin/pio train \
  --engine-json src/main/java/recommendations/tutorial1/engine.json \
  --params-path src/main/java/recommendations/tutorial1/params
```

The `--engine-json` parameter points to the JSON file `engine.json`. The `--params-path` is the base directory of the parameter JSON files.

When it finishes, you should see the following at the end of the terminal output:

```
2014-09-30 22:04:57,784 INFO spark.SparkContext - Job finished: collect at Workflow.scala:695, took 4.061274 s
2014-09-30 22:04:57,984 INFO workflow.CoreWorkflow$ - Saved engine instance with ID: ROSwUHDAQSyYGXs5YG0eQw
```

(If you don't see the `Saved engine instance with ID` line, scroll back up to look for error messages. Chances are you skipped the step of downloading the MovieLens data. Please refer to the [Overview](index.html) section if that is the case.)

Next, execute the `../../bin/pio deploy` command with the returned `ID`:

```
$ ../../bin/pio deploy --engine-json src/main/java/recommendations/tutorial1/engine.json
```

This will create a server that by default binds to http://localhost:8000. You can visit that page in your web browser to check its status.

Now you can retrieve prediction results by sending an HTTP request to the server with the *Query* as a JSON payload. Remember that our `Query` class is defined with `uid` and `iid` fields. The JSON key names must be the same as the field names of the `Query` class (`uid` and `iid` in this example).

For example, to retrieve the predicted preference for item ID 3 by user ID 1, run the following in a terminal:

```
$ curl -H "Content-Type: application/json" -d '{"uid": 1, "iid": 3}' http://localhost:8000/queries.json
```

You should see the predicted preference value returned:

```
3.937741
```

Congratulations! Now you have built a prediction engine which uses the trained model to serve real-time queries and return prediction results!

Now you may want to [test the engine components](testcomponents.html).

http://git-wip-us.apache.org/repos/asf/incubator-predictionio/blob/e1e71280/docs/manual/obsolete/tutorials/enginebuilders/stepbystep/evaluation.html.md
----------------------------------------------------------------------
diff --git a/docs/manual/obsolete/tutorials/enginebuilders/stepbystep/evaluation.html.md b/docs/manual/obsolete/tutorials/enginebuilders/stepbystep/evaluation.html.md
deleted file mode 100644
index 3a4612b..0000000
--- a/docs/manual/obsolete/tutorials/enginebuilders/stepbystep/evaluation.html.md
+++ /dev/null
@@ -1,252 +0,0 @@
---
title: Evaluation
---

# Evaluation

In this tutorial, we will demonstrate how to implement an *Evaluator* component to run *Offline Evaluation* for the *Engine*. We will continue to use the Item Recommendation Engine developed in Tutorial 1 as an example and implement an *Evaluator* which computes the Root Mean Square Error (RMSE).

## Step 1 - Training and Test Set Split

To run *Offline Evaluation*, we need *Training* and *Test Set* data. We will modify `DataSource.java` to do a random split of the rating data to generate the *Test Set*. For demonstration purposes, the modified `DataSource.java` is put under the directory `tutorial3/`.

Recall that `org.apache.predictionio.controller.java.LJavaDataSource` takes the following type parameters:

```java
public abstract class LJavaDataSource<DSP extends Params,DP,TD,Q,A>
```
- `DSP`: *DataSource Parameters* class.
- `DP`: *Data Parameters* class. It is used to describe the generated *Training Data* and the Test Data (*Query and Actual*), which is used by the *Evaluator* during evaluation.
- `TD`: *Training Data* class.
- `Q`: Input *Query* class.
- `A`: *Actual* result class.

The *Actual* result is used by the *Evaluator* to compare with the *Prediction* outputs and compute the score.
## Step 2 - Evaluator

We will implement a Root Mean Square Error (RMSE) evaluator. You can find the implementation in `Evaluator.java`. The *Evaluator* extends `org.apache.predictionio.controller.java.JavaEvaluator`, which requires the following type parameters:

```java
public abstract class JavaEvaluator<EP extends Params,DP,Q,P,A,EU,ES,ER>
```
- `EP`: *Evaluator Parameters* class.
- `DP`: *Data Parameters* class.
- `Q`: Input *Query* class.
- `P`: *Prediction* output class.
- `A`: *Actual* result class.
- `EU`: *Evaluator Unit* class.
- `ES`: *Evaluator Set* class.
- `ER`: *Evaluator Result* class.

and requires the following methods to be overridden:

```java
public abstract EU evaluateUnit(Q query, P predicted, A actual)

public abstract ES evaluateSet(DP dataParams, Iterable<EU> evaluationUnits)

public abstract ER evaluateAll(Iterable<scala.Tuple2<DP,ES>> input)
```

The method `evaluateUnit()` computes an *Evaluator Unit (EU)* from the *Prediction* and *Actual* results of each input *Query*.

For this RMSE evaluator, `evaluateUnit()` returns the squared error of the predicted rating against the actual rating:

```java
@Override
public Double evaluateUnit(Query query, Float predicted, Float actual) {
  logger.info("Q: " + query.toString() + " P: " + predicted + " A: " + actual);
  // return squared error
  double error;
  if (predicted.isNaN())
    error = -actual; // an undefined prediction scores like predicting 0
  else
    error = predicted - actual;
  return (error * error);
}
```

The method `evaluateSet()` takes all of the *Evaluator Units (EU)* of the same set and computes the *Evaluator Set (ES)* result for this set.

For this RMSE evaluator, `evaluateSet()` computes the square root of the mean of all squared errors in the set and returns it:

```java
@Override
public Double evaluateSet(Object dataParams, Iterable<Double> evaluationUnits) {
  double sum = 0.0;
  int count = 0;
  for (double squareError : evaluationUnits) {
    sum += squareError;
    count += 1;
  }
  return Math.sqrt(sum / count);
}
```
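As a quick sanity check of this arithmetic, `evaluateSet()` can be exercised directly. This is a hypothetical snippet, not part of the tutorial code; it assumes `Evaluator` has a public no-argument constructor and can be instantiated outside the workflow:

```java
import java.util.Arrays;

public class EvaluatorCheck {
  public static void main(String[] args) {
    // Squared errors 1.0, 4.0 and 1.0 average to 2.0,
    // so the expected RMSE is sqrt(2.0), roughly 1.4142.
    Evaluator evaluator = new Evaluator();
    System.out.println(evaluator.evaluateSet(null, Arrays.asList(1.0, 4.0, 1.0)));
    // prints 1.4142135623730951
  }
}
```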
The method `evaluateAll()` takes the *Evaluator Set* results of all sets, performs a final computation, and returns the overall *Evaluator Result (ER)*. PredictionIO will display this final *Evaluator Result* in the terminal.

In this tutorial, it simply combines all *Evaluator Set* results into a String and returns it:

```java
@Override
public String evaluateAll(Iterable<Tuple2<Object, Double>> input) {
  return Arrays.toString(IteratorUtils.toArray(input.iterator()));
}
```

## Step 3 - Run Evaluation

To run an evaluation with this metric, simply add the `Evaluator` class to the `JavaWorkflow.runEngine()` call (as shown in `Runner3.java`).

Because our `Evaluator` class doesn't take any parameters, the `EmptyParams` class is used:

```java
JavaWorkflow.runEngine(
  (new EngineFactory()).apply(),
  engineParams,
  Evaluator.class,
  new EmptyParams(),
  new WorkflowParamsBuilder().batch("MyEngine").verbose(3).build()
);
```
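If your evaluator did take parameters, the pattern would mirror the other components: define a parameter class and pass an instance in place of `new EmptyParams()`. A hypothetical sketch follows; `threshold` is an invented parameter, and the `Params` trait is again assumed to be implementable from Java:

```java
import org.apache.predictionio.controller.Params;

// Hypothetical: an evaluator parameter class. Its fields would be
// exposed to the Evaluator through the EP type parameter.
public class MyEvaluatorParams implements Params {
  public double threshold;

  public MyEvaluatorParams(double threshold) {
    this.threshold = threshold;
  }
}
```

You would then pass `new MyEvaluatorParams(0.5)` as the fourth argument of `JavaWorkflow.runEngine()` and declare `MyEvaluatorParams` as the `EP` type parameter of your `JavaEvaluator`.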
Execute the following command:

```
$ cd $PIO_HOME/examples/java-local-tutorial
$ ../../bin/pio run org.apache.predictionio.examples.java.recommendations.tutorial3.Runner3 -- -- data/test/ratings.csv
```
where `$PIO_HOME` is the root directory of the PredictionIO code tree.

You should see the following output when it finishes running:

```
2014-09-30 16:53:55,924 INFO workflow.CoreWorkflow$ - CoreWorkflow.run completed.
2014-09-30 16:53:56,044 WARN workflow.CoreWorkflow$ - java.lang.String is not a NiceRendering instance.
2014-09-30 16:53:56,053 INFO workflow.CoreWorkflow$ - Saved engine instance with ID: nU81XwpjSl-F43-CHgwJZQ
```

To view the Evaluator Result (the RMSE score), start the dashboard with the `pio dashboard` command:

```
$ cd $PIO_HOME/examples/java-local-tutorial
$ ../../bin/pio dashboard
```

Then point your browser to http://localhost:9000. You should see the following result on the page:

```
[(null,1.0), (null,3.8078865529319543), (null,1.5811388300841898)]
```

## Step 4 - Running with the MovieLens 100K Data Set

Run the following to fetch the data set if you haven't already done so. The `ml-100k` data set will be downloaded into the `data/` directory:

```
$ cd $PIO_HOME/examples/java-local-tutorial
$ ./fetch.sh
```

Re-run `Runner3` with the `ml-100k` data set:

```
$ ../../bin/pio run org.apache.predictionio.examples.java.recommendations.tutorial3.Runner3 -- -- `pwd`/data/ml-100k/u.data
```

You should see the following output when it finishes running:

```
2014-09-30 17:06:34,033 INFO spark.SparkContext - Job finished: collect at Workflow.scala:597, took 0.103821 s
2014-09-30 17:06:34,033 INFO workflow.CoreWorkflow$ - DataSourceParams: org.apache.predictionio.examples.java.recommendations.tutorial1.DataSourceParams@3b9f69ce
2014-09-30 17:06:34,033 INFO workflow.CoreWorkflow$ - PreparatorParams: Empty
2014-09-30 17:06:34,034 INFO workflow.CoreWorkflow$ - Algo: 0 Name: MyRecommendationAlgo Params: org.apache.predictionio.examples.java.recommendations.tutorial1.AlgoParams@76171b1
2014-09-30 17:06:34,034 INFO workflow.CoreWorkflow$ - ServingParams: Empty
2014-09-30 17:06:34,035 INFO workflow.CoreWorkflow$ - EvaluatorParams: Empty
2014-09-30 17:06:34,035 INFO workflow.CoreWorkflow$ - [(null,1.052046904037191), (null,1.042766938101085), (null,1.0490312745374106)]
2014-09-30 17:06:34,035 INFO workflow.CoreWorkflow$ - CoreWorkflow.run completed.
2014-09-30 17:06:34,152 WARN workflow.CoreWorkflow$ - java.lang.String is not a NiceRendering instance.
2014-09-30 17:06:34,160 INFO workflow.CoreWorkflow$ - Saved engine instance with ID: IjWc8yyDS3-9JyXGVuWVgQ
```

To view the Evaluator Result (the RMSE score), start the dashboard with the `pio dashboard` command:

```
$ cd $PIO_HOME/examples/java-local-tutorial
$ ../../bin/pio dashboard
```

Then point your browser to http://localhost:9000. You should see the following result on the page:

```
[(null,1.052046904037191), (null,1.042766938101085), (null,1.0490312745374106)]
```

Up to this point, you should be familiar with the basic components of PredictionIO (*DataSource*, *Algorithm*, and *Evaluator*) and know how to develop your own algorithms and prediction engines, deploy them, and serve real-time prediction queries.

In the next tutorial, we will demonstrate how to use a *Preparator* to do pre-processing of the *Training Data* for the *Algorithm*, incorporate multiple *Algorithms* into the *Engine*, and create a custom *Serving* component.

Next: [Combining Multiple Algorithms at Serving](combiningalgorithms.html)

http://git-wip-us.apache.org/repos/asf/incubator-predictionio/blob/e1e71280/docs/manual/obsolete/tutorials/enginebuilders/stepbystep/index.html.md
----------------------------------------------------------------------
diff --git a/docs/manual/obsolete/tutorials/enginebuilders/stepbystep/index.html.md b/docs/manual/obsolete/tutorials/enginebuilders/stepbystep/index.html.md
deleted file mode 100644
index 7e3d9d2..0000000
--- a/docs/manual/obsolete/tutorials/enginebuilders/stepbystep/index.html.md
+++ /dev/null
@@ -1,40 +0,0 @@
---
title: Step-by-Step Engine Building
---

# Step-by-Step Engine Building

## Overview

This series of tutorials will walk through each component of **PredictionIO**. We will demonstrate how to develop your machine learning algorithms and prediction engines, deploy them and serve real-time prediction queries, develop your own metrics to run offline evaluations, and improve your prediction engine by using multiple algorithms.

> You need to build PredictionIO from source in order to build your own engine. Please follow the instructions to build from source [here](/install/install-sourcecode.html).

Let's build a simple **Java single machine recommendation engine** which predicts the rating value a user would give to an item. The [MovieLens 100k](http://grouplens.org/datasets/movielens/) data set will be used as an example.

Execute the following commands to download MovieLens 100k to `data/ml-100k/`:

```
$ cd $PIO_HOME/examples/java-local-tutorial
$ ./fetch.sh
```
where `$PIO_HOME` is the root directory of the PredictionIO code tree.

In this first tutorial, we will demonstrate how to build a simple Item Recommendation Engine with the *DataSource* and *Algorithm* components. You can find all the source code of this tutorial in the directory `java-local-tutorial/src/main/java/recommendations/tutorial1/`.

## Getting Started

Let's begin with [implementing a new Engine with Data and Algorithm](dataalgorithm.html).
http://git-wip-us.apache.org/repos/asf/incubator-predictionio/blob/e1e71280/docs/manual/obsolete/tutorials/enginebuilders/stepbystep/testcomponents.html.md
----------------------------------------------------------------------
diff --git a/docs/manual/obsolete/tutorials/enginebuilders/stepbystep/testcomponents.html.md b/docs/manual/obsolete/tutorials/enginebuilders/stepbystep/testcomponents.html.md
deleted file mode 100644
index faea9eb..0000000
--- a/docs/manual/obsolete/tutorials/enginebuilders/stepbystep/testcomponents.html.md
+++ /dev/null
@@ -1,137 +0,0 @@
---
title: Testing Engine Components
---

# Testing Engine Components

During development, you may want to run each component step by step to test out the data pipeline. In this tutorial, we will demonstrate how to do that easily.

## Test Run DataSource

In `java-local-tutorial/src/main/java/recommendations/tutorial2`, you can find `Runner1.java`. It is a small program that uses `JavaSimpleEngineBuilder` to build an engine and uses `JavaWorkflow` to run the workflow.

To test the *DataSource* component, we can simply create an Engine with the *DataSource* component only and leave the other components empty:

```java
private static class HalfBakedEngineFactory implements IJavaEngineFactory {
  public JavaSimpleEngine<TrainingData, Object, Query, Float, Object> apply() {
    return new JavaSimpleEngineBuilder<
      TrainingData, Object, Query, Float, Object> ()
      .dataSourceClass(DataSource.class)
      .build();
  }
}
```

Similarly, we only need to add the `DataSourceParams` to the `JavaEngineParamsBuilder`:

```java
JavaEngineParams engineParams = new JavaEngineParamsBuilder()
  .dataSourceParams(new DataSourceParams(filePath))
  .build();
```

Then you can run this Engine by using `JavaWorkflow`:

```java
JavaWorkflow.runEngine(
  (new HalfBakedEngineFactory()).apply(),
  engineParams,
  null,
  new EmptyParams(),
  new WorkflowParamsBuilder().batch("MyEngine").verbose(3).build()
);
```

For quick testing purposes, a very simple test data set is provided in `data/test/ratings.csv`. Each row of the file represents a user ID, an item ID, and the rating value:

```
1,1,2
1,2,3
1,3,4
...
```
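For reference, parsing rows of this shape takes only a few lines. The following is a self-contained, hypothetical sketch; the tutorial's `DataSource` implements its own version of this logic:

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of reading ratings.csv into (user ID, item ID, rating)
// triples; not the tutorial's actual DataSource code.
public class RatingsParser {
  public static List<float[]> parse(String filePath) throws IOException {
    List<float[]> ratings = new ArrayList<float[]>();
    try (BufferedReader reader = new BufferedReader(new FileReader(filePath))) {
      String line;
      while ((line = reader.readLine()) != null) {
        String[] cols = line.split(",");
        ratings.add(new float[] {
          Float.parseFloat(cols[0]),    // user ID
          Float.parseFloat(cols[1]),    // item ID
          Float.parseFloat(cols[2]) }); // rating value
      }
    }
    return ratings;
  }
}
```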
`Runner1.java` takes the path of the rating file as an argument. Execute the following command to run (the `../../bin/pio run` command will automatically compile and package the JARs):

```
$ cd $PIO_HOME/examples/java-local-tutorial
$ ../../bin/pio run org.apache.predictionio.examples.java.recommendations.tutorial2.Runner1 -- -- data/test/ratings.csv
```
where `$PIO_HOME` is the root directory of the PredictionIO code tree. The two `--` separate three groups of arguments: parameters passed to `pio run` (the `Runner1` class in this case), parameters passed to Apache Spark (no special parameters in this case), and parameters passed to the main class (the CSV file in this case).

If it runs successfully, you should see the following console output at the end. It prints out the `TrainingData` generated by `DataSource`:

```
2014-09-30 16:00:01,321 INFO spark.SparkContext - Job finished: collect at Workflow.scala:388, took 0.024613 s
2014-09-30 16:00:01,322 INFO workflow.CoreWorkflow$ - Data Set 0
2014-09-30 16:00:01,323 INFO workflow.CoreWorkflow$ - Params: null
2014-09-30 16:00:01,323 INFO workflow.CoreWorkflow$ - TrainingData:
2014-09-30 16:00:01,323 INFO workflow.CoreWorkflow$ - [[(1,1,2.0), (1,2,3.0), (1,3,4.0), (2,3,4.0), (2,4,1.0), (3,2,2.0), (3,3,1.0), (3,4,3.0), (4,1,5.0), (4,2,3.0), (4,4,2.0)]]
2014-09-30 16:00:01,324 INFO workflow.CoreWorkflow$ - TestingData: (count=0)
2014-09-30 16:00:01,324 INFO workflow.CoreWorkflow$ - Data source complete
2014-09-30 16:00:01,324 INFO workflow.CoreWorkflow$ - Preparator is null. Stop here
```

As you can see, it stops after running the *DataSource* component and prints out the *Training Data* for debugging.

## Test Run Algorithm

By simply adding `addAlgorithmClass()` to the `JavaSimpleEngineBuilder` and `addAlgorithmParams()` to the `JavaEngineParamsBuilder`, you can test the `Algorithm` class in the workflow as well, as shown in `Runner2.java`:

```java
private static class HalfBakedEngineFactory implements IJavaEngineFactory {
  public JavaSimpleEngine<TrainingData, Object, Query, Float, Object> apply() {
    return new JavaSimpleEngineBuilder<
      TrainingData, Object, Query, Float, Object> ()
      .dataSourceClass(DataSource.class)
      .preparatorClass() // Use default Preparator
      .addAlgorithmClass("MyRecommendationAlgo", Algorithm.class) // Add Algorithm
      .build();
  }
}
```

```java
JavaEngineParams engineParams = new JavaEngineParamsBuilder()
  .dataSourceParams(new DataSourceParams(filePath))
  .addAlgorithmParams("MyRecommendationAlgo", new AlgoParams(0.2)) // Add Algorithm Params
  .build();
```

Execute the following command to run:

```
$ cd $PIO_HOME/examples/java-local-tutorial
$ ../../bin/pio run org.apache.predictionio.examples.java.recommendations.tutorial2.Runner2 -- -- data/test/ratings.csv
```

You should see the *Model* generated by the Algorithm at the end of the console output:

```
2014-09-30 16:05:54,275 INFO spark.SparkContext - Job finished: collect at WorkflowUtils.scala:179, took 0.037635 s
2014-09-30 16:05:54,276 INFO workflow.CoreWorkflow$ - [Model: [itemSimilarity: {1=org.apache.commons.math3.linear.OpenMapRealVector@65fa6c0, 2=org.apache.commons.math3.linear.OpenMapRealVector@c2eb7f66, 3=org.apache.commons.math3.linear.OpenMapRealVector@2302395e, 4=org.apache.commons.math3.linear.OpenMapRealVector@d2fb7858}]
[userHistory: {1=org.apache.commons.math3.linear.OpenMapRealVector@5a1123a3, 2=org.apache.commons.math3.linear.OpenMapRealVector@d1225bfd, 3=org.apache.commons.math3.linear.OpenMapRealVector@572123a3, 4=org.apache.commons.math3.linear.OpenMapRealVector@a51523a3}]
2014-09-30 16:05:54,276 INFO workflow.CoreWorkflow$ - Serving is null. Stop here
```

By adding each component step by step, we can easily test and debug the data pipeline.
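Once every component has been verified this way, the remaining pieces can be filled into the same builder. The sketch below mirrors the incremental factories above; by analogy with `preparatorClass()`, it assumes a no-argument `servingClass()` selects a default Serving component:

```java
private static class FullEngineFactory implements IJavaEngineFactory {
  public JavaSimpleEngine<TrainingData, Object, Query, Float, Object> apply() {
    return new JavaSimpleEngineBuilder<
      TrainingData, Object, Query, Float, Object> ()
      .dataSourceClass(DataSource.class)
      .preparatorClass()          // use default Preparator
      .addAlgorithmClass("MyRecommendationAlgo", Algorithm.class)
      .servingClass()             // assumed: use default Serving
      .build();
  }
}
```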
Next: [Evaluation](evaluation.html)

http://git-wip-us.apache.org/repos/asf/incubator-predictionio/blob/e1e71280/docs/manual/obsolete/tutorials/engines/index.html.md
----------------------------------------------------------------------
diff --git a/docs/manual/obsolete/tutorials/engines/index.html.md b/docs/manual/obsolete/tutorials/engines/index.html.md
deleted file mode 100644
index ae89a32..0000000
--- a/docs/manual/obsolete/tutorials/engines/index.html.md
+++ /dev/null
@@ -1,22 +0,0 @@
---
title: Tutorials and Samples
---

# Engine Tutorials and Samples

## Item Recommendation

### Rails

[Building a Business Recommendation App with Rails Using Yelp Data](itemrec/rails.html)

### Python

[Building Movie Recommendation App with Sample Code](itemrec/movielens.html)

## Item Ranking

* (coming soon)

## Item Similarity

* (coming soon)
