http://git-wip-us.apache.org/repos/asf/predictionio-site/blob/765e178c/archived/tapster/index.html ---------------------------------------------------------------------- diff --git a/archived/tapster/index.html b/archived/tapster/index.html new file mode 100644 index 0000000..aa68c35 --- /dev/null +++ b/archived/tapster/index.html @@ -0,0 +1,269 @@ +<!DOCTYPE html><html><head><title>Comics Recommendation Demo</title><meta charset="utf-8"/><meta content="IE=edge,chrome=1" http-equiv="X-UA-Compatible"/><meta name="viewport" content="width=device-width, initial-scale=1.0"/><meta class="swiftype" name="title" data-type="string" content="Comics Recommendation Demo"/><link rel="canonical" href="https://predictionio.apache.org/archived/tapster/"/><link href="/images/favicon/normal-b330020a.png" rel="shortcut icon"/><link href="/images/favicon/apple-c0febcf2.png" rel="apple-touch-icon"/><link href="//fonts.googleapis.com/css?family=Open+Sans:300italic,400italic,600italic,700italic,800italic,400,300,600,700,800" rel="stylesheet"/><link href="//maxcdn.bootstrapcdn.com/font-awesome/4.2.0/css/font-awesome.min.css" rel="stylesheet"/><link href="/stylesheets/application-eccfc6cb.css" rel="stylesheet" type="text/css"/><script src="//cdnjs.cloudflare.com/ajax/libs/html5shiv/3.7.2/html5shiv.min.js"></script><script src="//cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML"></script><script src="//use.typekit.net/pqo0itb.js"></script><script>try{Typekit.load({ async: true });}catch(e){}</script></head><body><div id="global"><header><div class="container" id="header-wrapper"><div class="row"><div class="col-sm-12"><div id="logo-wrapper"><span id="drawer-toggle"></span><a href="#"></a><a href="http://predictionio.apache.org/"><img alt="Apache PredictionIO" id="logo" src="/images/logos/logo-ee2b9bb3.png"/></a><span>®</span></div><div id="menu-wrapper"><div id="pill-wrapper"><a class="pill left" href="/gallery/template-gallery">TEMPLATES</a> <a class="pill right"
href="//github.com/apache/predictionio/">OPEN SOURCE</a></div></div><img class="mobile-search-bar-toggler hidden-md hidden-lg" src="/images/icons/search-glass-704bd4ff.png"/></div></div></div></header><div id="search-bar-row-wrapper"><div class="container-fluid" id="search-bar-row"><div class="row"><div class="col-md-9 col-sm-11 col-xs-11"><div class="hidden-md hidden-lg" id="mobile-page-heading-wrapper"><p>PredictionIO Docs</p><h4>Comics Recommendation Demo</h4></div><h4 class="hidden-sm hidden-xs">PredictionIO Docs</h4></div><div class="col-md-3 col-sm-1 col-xs-1 hidden-md hidden-lg"><img id="left-menu-indicator" src="/images/icons/down-arrow-dfe9f7fe.png"/></div><div class="col-md-3 col-sm-12 col-xs-12 swiftype-wrapper"><div class="swiftype"><form class="search-form"><img class="search-box-toggler hidden-xs hidden-sm" src="/images/icons/search-glass-704bd4ff.png"/><div class="search-box"><img src="/images/icons/search-glass-704bd4ff.png"/><input type="text" id="st-search-input" class="st-search-input" placeholder="Search Doc..."/></div><img class="swiftype-row-hider hidden-md hidden-lg" src="/images/icons/drawer-toggle-active-fcbef12a.png"/></form></div></div><div class="mobile-left-menu-toggler hidden-md hidden-lg"></div></div></div></div><div id="page" class="container-fluid"><div class="row"><div id="left-menu-wrapper" class="col-md-3"><nav id="nav-main"><ul><li class="level-1"><a class="expandible" href="/"><span>Apache PredictionIO® Documentation</span></a><ul><li class="level-2"><a class="final" href="/"><span>Welcome to Apache PredictionIO®</span></a></li></ul></li><li class="level-1"><a class="expandible" href="#"><span>Getting Started</span></a><ul><li class="level-2"><a class="final" href="/start/"><span>A Quick Intro</span></a></li><li class="level-2"><a class="final" href="/install/"><span>Installing Apache PredictionIO</span></a></li><li class="level-2"><a class="final" href="/start/download/"><span>Downloading an Engine Template</span></a></li><li
class="level-2"><a class="final" href="/start/deploy/"><span>Deploying Your First Engine</span></a></li><li class="level-2"><a class="final" href="/start/customize/"><span>Customizing the Engine</span></a></li></ul></li><li class="level-1"><a class="expandible" href="#"><span>Integrating with Your App</span></a><ul><li class="level-2"><a class="final" href="/appintegration/"><span>App Integration Overview</span></a></li><li class="level-2"><a class="expandible" href="/sdk/"><span>List of SDKs</span></a><ul><li class="level-3"><a class="final" href="/sdk/java/"><span>Java &amp; Android SDK</span></a></li><li class="level-3"><a class="final" href="/sdk/php/"><span>PHP SDK</span></a></li><li class="level-3"><a class="final" href="/sdk/python/"><span>Python SDK</span></a></li><li class="level-3"><a class="final" href="/sdk/ruby/"><span>Ruby SDK</span></a></li><li class="level-3"><a class="final" href="/community/projects/#sdks"><span>Community Powered SDKs</span></a></li></ul></li></ul></li><li class="level-1"><a class="expandible" href="#"><span>Deploying an Engine</span></a><ul><li class="level-2"><a class="final" href="/deploy/"><span>Deploying as a Web Service</span></a></li><li class="level-2"><a class="final" href="/batchpredict/"><span>Batch Predictions</span></a></li><li class="level-2"><a class="final" href="/deploy/monitoring/"><span>Monitoring Engine</span></a></li><li class="level-2"><a class="final" href="/deploy/engineparams/"><span>Setting Engine Parameters</span></a></li><li class="level-2"><a class="final" href="/deploy/enginevariants/"><span>Deploying Multiple Engine Variants</span></a></li><li class="level-2"><a class="final" href="/deploy/plugin/"><span>Engine Server Plugin</span></a></li></ul></li><li class="level-1"><a class="expandible" href="#"><span>Customizing an Engine</span></a><ul><li class="level-2"><a class="final" href="/customize/"><span>Learning DASE</span></a></li><li class="level-2"><a class="final"
href="/customize/dase/"><span>Implement DASE</span></a></li><li class="level-2"><a class="final" href="/customize/troubleshooting/"><span>Troubleshooting Engine Development</span></a></li><li class="level-2"><a class="final" href="/api/current/#package"><span>Engine Scala APIs</span></a></li></ul></li><li class="level-1"><a class="expandible" href="#"><span>Collecting and Analyzing Data</span></a><ul><li class="level-2"><a class="final" href="/datacollection/"><span>Event Server Overview</span></a></li><li class="level-2"><a class="final" href="/datacollection/eventapi/"><span>Collecting Data with REST/SDKs</span></a></li><li class="level-2"><a class="final" href="/datacollection/eventmodel/"><span>Events Modeling</span></a></li><li class="level-2"><a class="final" href="/datacollection/webhooks/"><span>Unifying Multichannel Data with Webhooks</span></a></li><li class="level-2"><a class="final" href="/datacollection/channel/"><span>Channel</span></a></li><li class="level-2"><a class="final" href="/datacollection/batchimport/"><span>Importing Data in Batch</span></a></li><li class="level-2"><a class="final" href="/datacollection/analytics/"><span>Using Analytics Tools</span></a></li><li class="level-2"><a class="final" href="/datacollection/plugin/"><span>Event Server Plugin</span></a></li></ul></li><li class="level-1"><a class="expandible" href="#"><span>Choosing an Algorithm</span></a><ul><li class="level-2"><a class="final" href="/algorithm/"><span>Built-in Algorithm Libraries</span></a></li><li class="level-2"><a class="final" href="/algorithm/switch/"><span>Switching to Another Algorithm</span></a></li><li class="level-2"><a class="final" href="/algorithm/multiple/"><span>Combining Multiple Algorithms</span></a></li><li class="level-2"><a class="final" href="/algorithm/custom/"><span>Adding Your Own Algorithms</span></a></li></ul></li><li class="level-1"><a class="expandible" href="#"><span>Tuning and Evaluation</span></a><ul><li class="level-2"><a
class="final" href="/evaluation/"><span>Overview</span></a></li><li class="level-2"><a class="final" href="/evaluation/paramtuning/"><span>Hyperparameter Tuning</span></a></li><li class="level-2"><a class="final" href="/evaluation/evaluationdashboard/"><span>Evaluation Dashboard</span></a></li><li class="level-2"><a class="final" href="/evaluation/metricchoose/"><span>Choosing Evaluation Metrics</span></a></li><li class="level-2"><a class="final" href="/evaluation/metricbuild/"><span>Building Evaluation Metrics</span></a></li></ul></li><li class="level-1"><a class="expandible" href="#"><span>System Architecture</span></a><ul><li class="level-2"><a class="final" href="/system/"><span>Architecture Overview</span></a></li><li class="level-2"><a class="final" href="/system/anotherdatastore/"><span>Using Another Data Store</span></a></li></ul></li><li class="level-1"><a class="expandible" href="#"><span>PredictionIO® Official Templates</span></a><ul><li class="level-2"><a class="final" href="/templates/"><span>Intro</span></a></li><li class="level-2"><a class="expandible" href="#"><span>Recommendation</span></a><ul><li class="level-3"><a class="final" href="/templates/recommendation/quickstart/"><span>Quick Start</span></a></li><li class="level-3"><a class="final" href="/templates/recommendation/dase/"><span>DASE</span></a></li><li class="level-3"><a class="final" href="/templates/recommendation/evaluation/"><span>Evaluation Explained</span></a></li><li class="level-3"><a class="final" href="/templates/recommendation/how-to/"><span>How-To</span></a></li><li class="level-3"><a class="final" href="/templates/recommendation/reading-custom-events/"><span>Read Custom Events</span></a></li><li class="level-3"><a class="final" href="/templates/recommendation/customize-data-prep/"><span>Customize Data Preparator</span></a></li><li class="level-3"><a class="final" href="/templates/recommendation/customize-serving/"><span>Customize Serving</span></a></li><li class="level-3"><a
class="final" href="/templates/recommendation/training-with-implicit-preference/"><span>Train with Implicit Preference</span></a></li><li class="level-3"><a class="final" href="/templates/recommendation/blacklist-items/"><span>Filter Recommended Items by Blacklist in Query</span></a></li><li class="level-3"><a class="final" href="/templates/recommendation/batch-evaluator/"><span>Batch Persistable Evaluator</span></a></li></ul></li><li class="level-2"><a class="expandible" href="#"><span>E-Commerce Recommendation</span></a><ul><li class="level-3"><a class="final" href="/templates/ecommercerecommendation/quickstart/"><span>Quick Start</span></a></li><li class="level-3"><a class="final" href="/templates/ecommercerecommendation/dase/"><span>DASE</span></a></li><li class="level-3"><a class="final" href="/templates/ecommercerecommendation/how-to/"><span>How-To</span></a></li><li class="level-3"><a class="final" href="/templates/ecommercerecommendation/train-with-rate-event/"><span>Train with Rate Event</span></a></li><li class="level-3"><a class="final" href="/templates/ecommercerecommendation/adjust-score/"><span>Adjust Score</span></a></li></ul></li><li class="level-2"><a class="expandible" href="#"><span>Similar Product</span></a><ul><li class="level-3"><a class="final" href="/templates/similarproduct/quickstart/"><span>Quick Start</span></a></li><li class="level-3"><a class="final" href="/templates/similarproduct/dase/"><span>DASE</span></a></li><li class="level-3"><a class="final" href="/templates/similarproduct/how-to/"><span>How-To</span></a></li><li class="level-3"><a class="final" href="/templates/similarproduct/multi-events-multi-algos/"><span>Multiple Events and Multiple Algorithms</span></a></li><li class="level-3"><a class="final" href="/templates/similarproduct/return-item-properties/"><span>Returns Item Properties</span></a></li><li class="level-3"><a class="final" href="/templates/similarproduct/train-with-rate-event/"><span>Train with Rate
Event</span></a></li><li class="level-3"><a class="final" href="/templates/similarproduct/rid-user-set-event/"><span>Get Rid of Events for Users</span></a></li><li class="level-3"><a class="final" href="/templates/similarproduct/recommended-user/"><span>Recommend Users</span></a></li></ul></li><li class="level-2"><a class="expandible" href="#"><span>Classification</span></a><ul><li class="level-3"><a class="final" href="/templates/classification/quickstart/"><span>Quick Start</span></a></li><li class="level-3"><a class="final" href="/templates/classification/dase/"><span>DASE</span></a></li><li class="level-3"><a class="final" href="/templates/classification/how-to/"><span>How-To</span></a></li><li class="level-3"><a class="final" href="/templates/classification/add-algorithm/"><span>Use Alternative Algorithm</span></a></li><li class="level-3"><a class="final" href="/templates/classification/reading-custom-properties/"><span>Read Custom Properties</span></a></li></ul></li></ul></li><li class="level-1"><a class="expandible" href="#"><span>Engine Template Gallery</span></a><ul><li class="level-2"><a class="final" href="/gallery/template-gallery/"><span>Browse</span></a></li><li class="level-2"><a class="final" href="/community/submit-template/"><span>Submit your Engine as a Template</span></a></li></ul></li><li class="level-1"><a class="expandible" href="#"><span>Demo Tutorials</span></a><ul><li class="level-2"><a class="final" href="/community/projects/#demos"><span>Community Contributed Demo</span></a></li><li class="level-2"><a class="final" href="/demo/textclassification/"><span>Text Classification Engine Tutorial</span></a></li></ul></li><li class="level-1"><a class="expandible" href="/community/"><span>Getting Involved</span></a><ul><li class="level-2"><a class="final" href="/community/contribute-code/"><span>Contribute Code</span></a></li><li class="level-2"><a class="final" href="/community/contribute-documentation/"><span>Contribute
Documentation</span></a></li><li class="level-2"><a class="final" href="/community/contribute-sdk/"><span>Contribute a SDK</span></a></li><li class="level-2"><a class="final" href="/community/contribute-webhook/"><span>Contribute a Webhook</span></a></li><li class="level-2"><a class="final" href="/community/projects/"><span>Community Projects</span></a></li></ul></li><li class="level-1"><a class="expandible" href="#"><span>Getting Help</span></a><ul><li class="level-2"><a class="final" href="/resources/faq/"><span>FAQs</span></a></li><li class="level-2"><a class="final" href="/support/"><span>Support</span></a></li></ul></li><li class="level-1"><a class="expandible" href="#"><span>Resources</span></a><ul><li class="level-2"><a class="final" href="/cli/"><span>Command-line Interface</span></a></li><li class="level-2"><a class="final" href="/resources/release/"><span>Release Cadence</span></a></li><li class="level-2"><a class="final" href="/resources/intellij/"><span>Developing Engines with IntelliJ IDEA</span></a></li><li class="level-2"><a class="final" href="/resources/upgrade/"><span>Upgrade Instructions</span></a></li><li class="level-2"><a class="final" href="/resources/glossary/"><span>Glossary</span></a></li></ul></li><li class="level-1"><a class="expandible" href="#"><span>Apache Software Foundation</span></a><ul><li class="level-2"><a class="final" href="https://www.apache.org/"><span>Apache Homepage</span></a></li>
<li class="level-2"><a class="final" href="https://www.apache.org/licenses/"><span>License</span></a></li><li class="level-2"><a class="final" href="https://www.apache.org/foundation/sponsorship.html"><span>Sponsorship</span></a></li><li class="level-2"><a class="final" href="https://www.apache.org/foundation/thanks.html"><span>Thanks</span></a></li><li class="level-2"><a class="final" href="https://www.apache.org/security/"><span>Security</span></a></li></ul></li></ul></nav></div><div class="col-md-9 col-sm-12"><div class="content-header hidden-md hidden-lg"><div id="page-title"><h1>Comics Recommendation Demo</h1></div></div><div id="table-of-content-wrapper"><h5>On this page</h5><aside id="table-of-contents"><ul> <li> <a href="#introduction">Introduction</a> </li> <li> <a href="#tapster-demo-application">Tapster Demo Application</a> </li> <li> <a href="#apache-predictionio-setup">Apache PredictionIO Setup</a> </li> <li> <a href="#import-data">Import Data</a> </li> <li> <a href="#connect-demo-app-with-apache-predictionio">Connect Demo app with Apache PredictionIO</a> </li> <li> <a href="#links">Links</a> </li> <li> <a href="#conclusion">Conclusion</a> </li> </ul> </aside><hr/><a id="edit-page-link" href="https://github.com/apache/predictionio/tree/livedoc/docs/manual/source/archived/tapster.html.md"><img src="/images/icons/edit-pencil-d6c1bb3d.png"/>Edit this page</a></div><div class="content-header hidden-sm hidden-xs"><div id="page-title"><h1>Comics Recommendation Demo</h1></div></div><div class="content"> <h2 id='introduction' class='header-anchors'>Introduction</h2><p>In this demo, we will show you how to build a Tinder-style web application (named "Tapster") that interactively recommends comics to users based on the episodes they like or dislike.</p><p>The demo uses the <a href="https://predictionio.apache.org/templates/similarproduct/quickstart/">Similar Product Template</a>.
The Similar Product Template is a great choice if you want to make recommendations based on immediate user activities or for new users with limited history. It uses the MLlib Alternating Least Squares (ALS) recommendation algorithm, a <a href="http://en.wikipedia.org/wiki/Recommender_system#Collaborative_filtering">collaborative filtering</a> (CF) algorithm commonly used for recommender systems. These techniques aim to fill in the missing entries of a user-item association matrix. Users and products are described by a small set of latent factors that can be used to predict missing entries. A layman's interpretation of collaborative filtering is "People who like this comic also like these comics."</p><p>All the code and data are on GitHub at <a href="https://github.com/PredictionIO/Demo-Tapster">github.com/PredictionIO/Demo-Tapster</a>.</p><h3 id='data' class='header-anchors'>Data</h3><p>The data comes from <a href="http://tapastic.com/">Tapastic</a>. You can find the data files <a href="https://github.com/PredictionIO/Demo-Tapster/tree/master/data">here</a>.</p><p>The data structure looks like this:</p><p><a href="https://github.com/PredictionIO/Demo-Tapster/blob/master/data/episode_list.csv">Episode List</a> <code>data/episode_list.csv</code></p><p><strong>Fields:</strong> episodeId | episodeTitle | episodeCategories | episodeUrl | episodeImageUrls</p><p>1,000 rows. Each row represents one episode.</p><p><a href="https://github.com/PredictionIO/Demo-Tapster/blob/master/data/user_list.csv">User Like Event List</a> <code>data/user_list.csv</code></p><p><strong>Fields:</strong> userId | episodeId | likedTimestamp</p><p>192,587 rows.
Each row represents one user like for the given episode.</p><p>The tutorial has four major steps:</p> <ul> <li>Demo application setup</li> <li>PredictionIO installation and setup</li> <li>Import data into database and PredictionIO</li> <li>Integrate demo application with PredictionIO</li> </ul> <h2 id='tapster-demo-application' class='header-anchors'>Tapster Demo Application</h2><p>The demo application is built using Rails.</p><p>You can clone the existing application with:</p><div class="highlight shell"><table style="border-spacing: 0"><tbody><tr><td class="gutter gl" style="text-align: right"><pre class="lineno">1 +2 +3</pre></td><td class="code"><pre><span class="gp">$ </span>git clone https://github.com/PredictionIO/Demo-Tapster.git +<span class="gp">$ </span><span class="nb">cd </span>Demo-Tapster +<span class="gp">$ </span>bundle install +</pre></td></tr></tbody></table> </div> <p>You will need to edit <code>config/database.yml</code> to match your local database settings. We have provided some sensible defaults for PostgreSQL, MySQL, and SQLite.</p><p>Set up the database with:</p><div class="highlight shell"><table style="border-spacing: 0"><tbody><tr><td class="gutter gl" style="text-align: right"><pre class="lineno">1 +2</pre></td><td class="code"><pre><span class="gp">$ </span>rake db:create +<span class="gp">$ </span>rake db:migrate +</pre></td></tr></tbody></table> </div> <p>At this point, you should have the demo application ready but with an empty database. Let's import the episode data into our database. We will do this with: <code>$ rake import:episodes</code>. An "Episode" is a single <a href="http://en.wikipedia.org/wiki/Comic_strip">comic strip</a>.</p><p><a href="https://github.com/PredictionIO/Demo-Tapster/blob/master/lib/tasks/import/episodes.rake">View on GitHub</a></p><p>This script is pretty simple.
It loops through the CSV file and, for each line, creates a new episode in our local database.</p><p>You can start the app and point your browser to <a href="http://localhost:3000">http://localhost:3000</a>:</p><div class="highlight shell"><table style="border-spacing: 0"><tbody><tr><td class="gutter gl" style="text-align: right"><pre class="lineno">1</pre></td><td class="code"><pre><span class="gp">$ </span>rails server +</pre></td></tr></tbody></table> </div> <p><img alt="Rails Server" src="/images/demo/tapster/rails-server-997d690e.png"/></p><h2 id='apache-predictionio-setup' class='header-anchors'>Apache PredictionIO Setup</h2><h3 id='install-apache-predictionio' class='header-anchors'>Install Apache PredictionIO</h3><p>Follow the installation instructions <a href="http://predictionio.apache.org/install/">here</a> or simply run:</p><div class="highlight shell"><table style="border-spacing: 0"><tbody><tr><td class="gutter gl" style="text-align: right"><pre class="lineno">1</pre></td><td class="code"><pre><span class="gp">$ </span>bash -c <span class="s2">"</span><span class="k">$(</span>curl -s https://raw.githubusercontent.com/apache/predictionio/master/bin/install.sh<span class="k">)</span><span class="s2">"</span> +</pre></td></tr></tbody></table> </div> <p><img alt="PIO Install" src="/images/demo/tapster/pio-install-2d870aed.png"/></p><h3 id='create-a-new-app' class='header-anchors'>Create a New App</h3><p>You will need to create a new app on Apache PredictionIO to house the Tapster demo.
You can do this with:</p><div class="highlight shell"><table style="border-spacing: 0"><tbody><tr><td class="gutter gl" style="text-align: right"><pre class="lineno">1</pre></td><td class="code"><pre><span class="gp">$ </span>pio app new tapster +</pre></td></tr></tbody></table> </div> <p>Take note of the App ID and Access Key.</p><p><img alt="PIO App New" src="/images/demo/tapster/pio-app-new-5a8ae503.png"/></p><h3 id='setup-engine' class='header-anchors'>Setup Engine</h3><p>We are going to copy the Similar Product Template into the PIO directory.</p><div class="highlight shell"><table style="border-spacing: 0"><tbody><tr><td class="gutter gl" style="text-align: right"><pre class="lineno">1 +2</pre></td><td class="code"><pre><span class="gp">$ </span><span class="nb">cd </span>PredictionIO +<span class="gp">$ </span>git clone https://github.com/apache/predictionio-template-similar-product.git tapster-episode-similar +</pre></td></tr></tbody></table> </div> <p>Next we are going to update the App ID in the "engine.json" file to match the App ID we just created.</p><div class="highlight shell"><table style="border-spacing: 0"><tbody><tr><td class="gutter gl" style="text-align: right"><pre class="lineno">1 +2 +3</pre></td><td class="code"><pre><span class="gp">$ </span><span class="nb">cd </span>tapster-episode-similar +<span class="gp">$ </span>nano engine.json +<span class="gp">$ </span><span class="nb">cd</span> .. +</pre></td></tr></tbody></table> </div> <p><img alt="Engine Setup" src="/images/demo/tapster/pio-engine-setup-88e25cc0.png"/></p><h3 id='modify--engine-template' class='header-anchors'>Modify Engine Template</h3><p>By default, the engine template reads the "view" events.
We can easily change it to read "like" events.</p> <p>Modify <code>readTraining()</code> in DataSource.scala:</p><div class="highlight scala"><table style="border-spacing: 0"><tbody><tr><td class="gutter gl" style="text-align: right"><pre class="lineno">1 +2 +3 +4 +5 +6 +7 +8 +9 +10 +11 +12 +13 +14 +15 +16 +17 +18 +19 +20 +21 +22 +23 +24 +25 +26 +27 +28 +29 +30 +31 +32 +33 +34 +35 +36</pre></td><td class="code"><pre> + <span class="k">override</span> + <span class="k">def</span> <span class="n">readTraining</span><span class="o">(</span><span class="n">sc</span><span class="k">:</span> <span class="kt">SparkContext</span><span class="o">)</span><span class="k">:</span> <span class="kt">TrainingData</span> <span class="o">=</span> <span class="o">{</span> + + <span class="o">...</span> + + <span class="k">val</span> <span class="n">viewEventsRDD</span><span class="k">:</span> <span class="kt">RDD</span><span class="o">[</span><span class="kt">ViewEvent</span><span class="o">]</span> <span class="k">=</span> <span class="n">eventsDb</span><span class="o">.</span><span class="n">find</span><span class="o">(</span> + <span class="n">appId</span> <span class="k">=</span> <span class="n">dsp</span><span class="o">.</span><span class="n">appId</span><span class="o">,</span> + <span class="n">entityType</span> <span class="k">=</span> <span class="nc">Some</span><span class="o">(</span><span class="s">"user"</span><span class="o">),</span> + <span class="n">eventNames</span> <span class="k">=</span> <span class="nc">Some</span><span class="o">(</span><span class="nc">List</span><span class="o">(</span><span class="s">"like"</span><span class="o">)),</span> <span class="c1">// MODIFIED +</span> <span class="c1">// targetEntityType is an optional field of an event.
+</span> <span class="n">targetEntityType</span> <span class="k">=</span> <span class="nc">Some</span><span class="o">(</span><span class="nc">Some</span><span class="o">(</span><span class="s">"item"</span><span class="o">)))(</span><span class="n">sc</span><span class="o">)</span> + <span class="c1">// eventsDb.find() returns RDD[Event] +</span> <span class="o">.</span><span class="n">map</span> <span class="o">{</span> <span class="n">event</span> <span class="k">=></span> + <span class="k">val</span> <span class="n">viewEvent</span> <span class="k">=</span> <span class="k">try</span> <span class="o">{</span> + <span class="n">event</span><span class="o">.</span><span class="n">event</span> <span class="k">match</span> <span class="o">{</span> + <span class="k">case</span> <span class="s">"like"</span> <span class="k">=></span> <span class="nc">ViewEvent</span><span class="o">(</span> <span class="c1">// MODIFIED +</span> <span class="n">user</span> <span class="k">=</span> <span class="n">event</span><span class="o">.</span><span class="n">entityId</span><span class="o">,</span> + <span class="n">item</span> <span class="k">=</span> <span class="n">event</span><span class="o">.</span><span class="n">targetEntityId</span><span class="o">.</span><span class="n">get</span><span class="o">,</span> + <span class="n">t</span> <span class="k">=</span> <span class="n">event</span><span class="o">.</span><span class="n">eventTime</span><span class="o">.</span><span class="n">getMillis</span><span class="o">)</span> + <span class="k">case</span> <span class="k">_</span> <span class="k">=></span> <span class="k">throw</span> <span class="k">new</span> <span class="nc">Exception</span><span class="o">(</span><span class="n">s</span><span class="s">"Unexpected event ${event} is read."</span><span class="o">)</span> + <span class="o">}</span> + <span class="o">}</span> <span class="k">catch</span> <span class="o">{</span> + <span class="k">case</span> <span 
class="n">e</span><span class="k">:</span> <span class="kt">Exception</span> <span class="o">=></span> <span class="o">{</span> + <span class="n">logger</span><span class="o">.</span><span class="n">error</span><span class="o">(</span><span class="n">s</span><span class="s">"Cannot convert ${event} to ViewEvent."</span> <span class="o">+</span> + <span class="n">s</span><span class="s">" Exception: ${e}."</span><span class="o">)</span> + <span class="k">throw</span> <span class="n">e</span> + <span class="o">}</span> + <span class="o">}</span> + <span class="n">viewEvent</span> + <span class="o">}</span> + + <span class="o">...</span> + <span class="o">}</span> +<span class="o">}</span> + +</pre></td></tr></tbody></table> </div> <p>Finally to build the engine we will run:</p><div class="highlight shell"><table style="border-spacing: 0"><tbody><tr><td class="gutter gl" style="text-align: right"><pre class="lineno">1 +2 +3</pre></td><td class="code"><pre><span class="gp">$ </span><span class="nb">cd </span>tapster-episode-similar +<span class="gp">$ </span>pio build +<span class="gp">$ </span><span class="nb">cd</span> .. 
+</pre></td></tr></tbody></table> </div> <p><img alt="PIO Build" src="/images/demo/tapster/pio-build-e6eb1d7c.png"/></p><h2 id='import-data' class='header-anchors'>Import Data</h2><p>Once everything is installed, start the event server by running: <code>$ pio eventserver</code></p><p><img alt="Event Server" src="/images/demo/tapster/pio-eventserver-88889ec0.png"/></p><div class="alert-message info"><p>You can check the status of Apache PredictionIO at any time by running: <code>$ pio status</code></p></div><p>ALERT: If your laptop goes to sleep, you might need to restart HBase manually with:</p><div class="highlight shell"><table style="border-spacing: 0"><tbody><tr><td class="gutter gl" style="text-align: right"><pre class="lineno">1 +2 +3</pre></td><td class="code"><pre><span class="gp">$ </span><span class="nb">cd </span>PredictionIO/vendors/hbase-0.98.6/bin +<span class="gp">$ </span>./stop-hbase.sh +<span class="gp">$ </span>./start-hbase.sh +</pre></td></tr></tbody></table> </div> <p>The key event we are importing into the Apache PredictionIO event server is the "Like" event (for example, user X likes episode Y).</p><p>We will send this data to Apache PredictionIO by executing the <code>$ rake import:predictionio</code> command.</p><p><a href="https://github.com/PredictionIO/Demo-Tapster/blob/master/lib/tasks/import/predictionio.rake">View on GitHub</a></p><p>This script is a little more complex.
First we need to connect to the Event Server.</p><div class="highlight shell"><table style="border-spacing: 0"><tbody><tr><td class="gutter gl" style="text-align: right"><pre class="lineno">1</pre></td><td class="code"><pre>client <span class="o">=</span> PredictionIO::EventClient.new<span class="o">(</span>ENV[<span class="s1">'PIO_ACCESS_KEY'</span><span class="o">]</span>, ENV[<span class="s1">'PIO_EVENT_SERVER_URL'</span><span class="o">]</span>, THREADS<span class="o">)</span> +</pre></td></tr></tbody></table> </div> <p>You will need to create the environment variables <code>PIO_ACCESS_KEY</code> and <code>PIO_EVENT_SERVER_URL</code>. The default Event Server URL is: <a href="http://localhost:7070">http://localhost:7070</a>.</p><div class="alert-message info"><p>If you forget your <strong>Access Key</strong>, you can always run: <code>$ pio app list</code></p></div><p>You can set these values in the <code>.env</code> file located in the application root directory, and they will be loaded into your environment automatically each time Rails is run.</p><p>The next part of the script loops through each line of the <code>data/user_list.csv</code> file and returns an array of unique user and episode IDs. Once we have those, we can send the data to Apache PredictionIO like this.</p><p>First the users:</p><div class="highlight shell"><table style="border-spacing: 0"><tbody><tr><td class="gutter gl" style="text-align: right"><pre class="lineno">1 +2 +3 +4 +5</pre></td><td class="code"><pre>user_ids.each_with_index <span class="k">do</span> |id, i| + <span class="c"># Send unique user IDs to PredictionIO.</span> + client.aset_user<span class="o">(</span>id<span class="o">)</span> + puts <span class="s2">"Sent user ID #{id} to PredictionIO.
Action #{i + 1} of #{user_count}"</span> +end +</pre></td></tr></tbody></table> </div> <p>And now the episodes:</p><div class="highlight shell"><table style="border-spacing: 0"><tbody><tr><td class="gutter gl" style="text-align: right"><pre class="lineno">1 +2 +3 +4 +5 +6 +7 +8 +9 +10 +11 +12 +13 +14 +15 +16 +17</pre></td><td class="code"><pre>episode_ids.each_with_index <span class="k">do</span> |id, i| + <span class="c"># Load episode from database - we will need this to include the categories!</span> + episode <span class="o">=</span> Episode.where<span class="o">(</span>episode_id: id<span class="o">)</span>.take + + <span class="k">if </span>episode + <span class="c"># Send unique episode IDs to PredictionIO.</span> + client.acreate_event<span class="o">(</span> + <span class="s1">'$set'</span>, + <span class="s1">'item'</span>, + id, + properties: <span class="o">{</span> categories: episode.categories <span class="o">}</span> + <span class="o">)</span> + puts <span class="s2">"Sent episode ID #{id} to PredictionIO. Action #{i + 1} of #{episode_count}"</span> + <span class="k">else + </span>puts <span class="s2">"Episode ID #{id} not found in database! 
Skipping!"</span>.color<span class="o">(</span>:red<span class="o">)</span> + end +end +</pre></td></tr></tbody></table> </div> <p>Finally we loop through the <code>data/user_list.csv</code> file a final time to send the like events:</p><div class="highlight shell"><table style="border-spacing: 0"><tbody><tr><td class="gutter gl" style="text-align: right"><pre class="lineno">1 +2 +3 +4 +5 +6 +7 +8 +9 +10 +11 +12 +13 +14</pre></td><td class="code"><pre>CSV.foreach<span class="o">(</span>USER_LIST, headers: <span class="nb">true</span><span class="o">)</span> <span class="k">do</span> |row| + user_id <span class="o">=</span> row[0] <span class="c"># userId</span> + episode_id <span class="o">=</span> row[1] <span class="c"># episodeId</span> + + <span class="c"># Send like to PredictionIO.</span> + client.acreate_event<span class="o">(</span> + <span class="s1">'like'</span>, + <span class="s1">'user'</span>, + user_id, + <span class="o">{</span> <span class="s1">'targetEntityType'</span> <span class="o">=</span>> <span class="s1">'item'</span>, <span class="s1">'targetEntityId'</span> <span class="o">=</span>> episode_id <span class="o">}</span> + <span class="o">)</span> + + puts <span class="s2">"Sent user ID #{user_id} liked episode ID #{episode_id} to PredictionIO. Action #{</span><span class="nv">$INPUT_LINE_NUMBER</span><span class="s2">} of #{line_count}."</span> +end +</pre></td></tr></tbody></table> </div> <p>In total the script takes about 4 minutes to run on a basic laptop. 
At this point all the data has been imported into Apache PredictionIO.</p><p><img alt="Import" src="/images/demo/tapster/pio-import-predictionio-1ecd11fd.png"/></p><h3 id='engine-training' class='header-anchors'>Engine Training</h3><p>We train the engine with the following command:</p><div class="highlight shell"><table style="border-spacing: 0"><tbody><tr><td class="gutter gl" style="text-align: right"><pre class="lineno">1 +2</pre></td><td class="code"><pre><span class="gp">$ </span><span class="nb">cd </span>tapster-episode-similar +<span class="gp">$ </span>pio train -- --driver-memory 4g +</pre></td></tr></tbody></table> </div> <p><img alt="PIO Train" src="/images/demo/tapster/pio-train-7edffad4.png"/></p><p>The <code>--driver-memory</code> option limits the memory used by Apache PredictionIO. Without it, Apache PredictionIO can consume too much memory and crash. You can adjust the <code>4g</code> value up or down depending on your system specs.</p><p>You can set up a job to periodically retrain the engine so the model is updated with the latest dataset.</p><h3 id='deploy-model' class='header-anchors'>Deploy Model</h3><p>You can deploy the model with: <code>$ pio deploy</code> from the <code>tapster-episode-similar</code> directory.</p><p>At this point, you have a demo app with data and an Apache PredictionIO server with a trained model, all set up. Next, we will connect the two so you can log live interaction (like) events to the Apache PredictionIO event server and query the engine server for recommendations.</p><h2 id='connect-demo-app-with-apache-predictionio' class='header-anchors'>Connect Demo app with Apache PredictionIO</h2><h3 id='overview' class='header-anchors'>Overview</h3><p>At a high level, the application keeps a record of each like and dislike. It uses jQuery to send an array of both likes and dislikes to the server on each click. 
The server then queries Apache PredictionIO for a similar episode, which is relayed to jQuery and displayed to the user.</p><p>Data flow:</p> <ul> <li>The user likes an episode.</li> <li>Tapster sends the "Like" event to the Apache PredictionIO event server.</li> <li>Tapster queries the Apache PredictionIO engine with all the episodes the user has rated (likes and dislikes) in this session.</li> <li>Apache PredictionIO returns 1 recommended episode.</li> </ul> <h3 id='javascript' class='header-anchors'>JavaScript</h3><p>All the important code lives in <code>app/assets/javascripts/application.js</code> <a href="https://github.com/PredictionIO/Demo-Tapster/blob/master/app/assets/javascripts/application.js">View on GitHub</a></p><p>Most of this file consists of click handlers, code for displaying the loading dialog, and similar housekeeping.</p><p>The most important function queries the Rails server for results from Apache PredictionIO.</p><div class="highlight shell"><table style="border-spacing: 0"><tbody><tr><td class="gutter gl" style="text-align: right"><pre class="lineno">1 +2 +3 +4 +5 +6 +7 +8 +9 +10 +11 +12 +13 +14</pre></td><td class="code"><pre>// Query the server <span class="k">for </span>a comic based on previous likes. See episodes#query. +queryPIO: <span class="k">function</span><span class="o">()</span> <span class="o">{</span> + var _this <span class="o">=</span> this; // For closure. 
+ <span class="nv">$.</span>ajax<span class="o">({</span> + url: <span class="s1">'/episodes/query'</span>, + <span class="nb">type</span>: <span class="s1">'POST'</span>, + data: <span class="o">{</span> + likes: JSON.stringify<span class="o">(</span>_this.likes<span class="o">)</span>, + dislikes: JSON.stringify<span class="o">(</span>_this.dislikes<span class="o">)</span>, + <span class="o">}</span> + <span class="o">})</span>.done<span class="o">(</span><span class="k">function</span><span class="o">(</span>data<span class="o">)</span> <span class="o">{</span> + _this.setComic<span class="o">(</span>data<span class="o">)</span>; + <span class="o">})</span>; +<span class="o">}</span> +</pre></td></tr></tbody></table> </div> <h3 id='rails' class='header-anchors'>Rails</h3><p>On the Rails side all the fun things happen in the episodes controller located at: <code>app/controllers/episodes_controller</code> <a href="https://github.com/PredictionIO/Demo-Tapster/blob/master/app/controllers/episodes_controller.rb">View on GitHub</a>.</p><div class="highlight shell"><table style="border-spacing: 0"><tbody><tr><td class="gutter gl" style="text-align: right"><pre class="lineno">1 +2 +3 +4 +5 +6 +7 +8 +9 +10 +11 +12 +13 +14 +15 +16 +17 +18 +19 +20 +21 +22 +23 +24 +25 +26 +27 +28 +29 +30 +31 +32</pre></td><td class="code"><pre>def query + <span class="c"># Create PredictionIO client.</span> + client <span class="o">=</span> PredictionIO::EngineClient.new<span class="o">(</span>ENV[<span class="s1">'PIO_ENGINE_URL'</span><span class="o">])</span> + + <span class="c"># Get posted likes and dislikes.</span> + likes <span class="o">=</span> ActiveSupport::JSON.decode<span class="o">(</span>params[:likes]<span class="o">)</span> + dislikes <span class="o">=</span> ActiveSupport::JSON.decode<span class="o">(</span>params[:dislikes]<span class="o">)</span> + + <span class="k">if </span>likes.empty? 
+ <span class="c"># We can't query PredictionIO with no likes so</span> + <span class="c"># we will return a random comic instead.</span> + @episode <span class="o">=</span> random_episode + + render json: @episode + <span class="k">return + </span>end + + <span class="c"># Query PredictionIO.</span> + <span class="c"># Here we black list the disliked items so they are not shown again!</span> + response <span class="o">=</span> client.send_query<span class="o">(</span>items: likes, blackList: dislikes, num: 1<span class="o">)</span> + + <span class="c"># With a real application you would want to do some</span> + <span class="c"># better sanity checking of the response here!</span> + + <span class="c"># Get ID of response.</span> + id <span class="o">=</span> response[<span class="s1">'itemScores'</span><span class="o">][</span>0][<span class="s1">'item'</span><span class="o">]</span> + + <span class="c"># Find episode in database.</span> + @episode <span class="o">=</span> Episode.where<span class="o">(</span>episode_id: id<span class="o">)</span>.take + + render json: @episode +end +</pre></td></tr></tbody></table> </div> <p>On the first line we make a connection to Apache PredictionIO. You will need to set the <code>PIO_ENGINE_URL</code> environment variable. This can be done in the <code>.env</code> file. The default URL is: <a href="http://localhost:8000">http://localhost:8000</a>.</p><p>Next we decode the JSON sent from the browser.</p><p>After that we check to see if the user has liked anything yet. If not, we just return a random episode.</p><p>If the user has likes, we query the Apache PredictionIO engine with that data.</p><p>We also blacklist the dislikes so that they are not returned.</p><p>With our response from Apache PredictionIO, it's just a matter of looking it up in the database and rendering that object as JSON.</p><p>Once the response is sent to the browser, JavaScript is used to replace the existing comic and hide the loading message.</p><p>That's it. 
You're done! If Ruby is not your language of choice, check out our other <a href="http://predictionio.apache.org/sdk/">SDKs</a>, and remember you can always interact with the Event Server through its native JSON API.</p><h2 id='links' class='header-anchors'>Links</h2><p>Source code is on GitHub at: <a href="https://github.com/PredictionIO/Demo-Tapster">github.com/PredictionIO/Demo-Tapster</a></p><h2 id='conclusion' class='header-anchors'>Conclusion</h2><p>Love this tutorial and Apache PredictionIO? Both are open source (Apache 2 License). <a href="https://github.com/PredictionIO/Demo-Tapster">Fork</a> this demo and build upon it. If you produce something cool, shoot us an email and we will link to it from here.</p><p>Found a typo? Think something should be explained better? This tutorial (and all our other documentation) lives in the main repo <a href="https://github.com/apache/predictionio/blob/livedoc/docs/manual/source/demo/tapster.html.md">here</a>. Our documentation is in the <code>livedoc</code> branch. Find out how to contribute documentation at <a href="http://predictionio.apache.org/community/contribute-documentation/">http://predictionio.apache.org/community/contribute-documentation/</a>.</p><p>We ♥ pull requests!</p></div></div></div></div><footer><div class="container"><div class="seperator"></div><div class="row"><div class="col-md-6 footer-link-column"><div class="footer-link-column-row"><h4>Community</h4><ul><li><a href="//predictionio.apache.org/install/" target="blank">Download</a></li><li><a href="//predictionio.apache.org/" target="blank">Docs</a></li><li><a href="//github.com/apache/predictionio" target="blank">GitHub</a></li><li><a href="mailto:user-subscr...@predictionio.apache.org" target="blank">Subscribe to User Mailing List</a></li><li><a href="//stackoverflow.com/questions/tagged/predictionio" target="blank">Stackoverflow</a></li></ul></div></div><div class="col-md-6 footer-link-column"><div class="footer-link-column-row"><h4>Contribute</h4><ul><li><a href="//predictionio.apache.org/community/contribute-code/" target="blank">Contribute</a></li><li><a href="//github.com/apache/predictionio" target="blank">Source Code</a></li><li><a href="//issues.apache.org/jira/browse/PIO" target="blank">Bug Tracker</a></li><li><a href="mailto:dev-subscr...@predictionio.apache.org" target="blank">Subscribe to Development Mailing List</a></li></ul></div></div></div><div class="row"><div class="col-md-12 footer-link-column"><p>Apache PredictionIO, PredictionIO, Apache, the Apache feather logo, and the Apache PredictionIO project logo are either registered trademarks or trademarks of The Apache Software Foundation in the United States and other countries.</p><p>All other marks mentioned may be trademarks or registered trademarks of their respective owners.</p></div></div></div><div id="footer-bottom"><div class="container"><div class="row"><div class="col-md-12"><div id="footer-logo-wrapper"><img alt="PredictionIO" src="/images/logos/logo-white-d1e9c6e6.png"/><span>®</span></div><div id="social-icons-wrapper"><a class="github-button" 
href="https://github.com/apache/predictionio" data-icon="octicon-star" data-show-count="true" aria-label="Star apache/predictionio on GitHub">Star</a> <a class="github-button" href="https://github.com/apache/predictionio/fork" data-icon="octicon-repo-forked" data-show-count="true" aria-label="Fork apache/predictionio on GitHub">Fork</a> <script id="github-bjs" async="" defer="" src="https://buttons.github.io/buttons.js"></script><a href="https://twitter.com/predictionio" target="blank"><img alt="PredictionIO on Twitter" src="/images/icons/twitter-ea9dc152.png"/></a> <a href="https://www.facebook.com/predictionio" target="blank"><img alt="PredictionIO on Facebook" src="/images/icons/facebook-5c57939c.png"/></a> </div></div></div></div></div></footer></div><script>(function(w,d,t,u,n,s,e){w['SwiftypeObject']=n;w[n]=w[n]||function(){ +(w[n].q=w[n].q||[]).push(arguments);};s=d.createElement(t); +e=d.getElementsByTagName(t)[0];s.async=1;s.src=u;e.parentNode.insertBefore(s,e); +})(window,document,'script','//s.swiftypecdn.com/install/v1/st.js','_st'); + +_st('install','HaUfpXXV87xoB_zzCQ45');</script><script src="/javascripts/application-d943a254.js"></script></body></html> \ No newline at end of file
http://git-wip-us.apache.org/repos/asf/predictionio-site/blob/765e178c/batchpredict/index.html ---------------------------------------------------------------------- diff --git a/batchpredict/index.html b/batchpredict/index.html index c8e4073..fd62add 100644 --- a/batchpredict/index.html +++ b/batchpredict/index.html @@ -1,4 +1,4 @@ -<!DOCTYPE html><html><head><title>Batch Predictions</title><meta charset="utf-8"/><meta content="IE=edge,chrome=1" http-equiv="X-UA-Compatible"/><meta name="viewport" content="width=device-width, initial-scale=1.0"/><meta class="swiftype" name="title" data-type="string" content="Batch Predictions"/><link rel="canonical" href="https://predictionio.apache.org/batchpredict/"/><link href="/images/favicon/normal-b330020a.png" rel="shortcut icon"/><link href="/images/favicon/apple-c0febcf2.png" rel="apple-touch-icon"/><link href="//fonts.googleapis.com/css?family=Open+Sans:300italic,400italic,600italic,700italic,800italic,400,300,600,700,800" rel="stylesheet"/><link href="//maxcdn.bootstrapcdn.com/font-awesome/4.2.0/css/font-awesome.min.css" rel="stylesheet"/><link href="/stylesheets/application-eccfc6cb.css" rel="stylesheet" type="text/css"/><script src="//cdnjs.cloudflare.com/ajax/libs/html5shiv/3.7.2/html5shiv.min.js"></script><script src="//cdn.mathjax.org/mathjax/latest/MathJax.js?co nfig=TeX-AMS-MML_HTMLorMML"></script><script src="//use.typekit.net/pqo0itb.js"></script><script>try{Typekit.load({ async: true });}catch(e){}</script></head><body><div id="global"><header><div class="container" id="header-wrapper"><div class="row"><div class="col-sm-12"><div id="logo-wrapper"><span id="drawer-toggle"></span><a href="#"></a><a href="http://predictionio.apache.org/"><img alt="Apache PredictionIO" id="logo" src="/images/logos/logo-ee2b9bb3.png"/></a><span>®</span></div><div id="menu-wrapper"><div id="pill-wrapper"><a class="pill left" href="/gallery/template-gallery">TEMPLATES</a> <a class="pill right" 
href="//github.com/apache/predictionio/">OPEN SOURCE</a></div></div><img class="mobile-search-bar-toggler hidden-md hidden-lg" src="/images/icons/search-glass-704bd4ff.png"/></div></div></div></header><div id="search-bar-row-wrapper"><div class="container-fluid" id="search-bar-row"><div class="row"><div class="col-md-9 col-sm-11 col-xs-11"><div class="hidden-md hidden- lg" id="mobile-page-heading-wrapper"><p>PredictionIO Docs</p><h4>Batch Predictions</h4></div><h4 class="hidden-sm hidden-xs">PredictionIO Docs</h4></div><div class="col-md-3 col-sm-1 col-xs-1 hidden-md hidden-lg"><img id="left-menu-indicator" src="/images/icons/down-arrow-dfe9f7fe.png"/></div><div class="col-md-3 col-sm-12 col-xs-12 swiftype-wrapper"><div class="swiftype"><form class="search-form"><img class="search-box-toggler hidden-xs hidden-sm" src="/images/icons/search-glass-704bd4ff.png"/><div class="search-box"><img src="/images/icons/search-glass-704bd4ff.png"/><input type="text" id="st-search-input" class="st-search-input" placeholder="Search Doc..."/></div><img class="swiftype-row-hider hidden-md hidden-lg" src="/images/icons/drawer-toggle-active-fcbef12a.png"/></form></div></div><div class="mobile-left-menu-toggler hidden-md hidden-lg"></div></div></div></div><div id="page" class="container-fluid"><div class="row"><div id="left-menu-wrapper" class="col-md-3"><nav id="nav- main"><ul><li class="level-1"><a class="expandible" href="/"><span>Apache PredictionIO® Documentation</span></a><ul><li class="level-2"><a class="final" href="/"><span>Welcome to Apache PredictionIO®</span></a></li></ul></li><li class="level-1"><a class="expandible" href="#"><span>Getting Started</span></a><ul><li class="level-2"><a class="final" href="/start/"><span>A Quick Intro</span></a></li><li class="level-2"><a class="final" href="/install/"><span>Installing Apache PredictionIO</span></a></li><li class="level-2"><a class="final" href="/start/download/"><span>Downloading an Engine Template</span></a></li><li 
class="level-2"><a class="final" href="/start/deploy/"><span>Deploying Your First Engine</span></a></li><li class="level-2"><a class="final" href="/start/customize/"><span>Customizing the Engine</span></a></li></ul></li><li class="level-1"><a class="expandible" href="#"><span>Integrating with Your App</span></a><ul><li class="level-2"><a class="final" href="/appintegrati on/"><span>App Integration Overview</span></a></li><li class="level-2"><a class="expandible" href="/sdk/"><span>List of SDKs</span></a><ul><li class="level-3"><a class="final" href="/sdk/java/"><span>Java & Android SDK</span></a></li><li class="level-3"><a class="final" href="/sdk/php/"><span>PHP SDK</span></a></li><li class="level-3"><a class="final" href="/sdk/python/"><span>Python SDK</span></a></li><li class="level-3"><a class="final" href="/sdk/ruby/"><span>Ruby SDK</span></a></li><li class="level-3"><a class="final" href="/sdk/community/"><span>Community Powered SDKs</span></a></li></ul></li></ul></li><li class="level-1"><a class="expandible" href="#"><span>Deploying an Engine</span></a><ul><li class="level-2"><a class="final" href="/deploy/"><span>Deploying as a Web Service</span></a></li><li class="level-2"><a class="final active" href="/batchpredict/"><span>Batch Predictions</span></a></li><li class="level-2"><a class="final" href="/deploy/monitoring/"><span>Monitoring Engi ne</span></a></li><li class="level-2"><a class="final" href="/deploy/engineparams/"><span>Setting Engine Parameters</span></a></li><li class="level-2"><a class="final" href="/deploy/enginevariants/"><span>Deploying Multiple Engine Variants</span></a></li><li class="level-2"><a class="final" href="/deploy/plugin/"><span>Engine Server Plugin</span></a></li></ul></li><li class="level-1"><a class="expandible" href="#"><span>Customizing an Engine</span></a><ul><li class="level-2"><a class="final" href="/customize/"><span>Learning DASE</span></a></li><li class="level-2"><a class="final" href="/customize/dase/"><span>Implement 
DASE</span></a></li><li class="level-2"><a class="final" href="/customize/troubleshooting/"><span>Troubleshooting Engine Development</span></a></li><li class="level-2"><a class="final" href="/api/current/#package"><span>Engine Scala APIs</span></a></li></ul></li><li class="level-1"><a class="expandible" href="#"><span>Collecting and Analyzing Data</span></a><ul><li c lass="level-2"><a class="final" href="/datacollection/"><span>Event Server Overview</span></a></li><li class="level-2"><a class="final" href="/datacollection/eventapi/"><span>Collecting Data with REST/SDKs</span></a></li><li class="level-2"><a class="final" href="/datacollection/eventmodel/"><span>Events Modeling</span></a></li><li class="level-2"><a class="final" href="/datacollection/webhooks/"><span>Unifying Multichannel Data with Webhooks</span></a></li><li class="level-2"><a class="final" href="/datacollection/channel/"><span>Channel</span></a></li><li class="level-2"><a class="final" href="/datacollection/batchimport/"><span>Importing Data in Batch</span></a></li><li class="level-2"><a class="final" href="/datacollection/analytics/"><span>Using Analytics Tools</span></a></li><li class="level-2"><a class="final" href="/datacollection/plugin/"><span>Event Server Plugin</span></a></li></ul></li><li class="level-1"><a class="expandible" href="#"><span>Choosing an Algorithm</span>< /a><ul><li class="level-2"><a class="final" href="/algorithm/"><span>Built-in Algorithm Libraries</span></a></li><li class="level-2"><a class="final" href="/algorithm/switch/"><span>Switching to Another Algorithm</span></a></li><li class="level-2"><a class="final" href="/algorithm/multiple/"><span>Combining Multiple Algorithms</span></a></li><li class="level-2"><a class="final" href="/algorithm/custom/"><span>Adding Your Own Algorithms</span></a></li></ul></li><li class="level-1"><a class="expandible" href="#"><span>Tuning and Evaluation</span></a><ul><li class="level-2"><a class="final" 
href="/evaluation/"><span>Overview</span></a></li><li class="level-2"><a class="final" href="/evaluation/paramtuning/"><span>Hyperparameter Tuning</span></a></li><li class="level-2"><a class="final" href="/evaluation/evaluationdashboard/"><span>Evaluation Dashboard</span></a></li><li class="level-2"><a class="final" href="/evaluation/metricchoose/"><span>Choosing Evaluation Metrics</span></a></li><l i class="level-2"><a class="final" href="/evaluation/metricbuild/"><span>Building Evaluation Metrics</span></a></li></ul></li><li class="level-1"><a class="expandible" href="#"><span>System Architecture</span></a><ul><li class="level-2"><a class="final" href="/system/"><span>Architecture Overview</span></a></li><li class="level-2"><a class="final" href="/system/anotherdatastore/"><span>Using Another Data Store</span></a></li></ul></li><li class="level-1"><a class="expandible" href="#"><span>PredictionIO® Official Templates</span></a><ul><li class="level-2"><a class="final" href="/templates/"><span>Intro</span></a></li><li class="level-2"><a class="expandible" href="#"><span>Recommendation</span></a><ul><li class="level-3"><a class="final" href="/templates/recommendation/quickstart/"><span>Quick Start</span></a></li><li class="level-3"><a class="final" href="/templates/recommendation/dase/"><span>DASE</span></a></li><li class="level-3"><a class="final" href="/templates/recommendatio n/evaluation/"><span>Evaluation Explained</span></a></li><li class="level-3"><a class="final" href="/templates/recommendation/how-to/"><span>How-To</span></a></li><li class="level-3"><a class="final" href="/templates/recommendation/reading-custom-events/"><span>Read Custom Events</span></a></li><li class="level-3"><a class="final" href="/templates/recommendation/customize-data-prep/"><span>Customize Data Preparator</span></a></li><li class="level-3"><a class="final" href="/templates/recommendation/customize-serving/"><span>Customize Serving</span></a></li><li class="level-3"><a class="final" 
href="/templates/recommendation/training-with-implicit-preference/"><span>Train with Implicit Preference</span></a></li><li class="level-3"><a class="final" href="/templates/recommendation/blacklist-items/"><span>Filter Recommended Items by Blacklist in Query</span></a></li><li class="level-3"><a class="final" href="/templates/recommendation/batch-evaluator/"><span>Batch Persistable Evaluator</s pan></a></li></ul></li><li class="level-2"><a class="expandible" href="#"><span>E-Commerce Recommendation</span></a><ul><li class="level-3"><a class="final" href="/templates/ecommercerecommendation/quickstart/"><span>Quick Start</span></a></li><li class="level-3"><a class="final" href="/templates/ecommercerecommendation/dase/"><span>DASE</span></a></li><li class="level-3"><a class="final" href="/templates/ecommercerecommendation/how-to/"><span>How-To</span></a></li><li class="level-3"><a class="final" href="/templates/ecommercerecommendation/train-with-rate-event/"><span>Train with Rate Event</span></a></li><li class="level-3"><a class="final" href="/templates/ecommercerecommendation/adjust-score/"><span>Adjust Score</span></a></li></ul></li><li class="level-2"><a class="expandible" href="#"><span>Similar Product</span></a><ul><li class="level-3"><a class="final" href="/templates/similarproduct/quickstart/"><span>Quick Start</span></a></li><li class="level-3"><a class="final" href=" /templates/similarproduct/dase/"><span>DASE</span></a></li><li class="level-3"><a class="final" href="/templates/similarproduct/how-to/"><span>How-To</span></a></li><li class="level-3"><a class="final" href="/templates/similarproduct/multi-events-multi-algos/"><span>Multiple Events and Multiple Algorithms</span></a></li><li class="level-3"><a class="final" href="/templates/similarproduct/return-item-properties/"><span>Returns Item Properties</span></a></li><li class="level-3"><a class="final" href="/templates/similarproduct/train-with-rate-event/"><span>Train with Rate Event</span></a></li><li 
class="level-3"><a class="final" href="/templates/similarproduct/rid-user-set-event/"><span>Get Rid of Events for Users</span></a></li><li class="level-3"><a class="final" href="/templates/similarproduct/recommended-user/"><span>Recommend Users</span></a></li></ul></li><li class="level-2"><a class="expandible" href="#"><span>Classification</span></a><ul><li class="level-3"><a class="final" hre f="/templates/classification/quickstart/"><span>Quick Start</span></a></li><li class="level-3"><a class="final" href="/templates/classification/dase/"><span>DASE</span></a></li><li class="level-3"><a class="final" href="/templates/classification/how-to/"><span>How-To</span></a></li><li class="level-3"><a class="final" href="/templates/classification/add-algorithm/"><span>Use Alternative Algorithm</span></a></li><li class="level-3"><a class="final" href="/templates/classification/reading-custom-properties/"><span>Read Custom Properties</span></a></li></ul></li></ul></li><li class="level-1"><a class="expandible" href="#"><span>Engine Template Gallery</span></a><ul><li class="level-2"><a class="final" href="/gallery/template-gallery/"><span>Browse</span></a></li><li class="level-2"><a class="final" href="/community/submit-template/"><span>Submit your Engine as a Template</span></a></li></ul></li><li class="level-1"><a class="expandible" href="#"><span>Demo Tutorials</span></a><ul><li c lass="level-2"><a class="final" href="/demo/tapster/"><span>Comics Recommendation Demo</span></a></li><li class="level-2"><a class="final" href="/demo/community/"><span>Community Contributed Demo</span></a></li><li class="level-2"><a class="final" href="/demo/textclassification/"><span>Text Classification Engine Tutorial</span></a></li></ul></li><li class="level-1"><a class="expandible" href="/community/"><span>Getting Involved</span></a><ul><li class="level-2"><a class="final" href="/community/contribute-code/"><span>Contribute Code</span></a></li><li class="level-2"><a class="final" 
href="/community/contribute-documentation/"><span>Contribute Documentation</span></a></li><li class="level-2"><a class="final" href="/community/contribute-sdk/"><span>Contribute a SDK</span></a></li><li class="level-2"><a class="final" href="/community/contribute-webhook/"><span>Contribute a Webhook</span></a></li><li class="level-2"><a class="final" href="/community/projects/"><span>Community Projects</span></a></li></ul></li><li class="level-1"><a class="expandible" href="#"><span>Getting Help</span></a><ul><li class="level-2"><a class="final" href="/resources/faq/"><span>FAQs</span></a></li><li class="level-2"><a class="final" href="/support/"><span>Support</span></a></li></ul></li><li class="level-1"><a class="expandible" href="#"><span>Resources</span></a><ul><li class="level-2"><a class="final" href="/cli/"><span>Command-line Interface</span></a></li><li class="level-2"><a class="final" href="/resources/release/"><span>Release Cadence</span></a></li><li class="level-2"><a class="final" href="/resources/intellij/"><span>Developing Engines with IntelliJ IDEA</span></a></li><li class="level-2"><a class="final" href="/resources/upgrade/"><span>Upgrade Instructions</span></a></li><li class="level-2"><a class="final" href="/resources/glossary/"><span>Glossary</span></a></li></ul></li><li class="level-1"><a class="expandible" href="#"><span>Apache Software Foundation</span></a><ul><li class="level-2"><a class="final" href="https://www.apache.org/"><span>Apache Homepage</span></a></li><li class="level-2"><a class="final" href="https://www.apache.org/licenses/"><span>License</span></a></li><li class="level-2"><a class="final" href="https://www.apache.org/foundation/sponsorship.html"><span>Sponsorship</span></a></li><li class="level-2"><a class="final" href="https://www.apache.org/foundation/thanks.html"><span>Thanks</span></a></li><li class="level-2"><a class="final" 
href="https://www.apache.org/security/"><span>Security</span></a></li></ul></li></ul></nav></div><div class="col-md-9 col-sm-12"><div class="content-header hidden-md hidden-lg"><div id="breadcrumbs" class="hidden-sm hidden xs"><ul><li><a href="#">Deploying an Engine</a><span class="spacer">></span></li><li><span class="last">Batch Predictions</span></li></ul></div><div id="page-title"><h1>Batch Predictions</h1></div></div><div id="table-of-content-wrapper"><h5>On this page</h5><aside id="table-of-contents"><ul> <li> <a href="#overview">Overview</a> </li> <li> <a href="#compatibility">Compatibility</a> </li> <li> <a href="#usage">Usage</a> </li> <li> <a href="#example">Example</a> </li> </ul> </aside><hr/><a id="edit-page-link" href="https://github.com/apache/predictionio/tree/livedoc/docs/manual/source/batchpredict/index.html.md"><img src="/images/icons/edit-pencil-d6c1bb3d.png"/>Edit this page</a></div><div class="content-header hidden-sm hidden-xs"><div id="breadcrumbs" class="hidden-sm hidden xs"><ul><li><a href="#">Deploying an Engine</a><span class="spacer">></span></li><li><span class="last">Batch Predictions</span></li></ul></div><div id="page-title"><h1>Batch Predictions</h1></div></div><div class="content"> <h2 id='overview' class='header-anchors'>Overview</h2><p>Process predictions for many queries using efficient parallelization through Spark. Useful for mass auditing of predictions and for generating predictions to push into other systems.</p><p>Batch predict reads and writes multi-object JSON files similar to the <a href="/datacollection/batchimport/">batch import</a> format. JSON objects are separated by newlines and cannot themselves contain unencoded newlines.</p><h2 id='compatibility' class='header-anchors'>Compatibility</h2><p><code>pio batchpredict</code> loads the engine and processes queries exactly like <code>pio deploy</code>. 
There is only one additional requirement for engines to use batch predict:</p><div class="alert-message warning"><p>All algorithm classes used in the engine must be <a href="https://www.scala-lang.org/api/2.11.8/index.html#scala.Serializable">serializable</a>. <strong>This is already true for PredictionIO's base algorithm classes</strong>, but serializability may be broken by including non-serializable fields in their constructors. Using the <a href="http://fdahms.com/2015/10/14/scala-and-the-transient-lazy-val-pattern/"><code>@transient</code> annotation</a> may help in these cases.</p></div><p>This requirement is due to processing the input queries as a <a href="https://spark.apache.org/docs/latest/rdd-programming-guide.html#resilient-distributed-datasets-rdds">Spark RDD</a>, which enables high-performance parallelization, even on a single machine.</p><h2 id='usage' class='header-anchors'>Usage</h2><h3 id='<code>pio-batchpredict</code>' class='header-anchors' ><code>pio batchpredict</code></h3><p>Command to process bulk predictions. Takes the same options as <code>pio deploy</code> plus:</p><h3 id='<code>--input-<value></code>' class='header-anchors' ><code>--input <value></code></h3><p>Path to file containing queries; a multi-object JSON file with one query object per line. Accepts any valid Hadoop file URL.</p><p>Default: <code>batchpredict-input.json</code></p><h3 id='<code>--output-<value></code>' class='header-anchors' ><code>--output <value></code></h3><p>Path to file to receive results; a multi-object JSON file with one object per line, the prediction + original query. Accepts any valid Hadoop file URL. 
Actual output will be written as Hadoop partition files in a directory with the output name.</p><p>Default: <code>batchpredict-output.json</code></p><h3 id='<code>--query-partitions-<value></code>' class='header-anchors' ><code>--query-partitions &lt;value&gt;</code></h3><p>Configure the concurrency of predictions by setting the number of partitions used internally for the RDD of queries. This will directly affect the number of resulting <code>part-*</code> output files. Setting this to <code>1</code> may seem appealing as a way to get a single output file, but it removes parallelization for the batch process, reducing performance and possibly exhausting memory.</p><p>Default: number created by Spark context's <code>textFile</code> (probably the number of cores available on the local machine)</p><h3 id='<code>--engine-instance-id-<value></code>' class='header-anchors' ><code>--engine-instance-id &lt;value&gt;</code></h3><p>Identifier for the trained instance to use for batch predict.</p><p>Default: the latest trained instance.</p><h2 id='example' class='header-anchors'>Example</h2><h3 id='input' class='header-anchors'>Input</h3><p>A multi-object JSON file of queries as they would be sent to the engine's HTTP Queries API.</p><div class="alert-message note"><p>Read via <a href="https://spark.apache.org/docs/latest/rdd-programming-guide.html#external-datasets">SparkContext's <code>textFile</code></a> and so may be a single file or any supported Hadoop format.</p></div><p>File: <code>batchpredict-input.json</code></p><div class="highlight json"><table style="border-spacing: 0"><tbody><tr><td class="gutter gl" style="text-align: right"><pre class="lineno">1 +<!DOCTYPE html><html><head><title>Batch Predictions</title><meta charset="utf-8"/><meta content="IE=edge,chrome=1" http-equiv="X-UA-Compatible"/><meta name="viewport" content="width=device-width, initial-scale=1.0"/><meta class="swiftype" name="title" data-type="string" content="Batch Predictions"/><link rel="canonical" 
href="https://predictionio.apache.org/batchpredict/"/><link href="/images/favicon/normal-b330020a.png" rel="shortcut icon"/><link href="/images/favicon/apple-c0febcf2.png" rel="apple-touch-icon"/><link href="//fonts.googleapis.com/css?family=Open+Sans:300italic,400italic,600italic,700italic,800italic,400,300,600,700,800" rel="stylesheet"/><link href="//maxcdn.bootstrapcdn.com/font-awesome/4.2.0/css/font-awesome.min.css" rel="stylesheet"/><link href="/stylesheets/application-eccfc6cb.css" rel="stylesheet" type="text/css"/><script src="//cdnjs.cloudflare.com/ajax/libs/html5shiv/3.7.2/html5shiv.min.js"></script><script src="//cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML"></script><script src="//use.typekit.net/pqo0itb.js"></script><script>try{Typekit.load({ async: true });}catch(e){}</script></head><body><div id="global"><header><div class="container" id="header-wrapper"><div class="row"><div class="col-sm-12"><div id="logo-wrapper"><span id="drawer-toggle"></span><a href="#"></a><a href="http://predictionio.apache.org/"><img alt="Apache PredictionIO" id="logo" src="/images/logos/logo-ee2b9bb3.png"/></a><span>®</span></div><div id="menu-wrapper"><div id="pill-wrapper"><a class="pill left" href="/gallery/template-gallery">TEMPLATES</a> <a class="pill right" href="//github.com/apache/predictionio/">OPEN SOURCE</a></div></div><img class="mobile-search-bar-toggler hidden-md hidden-lg" src="/images/icons/search-glass-704bd4ff.png"/></div></div></div></header><div id="search-bar-row-wrapper"><div class="container-fluid" id="search-bar-row"><div class="row"><div class="col-md-9 col-sm-11 col-xs-11"><div class="hidden-md hidden-lg" id="mobile-page-heading-wrapper"><p>PredictionIO Docs</p><h4>Batch Predictions</h4></div><h4 class="hidden-sm hidden-xs">PredictionIO Docs</h4></div><div class="col-md-3 col-sm-1 col-xs-1 hidden-md hidden-lg"><img id="left-menu-indicator" src="/images/icons/down-arrow-dfe9f7fe.png"/></div><div class="col-md-3 col-sm-12 
col-xs-12 swiftype-wrapper"><div class="swiftype"><form class="search-form"><img class="search-box-toggler hidden-xs hidden-sm" src="/images/icons/search-glass-704bd4ff.png"/><div class="search-box"><img src="/images/icons/search-glass-704bd4ff.png"/><input type="text" id="st-search-input" class="st-search-input" placeholder="Search Doc..."/></div><img class="swiftype-row-hider hidden-md hidden-lg" src="/images/icons/drawer-toggle-active-fcbef12a.png"/></form></div></div><div class="mobile-left-menu-toggler hidden-md hidden-lg"></div></div></div></div><div id="page" class="container-fluid"><div class="row"><div id="left-menu-wrapper" class="col-md-3"><nav id="nav-main"><ul><li class="level-1"><a class="expandible" href="/"><span>Apache PredictionIO® Documentation</span></a><ul><li class="level-2"><a class="final" href="/"><span>Welcome to Apache PredictionIO®</span></a></li></ul></li><li class="level-1"><a class="expandible" href="#"><span>Getting Started</span></a><ul><li class="level-2"><a class="final" href="/start/"><span>A Quick Intro</span></a></li><li class="level-2"><a class="final" href="/install/"><span>Installing Apache PredictionIO</span></a></li><li class="level-2"><a class="final" href="/start/download/"><span>Downloading an Engine Template</span></a></li><li class="level-2"><a class="final" href="/start/deploy/"><span>Deploying Your First Engine</span></a></li><li class="level-2"><a class="final" href="/start/customize/"><span>Customizing the Engine</span></a></li></ul></li><li class="level-1"><a class="expandible" href="#"><span>Integrating with Your App</span></a><ul><li class="level-2"><a class="final" href="/appintegration/"><span>App Integration Overview</span></a></li><li class="level-2"><a class="expandible" href="/sdk/"><span>List of SDKs</span></a><ul><li class="level-3"><a class="final" href="/sdk/java/"><span>Java & Android SDK</span></a></li><li class="level-3"><a class="final" href="/sdk/php/"><span>PHP SDK</span></a></li><li 
class="level-3"><a class="final" href="/sdk/python/"><span>Python SDK</span></a></li><li class="level-3"><a class="final" href="/sdk/ruby/"><span>Ruby SDK</span></a></li><li class="level-3"><a class="final" href="/community/projects/#sdks"><span>Community Powered SDKs</span></a></li></ul></li></ul></li><li class="level-1"><a class="expandible" href="#"><span>Deploying an Engine</span></a><ul><li class="level-2"><a class="final" href="/deploy/"><span>Deploying as a Web Service</span></a></li><li class="level-2"><a class="final active" href="/batchpredict/"><span>Batch Predictions</span></a></li><li class="level-2"><a class="final" href="/deploy/monitoring/"><span>Monitoring Engine</span></a></li><li class="level-2"><a class="final" href="/deploy/engineparams/"><span>Setting Engine Parameters</span></a></li><li class="level-2"><a class="final" href="/deploy/enginevariants/"><span>Deploying Multiple Engine Variants</span></a></li><li class="level-2"><a class="final" href="/deploy/plugin/"><span>Engine Server Plugin</span></a></li></ul></li><li class="level-1"><a class="expandible" href="#"><span>Customizing an Engine</span></a><ul><li class="level-2"><a class="final" href="/customize/"><span>Learning DASE</span></a></li><li class="level-2"><a class="final" href="/customize/dase/"><span>Implement DASE</span></a></li><li class="level-2"><a class="final" href="/customize/troubleshooting/"><span>Troubleshooting Engine Development</span></a></li><li class="level-2"><a class="final" href="/api/current/#package"><span>Engine Scala APIs</span></a></li></ul></li><li class="level-1"><a class="expandible" href="#"><span>Collecting and Analyzing Data</span></a><ul><li class="level-2"><a class="final" href="/datacollection/"><span>Event Server Overview</span></a></li><li class="level-2"><a class="final" href="/datacollection/eventapi/"><span>Collecting Data with REST/SDKs</span></a></li><li class="level-2"><a class="final" href="/datacollection/eventmodel/"><span>Events 
Modeling</span></a></li><li class="level-2"><a class="final" href="/datacollection/webhooks/"><span>Unifying Multichannel Data with Webhooks</span></a></li><li class="level-2"><a class="final" href="/datacollection/channel/"><span>Channel</span></a></li><li class="level-2"><a class="final" href="/datacollection/batchimport/"><span>Importing Data in Batch</span></a></li><li class="level-2"><a class="final" href="/datacollection/analytics/"><span>Using Analytics Tools</span></a></li><li class="level-2"><a class="final" href="/datacollection/plugin/"><span>Event Server Plugin</span></a></li></ul></li><li class="level-1"><a class="expandible" href="#"><span>Choosing an Algorithm</span></a><ul><li class="level-2"><a class="final" href="/algorithm/"><span>Built-in Algorithm Libraries</span></a></li><li class="level-2"><a class="final" href="/algorithm/switch/"><span>Switching to Another Algorithm</span></a></li><li class="level-2"><a class="final" href="/algorithm/multiple/"><span>Combining Multiple Algorithms</span></a></li><li class="level-2"><a class="final" href="/algorithm/custom/"><span>Adding Your Own Algorithms</span></a></li></ul></li><li class="level-1"><a class="expandible" href="#"><span>Tuning and Evaluation</span></a><ul><li class="level-2"><a class="final" href="/evaluation/"><span>Overview</span></a></li><li class="level-2"><a class="final" href="/evaluation/paramtuning/"><span>Hyperparameter Tuning</span></a></li><li class="level-2"><a class="final" href="/evaluation/evaluationdashboard/"><span>Evaluation Dashboard</span></a></li><li class="level-2"><a class="final" href="/evaluation/metricchoose/"><span>Choosing Evaluation Metrics</span></a></li><li class="level-2"><a class="final" href="/evaluation/metricbuild/"><span>Building Evaluation Metrics</span></a></li></ul></li><li class="level-1"><a class="expandible" href="#"><span>System Architecture</span></a><ul><li class="level-2"><a class="final" href="/system/"><span>Architecture 
Overview</span></a></li><li class="level-2"><a class="final" href="/system/anotherdatastore/"><span>Using Another Data Store</span></a></li></ul></li><li class="level-1"><a class="expandible" href="#"><span>PredictionIO® Official Templates</span></a><ul><li class="level-2"><a class="final" href="/templates/"><span>Intro</span></a></li><li class="level-2"><a class="expandible" href="#"><span>Recommendation</span></a><ul><li class="level-3"><a class="final" href="/templates/recommendation/quickstart/"><span>Quick Start</span></a></li><li class="level-3"><a class="final" href="/templates/recommendation/dase/"><span>DASE</span></a></li><li class="level-3"><a class="final" href="/templates/recommendation/evaluation/"><span>Evaluation Explained</span></a></li><li class="level-3"><a class="final" href="/templates/recommendation/how-to/"><span>How-To</span></a></li><li class="level-3"><a class="final" href="/templates/recommendation/reading-custom-events/"><span>Read Custom Events</span></a></li><li class="level-3"><a class="final" href="/templates/recommendation/customize-data-prep/"><span>Customize Data Preparator</span></a></li><li class="level-3"><a class="final" href="/templates/recommendation/customize-serving/"><span>Customize Serving</span></a></li><li class="level-3"><a class="final" href="/templates/recommendation/training-with-implicit-preference/"><span>Train with Implicit Preference</span></a></li><li class="level-3"><a class="final" href="/templates/recommendation/blacklist-items/"><span>Filter Recommended Items by Blacklist in Query</span></a></li><li class="level-3"><a class="final" href="/templates/recommendation/batch-evaluator/"><span>Batch Persistable Evaluator</span></a></li></ul></li><li class="level-2"><a class="expandible" href="#"><span>E-Commerce Recommendation</span></a><ul><li class="level-3"><a class="final" href="/templates/ecommercerecommendation/quickstart/"><span>Quick Start</span></a></li><li class="level-3"><a class="final" 
href="/templates/ecommercerecommendation/dase/"><span>DASE</span></a></li><li class="level-3"><a class="final" href="/templates/ecommercerecommendation/how-to/"><span>How-To</span></a></li><li class="level-3"><a class="final" href="/templates/ecommercerecommendation/train-with-rate-event/"><span>Train with Rate Event</span></a></li><li class="level-3"><a class="final" href="/templates/ecommercerecommendation/adjust-score/"><span>Adjust Score</span></a></li></ul></li><li class="level-2"><a class="expandible" href="#"><span>Similar Product</span></a><ul><li class="level-3"><a class="final" href="/templates/similarproduct/quickstart/"><span>Quick Start</span></a></li><li class="level-3"><a class="final" href="/templates/similarproduct/dase/"><span>DASE</span></a></li><li class="level-3"><a class="final" href="/templates/similarproduct/how-to/"><span>How-To</span></a></li><li class="level-3"><a class="final" href="/templates/similarproduct/multi-events-multi-algos/"><span>Multiple Events and Multiple Algorithms</span></a></li><li class="level-3"><a class="final" href="/templates/similarproduct/return-item-properties/"><span>Returns Item Properties</span></a></li><li class="level-3"><a class="final" href="/templates/similarproduct/train-with-rate-event/"><span>Train with Rate Event</span></a></li><li class="level-3"><a class="final" href="/templates/similarproduct/rid-user-set-event/"><span>Get Rid of Events for Users</span></a></li><li class="level-3"><a class="final" href="/templates/similarproduct/recommended-user/"><span>Recommend Users</span></a></li></ul></li><li class="level-2"><a class="expandible" href="#"><span>Classification</span></a><ul><li class="level-3"><a class="final" href="/templates/classification/quickstart/"><span>Quick Start</span></a></li><li class="level-3"><a class="final" href="/templates/classification/dase/"><span>DASE</span></a></li><li class="level-3"><a class="final" 
href="/templates/classification/how-to/"><span>How-To</span></a></li><li class="level-3"><a class="final" href="/templates/classification/add-algorithm/"><span>Use Alternative Algorithm</span></a></li><li class="level-3"><a class="final" href="/templates/classification/reading-custom-properties/"><span>Read Custom Properties</span></a></li></ul></li></ul></li><li class="level-1"><a class="expandible" href="#"><span>Engine Template Gallery</span></a><ul><li class="level-2"><a class="final" href="/gallery/template-gallery/"><span>Browse</span></a></li><li class="level-2"><a class="final" href="/community/submit-template/"><span>Submit your Engine as a Template</span></a></li></ul></li><li class="level-1"><a class="expandible" href="#"><span>Demo Tutorials</span></a><ul><li class="level-2"><a class="final" href="/community/projects/#demos"><span>Community Contributed Demo</span></a></li><li class="level-2"><a class="final" href="/demo/textclassification/"><span>Text Classification Engine Tutorial</span></a></li></ul></li><li class="level-1"><a class="expandible" href="/community/"><span>Getting Involved</span></a><ul><li class="level-2"><a class="final" href="/community/contribute-code/"><span>Contribute Code</span></a></li><li class="level-2"><a class="final" href="/community/contribute-documentation/"><span>Contribute Documentation</span></a></li><li class="level-2"><a class="final" href="/community/contribute-sdk/"><span>Contribute a SDK</span></a></li><li class="level-2"><a class="final" href="/community/contribute-webhook/"><span>Contribute a Webhook</span></a></li><li class="level-2"><a class="final" href="/community/projects/"><span>Community Projects</span></a></li></ul></li><li class="level-1"><a class="expandible" href="#"><span>Getting Help</span></a><ul><li class="level-2"><a class="final" href="/resources/faq/"><span>FAQs</span></a></li><li class="level-2"><a class="final" href="/support/"><span>Support</span></a></li></ul></li><li 
class="level-1"><a class="expandible" href="#"><span>Resources</span></a><ul><li class="level-2"><a class="final" href="/cli/"><span>Command-line Interface</span></a></li><li class="level-2"><a class="final" href="/resources/release/"><span>Release Cadence</span></a></li><li class="level-2"><a class="final" href="/resources/intellij/"><span>Developing Engines with IntelliJ IDEA</span></a></li><li class="level-2"><a class="final" href="/resources/upgrade/"><span>Upgrade Instructions</span></a></li><li class="level-2"><a class="final" href="/resources/glossary/"><span>Glossary</span></a></li></ul></li><li class="level-1"><a class="expandible" href="#"><span>Apache Software Foundation</span></a><ul><li class="level-2"><a class="final" href="https://www.apache.org/"><span>Apache Homepage</span></a></li><li class="level-2"><a class="final" href="https://www.apache.org/licenses/"><span>License</span></a></li><li class="level-2"><a class="final" href="https://www.apache.org/foundation/sponsorship.html"><span>Sponsorship</span></a></li><li class="level-2"><a class="final" href="https://www.apache.org/foundation/thanks.html"><span>Thanks</span></a></li><li class="level-2"><a class="final" href="https://www.apache.org/security/"><span>Security</span></a></li></ul></li></ul></nav></div><div class="col-md-9 col-sm-12"><div class="content-header hidden-md hidden-lg"><div id="breadcrumbs" class="hidden-sm hidden xs"><ul><li><a href="#">Deploying an Engine</a><span class="spacer">></span></li><li><span class="last">Batch Predictions</span></li></ul></div><div id="page-title"><h1>Batch Predictions</h1></div></div><div id="table-of-content-wrapper"><h5>On this page</h5><aside id="table-of-contents"><ul> <li> <a href="#overview">Overview</a> </li> <li> <a href="#compatibility">Compatibility</a> </li> <li> <a href="#usage">Usage</a> </li> <li> <a href="#example">Example</a> </li> </ul> </aside><hr/><a id="edit-page-link" 
href="https://github.com/apache/predictionio/tree/livedoc/docs/manual/source/batchpredict/index.html.md"><img src="/images/icons/edit-pencil-d6c1bb3d.png"/>Edit this page</a></div><div class="content-header hidden-sm hidden-xs"><div id="breadcrumbs" class="hidden-sm hidden xs"><ul><li><a href="#">Deploying an Engine</a><span class="spacer">></span></li><li><span class="last">Batch Predictions</span></li></ul></div><div id="page-title"><h1>Batch Predictions</h1></div></div><div class="content"> <h2 id='overview' class='header-anchors'>Overview</h2><p>Process predictions for many queries using efficient parallelization through Spark. Useful for mass auditing of predictions and for generating predictions to push into other systems.</p><p>Batch predict reads and writes multi-object JSON files similar to the <a href="/datacollection/batchimport/">batch import</a> format. JSON objects are separated by newlines and cannot themselves contain unencoded newlines.</p><h2 id='compatibility' class='header-anchors'>Compatibility</h2><p><code>pio batchpredict</code> loads the engine and processes queries exactly like <code>pio deploy</code>. There is only one additional requirement for engines to use batch predict:</p><div class="alert-message warning"><p>All algorithm classes used in the engine must be <a href="https://www.scala-lang.org/api/2.11.8/index.html#scala.Serializable">serializable</a>. <strong>This is already true for PredictionIO's base algorithm classes</strong>, but may be broken by including non-serializable fields in their constructor. 
Using the <a href="http://fdahms.com/2015/10/14/scala-and-the-transient-lazy-val-pattern/"><code>@transient</code> annotation</a> may help in these cases.</p></div><p>This requirement arises because the input queries are processed as a <a href="https://spark.apache.org/docs/latest/rdd-programming-guide.html#resilient-distributed-datasets-rdds">Spark RDD</a>, which enables high-performance parallelization, even on a single machine.</p><h2 id='usage' class='header-anchors'>Usage</h2><h3 id='<code>pio-batchpredict</code>' class='header-anchors' ><code>pio batchpredict</code></h3><p>Command to process bulk predictions. Takes the same options as <code>pio deploy</code> plus:</p><h3 id='<code>--input-<value></code>' class='header-anchors' ><code>--input &lt;value&gt;</code></h3><p>Path to file containing queries; a multi-object JSON file with one query object per line. Accepts any valid Hadoop file URL.</p><p>Default: <code>batchpredict-input.json</code></p><h3 id='<code>--output-<value></code>' class='header-anchors' ><code>--output &lt;value&gt;</code></h3><p>Path to file to receive results; a multi-object JSON file with one object per line, the prediction plus the original query. Accepts any valid Hadoop file URL. Actual output will be written as Hadoop partition files in a directory with the output name.</p><p>Default: <code>batchpredict-output.json</code></p><h3 id='<code>--query-partitions-<value></code>' class='header-anchors' ><code>--query-partitions &lt;value&gt;</code></h3><p>Configure the concurrency of predictions by setting the number of partitions used internally for the RDD of queries. This will directly affect the number of resulting <code>part-*</code> output files. 
Setting this to <code>1</code> may seem appealing as a way to get a single output file, but it removes parallelization for the batch process, reducing performance and possibly exhausting memory.</p><p>Default: number created by Spark context's <code>textFile</code> (probably the number of cores available on the local machine)</p><h3 id='<code>--engine-instance-id-<value></code>' class='header-anchors' ><code>--engine-instance-id &lt;value&gt;</code></h3><p>Identifier for the trained instance to use for batch predict.</p><p>Default: the latest trained instance.</p><h2 id='example' class='header-anchors'>Example</h2><h3 id='input' class='header-anchors'>Input</h3><p>A multi-object JSON file of queries as they would be sent to the engine's HTTP Queries API.</p><div class="alert-message note"><p>Read via <a href="https://spark.apache.org/docs/latest/rdd-programming-guide.html#external-datasets">SparkContext's <code>textFile</code></a> and so may be a single file or any supported Hadoop format.</p></div><p>File: <code>batchpredict-input.json</code></p><div class="highlight json"><table style="border-spacing: 0"><tbody><tr><td class="gutter gl" style="text-align: right"><pre class="lineno">1 2 3 4