[ https://issues.apache.org/jira/browse/PIO-105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16088312#comment-16088312 ]
ASF GitHub Bot commented on PIO-105:
------------------------------------

GitHub user mars opened a pull request:

    https://github.com/apache/incubator-predictionio/pull/412

    Batch Predictions

    JIRA issue [PIO-105](https://issues.apache.org/jira/browse/PIO-105)

    Provides a new `pio batchpredict` command.

    Reads from multi-object JSON input file. Example:

    ```json
    {"user":"1"}
    {"user":"2"}
    {"user":"3"}
    {"user":"4"}
    {"user":"5"}
    ```

    Writes to multi-object JSON output file (actually Hadoop partition files). Example:

    ```json
    {"query":{"user":"1"},"prediction":{"itemScores":[{"item":"1","score":33},{"item":"2","score":32}]}}
    {"query":{"user":"2"},"prediction":{"itemScores":[{"item":"5","score":55},{"item":"3","score":28}]}}
    {"query":{"user":"3"},"prediction":{"itemScores":[{"item":"2","score":16},{"item":"3","score":12}]}}
    {"query":{"user":"4"},"prediction":{"itemScores":[{"item":"3","score":19},{"item":"1","score":18}]}}
    {"query":{"user":"5"},"prediction":{"itemScores":[{"item":"1","score":24},{"item":"4","score":14}]}}
    ```

    See the included [console usage help](#diff-2cf174557564e09d52157be8e839fecf)

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/mars/incubator-predictionio batch-predict

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-predictionio/pull/412.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #412

----
commit 99ee6493bddc8f02aee384f3a2db27c6ae3f68cc
Author: Mars Hall <m...@heroku.com>
Date:   2017-07-13T00:12:25Z

    Implement BatchPredict

commit c205357498e4a4a745810b04130c5bbad78f8686
Author: Mars Hall <m...@heroku.com>
Date:   2017-07-14T22:29:26Z

    Improve console help for batch predict.

commit 93f7ed3e5ed10155a688a032e367793d75fa116a
Author: Mars Hall <m...@heroku.com>
Date:   2017-07-14T22:46:30Z

    Undo experimental change to publish tools artifact

----

> Batch Predictions
> -----------------
>
>                 Key: PIO-105
>                 URL: https://issues.apache.org/jira/browse/PIO-105
>             Project: PredictionIO
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Mars Hall
>            Assignee: Mars Hall
>
> Implement a new {{pio batchpredict}} command to enable massive, fast, batch
> predictions from a trained model. Read a multi-object JSON file as the input
> format, with one query object per line. Similarly, write results to a
> multi-object JSON file, with one prediction result + its original query per
> line.
> Currently, getting bulk predictions from PredictionIO is possible with either:
> * a {{pio eval}} script, which will always train a fresh, unvalidated model
> before getting predictions
> * a custom script that hits the {{queries.json}} HTTP API, which is a serious
> bottleneck when requesting hundreds of thousands or millions of predictions
> Neither of these existing bulk-prediction hacks is adequate, for the reasons
> mentioned.
> It's time for this use-case to be a first-class command :D

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
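
For anyone wiring this into a pipeline, here is a minimal Python sketch of the multi-object JSON format described in the pull request: one query object per input line, and one `{"query": ..., "prediction": ...}` object per output line. The helper names `build_query_lines` and `top_items` are hypothetical and not part of PredictionIO. The `user` and `itemScores` fields are taken from the example above and will differ for other engine templates. Since the real output is a set of Hadoop partition files, the parser would presumably be run over the lines of each file in turn.

```python
import json
from typing import Iterable, Iterator, List, Tuple


def build_query_lines(user_ids: Iterable[str]) -> List[str]:
    """Serialize one compact query object per line (hypothetical helper)."""
    return [json.dumps({"user": str(u)}, separators=(",", ":")) for u in user_ids]


def top_items(lines: Iterable[str]) -> Iterator[Tuple[str, str, float]]:
    """Yield (user, best item, score) for each non-empty result line.

    Assumes the result shape shown in the pull request example.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        record = json.loads(line)
        scores = record["prediction"]["itemScores"]
        if not scores:
            continue  # no recommendations for this query
        best = max(scores, key=lambda s: s["score"])
        yield record["query"]["user"], best["item"], best["score"]


# Two result lines copied from the example output in the pull request.
SAMPLE_OUTPUT = """\
{"query":{"user":"1"},"prediction":{"itemScores":[{"item":"1","score":33},{"item":"2","score":32}]}}
{"query":{"user":"2"},"prediction":{"itemScores":[{"item":"5","score":55},{"item":"3","score":28}]}}
"""

if __name__ == "__main__":
    for query_line in build_query_lines(["1", "2"]):
        print(query_line)
    for user, item, score in top_items(SAMPLE_OUTPUT.splitlines()):
        print(user, item, score)
```

Parsing line by line, rather than loading the whole file as one JSON document, matters here because a multi-object file is not a single valid JSON value, and it keeps memory use flat for very large result sets.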