[ 
https://issues.apache.org/jira/browse/AMBARI-18622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ananda Verma updated AMBARI-18622:
----------------------------------
    Description: 
It makes sense to integrate PredictionIO with Ambari, since it is now part of 
Apache and also depends heavily on the current Ambari/HDP stack.

This feature adds support for provisioning Apache PredictionIO clusters via 
Ambari.

In general, PredictionIO (pio) can be defined as a service in HDP with the 
following components:
1) Event Server - stores events (data).
2) Engine - responsible for making predictions. It contains one or more 
machine learning algorithms. An engine reads training data and builds 
predictive model(s). It is then deployed as a web service. A deployed engine 
responds to prediction queries from your application through a REST API in 
real time.
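
As a rough illustration of how an application talks to the two components 
above, the sketch below builds (but does not send) one HTTP request for the 
Event Server and one for a deployed engine. The ports (7070 and 8000), the 
`/events.json` and `/queries.json` paths, and the `accessKey` parameter are 
assumed PredictionIO defaults, not something this issue specifies; the helper 
names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

# Assumed default endpoints; adjust to the actual cluster layout.
EVENT_SERVER_URL = "http://localhost:7070"
ENGINE_URL = "http://localhost:8000"


def build_event_request(access_key, event, entity_type, entity_id,
                        base_url=EVENT_SERVER_URL):
    """Build a POST request that would store one event in the Event Server."""
    body = json.dumps({
        "event": event,
        "entityType": entity_type,
        "entityId": entity_id,
    }).encode("utf-8")
    url = f"{base_url}/events.json?{urlencode({'accessKey': access_key})}"
    return Request(url, data=body,
                   headers={"Content-Type": "application/json"},
                   method="POST")


def build_query_request(query, base_url=ENGINE_URL):
    """Build a POST request that would send a prediction query to an engine."""
    body = json.dumps(query).encode("utf-8")
    return Request(f"{base_url}/queries.json", data=body,
                   headers={"Content-Type": "application/json"},
                   method="POST")
```

To actually send either request, pass it to `urllib.request.urlopen`. The 
shape of the query body depends on the engine template in use.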

PredictionIO also has the following external dependencies:

1. HBase: Event Server uses Apache HBase as the data store. It stores imported 
events. If you are not using the PredictionIO Event Server, you do not need to 
install HBase.

2. Apache Spark: Spark is a large-scale data processing engine that powers the 
algorithm, training, and serving processing.

3. HDFS: The output of training has two parts: a model and its metadata. The 
model is then stored in HDFS or a local file system.

4. Elasticsearch: It stores metadata such as model versions, engine versions, 
access key and app id mappings, evaluation results, etc.
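
The dependency list above could be turned into a small lookup that tells a 
provisioning tool which external services a given component selection needs. 
This is only a sketch: the component and service identifiers are hypothetical, 
and attaching Spark and HDFS to the Engine and treating Elasticsearch as 
always required are assumptions drawn from the descriptions above, not from an 
actual Ambari service definition.

```python
# Hypothetical identifiers; a real Ambari stack definition may name these differently.
COMPONENT_DEPENDENCIES = {
    "EVENT_SERVER": {"HBASE"},    # event data store
    "ENGINE": {"SPARK", "HDFS"},  # training/serving and model storage
}

# Assumption: the metadata store is needed whichever components are installed.
ALWAYS_REQUIRED = {"ELASTICSEARCH"}


def required_services(components):
    """Return the sorted external services needed for the chosen components."""
    required = set(ALWAYS_REQUIRED)
    for component in components:
        required |= COMPONENT_DEPENDENCIES[component]
    return sorted(required)
```

For example, `required_services(["ENGINE"])` leaves out HBase, matching the 
note that HBase is unnecessary when the Event Server is not used. A deployment 
that stores models on a local file system would drop HDFS as well.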


  was:
Feature includes adding support for apache predictionIO cluster provisioning 
via Ambari.
In general, pio can be defined as a service in HDP which has following 
components - 
1) Event Server  - stores events (data)
2) Engine - Engine is responsible for making prediction. It contains one or 
more machine learning algorithms. An engine reads training data and build 
predictive model(s). It is then deployed as a web service. A deployed engine 
responds to prediction queries from your application through REST API in 
real-time.

PredictionIO also has external dependencies on following  - 

1. HBase: Event Server uses Apache HBase as the data store. It stores imported 
events. If you are not using the PredictionIO Event Server, you do not need to 
install HBase.

2. Apache Spark: Spark is a large-scale data processing engine that powers the 
algorithm, training, and serving processing.

3. HDFS: The output of training has two parts: a model and its meta-data. The 
model is then stored in HDFS or a local file system.

4. Elasticsearch: It stores metadata such as model versions, engine versions, 
access key and app id mappings, evaluation results, etc.



> Integrate PredictionIO (Machine Learning Engine) With Ambari
> ------------------------------------------------------------
>
>                 Key: AMBARI-18622
>                 URL: https://issues.apache.org/jira/browse/AMBARI-18622
>             Project: Ambari
>          Issue Type: New Feature
>          Components: ambari-server
>    Affects Versions: 2.1.0
>            Reporter: Ananda Verma
>
> It makes sense to integrate PredictionIO with Ambari, since it is now part of 
> Apache and also depends heavily on the current Ambari/HDP stack.
> This feature adds support for provisioning Apache PredictionIO clusters via 
> Ambari.
> In general, PredictionIO (pio) can be defined as a service in HDP with the 
> following components:
> 1) Event Server - stores events (data).
> 2) Engine - responsible for making predictions. It contains one or more 
> machine learning algorithms. An engine reads training data and builds 
> predictive model(s). It is then deployed as a web service. A deployed engine 
> responds to prediction queries from your application through a REST API in 
> real time.
> PredictionIO also has the following external dependencies:
> 1. HBase: Event Server uses Apache HBase as the data store. It stores 
> imported events. If you are not using the PredictionIO Event Server, you do 
> not need to install HBase.
> 2. Apache Spark: Spark is a large-scale data processing engine that powers 
> the algorithm, training, and serving processing.
> 3. HDFS: The output of training has two parts: a model and its metadata. The 
> model is then stored in HDFS or a local file system.
> 4. Elasticsearch: It stores metadata such as model versions, engine versions, 
> access key and app id mappings, evaluation results, etc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
