Hi Chirag,

Could you please provide more information on your Java server environment?
Regards,
Donald

On Fri, Nov 7, 2014 at 9:57 AM, chirag lakhani <chirag.lakh...@gmail.com> wrote:

> Thanks for letting me know about this; it looks pretty interesting. From
> reading the documentation it seems that the server must be built on a
> Spark cluster, is that correct? Is it possible to deploy it on a Java
> server? That is how we are currently running our web app.
>
> On Tue, Nov 4, 2014 at 7:57 PM, Simon Chan <simonc...@gmail.com> wrote:
>
>> The latest version of PredictionIO, which is now under the Apache 2
>> license, supports the deployment of MLlib models in production.
>>
>> The "engine" you build will include a few components, such as:
>> - Data - includes Data Source and Data Preparator
>> - Algorithm(s)
>> - Serving
>> I believe that you can do the feature vector creation inside the Data
>> Preparator component.
>>
>> Currently, the package comes with two templates: 1) Collaborative
>> Filtering Engine Template - with MLlib ALS; 2) Classification Engine
>> Template - with MLlib Naive Bayes. The latter may be useful to you, and
>> you can customize the Algorithm component, too.
>>
>> I have just created a doc: http://docs.prediction.io/0.8.1/templates/
>> I'd love to hear your feedback!
>>
>> Regards,
>> Simon
>>
>> On Mon, Oct 27, 2014 at 11:03 AM, chirag lakhani
>> <chirag.lakh...@gmail.com> wrote:
>>
>>> Would pipelining include model export? I didn't see that in the
>>> documentation.
>>>
>>> Are there ways that this is being done currently?
>>>
>>> On Mon, Oct 27, 2014 at 12:39 PM, Xiangrui Meng <men...@gmail.com>
>>> wrote:
>>>
>>>> We are working on the pipeline features, which would make this
>>>> procedure much easier in MLlib. This is still a WIP and the main JIRA
>>>> is at:
>>>>
>>>> https://issues.apache.org/jira/browse/SPARK-1856
>>>>
>>>> Best,
>>>> Xiangrui
>>>>
>>>> On Mon, Oct 27, 2014 at 8:56 AM, chirag lakhani
>>>> <chirag.lakh...@gmail.com> wrote:
>>>> > Hello,
>>>> >
>>>> > I have been prototyping a text classification model that my company
>>>> > would like to eventually put into production. Our technology stack
>>>> > is currently Java based, but we would like to build our models in
>>>> > Spark/MLlib and then export something like a PMML file which can be
>>>> > used for model scoring in real time.
>>>> >
>>>> > I have been using scikit-learn, where I take the training data,
>>>> > convert the text data into a sparse format, and then use the
>>>> > DictVectorizer to do one-hot encoding for the other categorical
>>>> > variables. All of those things seem to be possible in MLlib, but I
>>>> > am still puzzled about how that can be packaged in such a way that
>>>> > the incoming data can first be made into feature vectors and then
>>>> > evaluated as well.
>>>> >
>>>> > Are there any best practices for this type of thing in Spark? I
>>>> > hope this is clear, but if there is any confusion please let me
>>>> > know.
>>>> >
>>>> > Thanks,
>>>> >
>>>> > Chirag

--
Donald Szeto
PredictionIO