My first question is how to train on an ephemeral machine and swap models into an already deployed prediction server, because this is what I do all the time. The only way to do this now is to train first on dummy data, then deploy, and re-train as data comes in. There are other issues and questions below; some may be slightly off topic for this specific PR.
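To make the use case concrete, here is a minimal sketch of the pattern described above: a live server starts with a dummy model and later swaps in a model trained elsewhere. All names (`PredictionServer`, `swap_model`) are hypothetical illustrations, not PredictionIO APIs.

```python
import threading

class PredictionServer:
    """Toy prediction server that can hot-swap its model.

    Hypothetical sketch only: PredictionIO's real deploy mechanism is
    JVM-based. This just illustrates training on an ephemeral machine
    and swapping the result into an already deployed server.
    """
    def __init__(self, model):
        self._model = model
        self._lock = threading.Lock()

    def predict(self, query):
        with self._lock:
            model = self._model  # take a consistent snapshot of the model
        return model(query)

    def swap_model(self, new_model):
        # Called when an ephemeral training machine publishes a new model.
        with self._lock:
            self._model = new_model

server = PredictionServer(lambda q: 0.0)   # deployed first, on dummy data
server.swap_model(lambda q: q * 2)         # re-trained model swapped in later
print(server.predict(21))                  # -> 42
```

The lock is only there so a swap never interleaves with a prediction; the point is that deploy happens once and training can happen any number of times afterwards.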
> On Dec 5, 2016, at 10:00 AM, Donald Szeto <[email protected]> wrote:
>
> Hi all,
>
> I am moving the discussion of stateless build (
> https://github.com/apache/incubator-predictionio/pull/328) here. Replying
> Pat:
>
>> BTW @chanlee514 @dszeto Are we thinking of a new command, something like
>> pio register that would add metadata to the metastore? This would need to
>> be run every time the engine.json changed for instance? It would also do
>> not compile? Is there an alternative? What state does this leave us in?
>
> I imagine we would need pio register after this. Something like what docker
> push would do for you today. Changes of engine.json will not require
> registration because it is consumed during runtime by pio train and pio
> deploy. We are phasing out pio build so that engine templates will be more
> friendly with different IDEs.

I'm all for removing the manifest and stateless build, but I'm not sure we mean the same thing by "stateless." My issue is more with stateless commands, or put differently, a fully flexible workflow. That means all commands read metadata from the metastore, and only one, very explicitly, sets metadata in the metastore. Doing the write in train doesn't consider the deploy-before-train and multi-tenancy use cases.

Deploy then train:
1) pio eventserver to start the EventServer on any machine
2) pio deploy to get the query server (prediction server) on any machine
3) pio train at any time on any machine, with a mechanism for deployed engines to discover the metadata they need when they need it, or to have it automatically updated when it changes (pick a method: push for deployed engines, pull for train)
4) send input at any time

Multi-tenancy: This seems to imply a user-visible id for an engine (an "engine instance id" in today's nomenclature). For multi-tenancy, the user is going to want to set this instance id somewhere, and should have stateless commands, not only stateless build.
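The workflow above can be sketched as a metastore with exactly one writing operation and read-only discovery everywhere else. This is a hypothetical sketch of the idea, not PredictionIO code; `register` and `discover` are invented names, and the store is an in-memory dict standing in for the real metastore.

```python
# Hypothetical metastore keyed by a user-visible instance id, as the
# multi-tenancy point above suggests. Only `register` writes; train and
# deploy on any machine would only read (pull-style discovery).
metastore = {}

def register(instance_id, metadata):
    """The single, explicit metadata write (what a `pio register` might do)."""
    metastore[instance_id] = metadata

def discover(instance_id):
    """Read-only lookup usable by train or a deployed engine on any machine."""
    return metastore.get(instance_id)

# Each tenant sets its own instance id:
register("tenant-a", {"model_id": "m-001", "params": {"k": 10}})
register("tenant-b", {"model_id": "m-002", "params": {"k": 50}})

# A deployed engine discovers only its own metadata, whenever it needs it:
print(discover("tenant-a")["model_id"])  # -> m-001
```

Because every command except `register` is a pure read, the commands themselves stay stateless and can run in any order on any machine, which is what makes deploy-before-train possible.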
>> After the push, what action creates the binary (I assume pio build), and what
>> action adds metadata to the metastore (I assume pio train)? So does this
>> require they run on the same machine? They often do not.
>
> pio build will still create the binary at this point (and hopefully be phased
> out as mentioned). Right now the only metadata that is disappearing are
> engine manifests. Engine instances will still be written after pio train,
> and used by pio deploy.
>
>> One more question. After push, how do we run the PredictionServer or train
>> on multiple machines? In the past this required copying the manifest.json
>> and making sure binaries are in the same location on all machines.
>
> "In the same location" is actually a downside IMO of the manifest.json
> design. Without manifest.json now, you would need to run pio commands from
> a location with a built engine, because instead of looking at engine
> manifests, it will now look locally for engine JARs. So deployment would
> still involve copying engine JARs to a remote deployment machine, and running
> pio commands at the engine template location with engine-id and
> engine-version arguments.

I guess I also don't understand the need for engine-id and engine-version. Let's do away with them. There is one metadata object that points to the input data id, params, model id, and binary. This id can be assigned by the user.

With the above in place, we are ready to imagine an EventServer where you POST to pio-ip/dataset/resource-id (no keys) and GET from pio-ip/model/resource-id to do queries. This would allow multi-tenancy and merge the EventServer and PredictionServer under the well understood banner of REST.

Extending this a little further, we have all the commands implemented as REST APIs. The CLI becomes some simple scripts or binaries that hit the REST interface, and an admin server that hits the same interface.
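A sketch of what that single metadata object and URL scheme could look like, under the proposal above. The field names (`dataset_id`, `binary_uri`, etc.) and helper functions are hypothetical; only the `pio-ip/dataset/resource-id` and `pio-ip/model/resource-id` shapes come from the proposal itself.

```python
from dataclasses import dataclass, field

@dataclass
class EngineMetadata:
    """One metadata object per engine instance, keyed by a user-assigned
    resource id, replacing separate engine-id/engine-version (hypothetical)."""
    resource_id: str          # assigned by the user, e.g. one per tenant
    dataset_id: str           # points at the input data
    model_id: str             # points at the trained model
    binary_uri: str           # points at the engine binary
    params: dict = field(default_factory=dict)

def event_url(host, meta):
    # POST events here (no access keys in the proposed scheme)
    return f"http://{host}/dataset/{meta.resource_id}"

def query_url(host, meta):
    # GET predictions here
    return f"http://{host}/model/{meta.resource_id}"

meta = EngineMetadata("tenant-a", "ds-1", "m-001",
                      "hdfs://models/m-001", {"k": 10})
print(event_url("pio-ip", meta))  # -> http://pio-ip/dataset/tenant-a
print(query_url("pio-ip", meta))  # -> http://pio-ip/model/tenant-a
```

Because both endpoints are addressed by the same user-assigned resource id, the EventServer and PredictionServer fold into one REST surface, and multi-tenancy is just multiple resource ids.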
This is compatible with the simple stateless build as a first step, as long as we don't perpetuate hidden instance ids and stateful commands like a train that creates the hidden id. But maybe I misunderstand the code or the plans for next steps?

>
> Regards,
> Donald
