Chan, Donald, and I met about this discussion and wanted to share the idea with 
everyone in hopes it seems reasonable. Please comment.

Our goal:
1) a stateless CLI (with one exception)
2) run from anywhere, in any order
3) replace `pio build` with `sbt build` plus a new command, something like
`pio register`, that stores metadata. All other PIO commands read this
metadata.

This can and will be done incrementally:
1) Chan’s PIO-51 is the first step. It will not allow commands to run from
anywhere; it will be renamed and will make some steps towards the goal
2) remove `pio build` and replace it with `sbt build` and `pio register`

#1 and many already merged PRs and changes including optional Elasticsearch 5.x 
support will go into Apache PredictionIO-0.11.0 in the next few weeks.

#2 may have to wait for PredictionIO-0.11.0+

The basis for stateless workflow will be the new concept of an 
engine-instance-id, created at `pio register` time or set optionally by the 
user. This will be consumed by all other commands to reference the metadata 
from any location including other machines connected to PIO resources, making 
them (except for `pio register`) stateless.
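
As a rough illustration of the idea above, here is a minimal Python sketch of
how an engine-instance-id could key the stored metadata. Everything in it is
hypothetical: the `register` and `lookup` names, the in-memory `METADATA` dict
(a stand-in for whatever shared storage PIO is configured with), and the
metadata fields are assumptions, not the planned implementation.

```python
import uuid

# Hypothetical stand-in for the shared PIO metadata store.
METADATA = {}

def register(engine_json, jar_path, instance_id=None):
    """Sketch of `pio register`: store metadata, return the engine-instance-id."""
    # Use the id supplied by the user, otherwise generate one.
    instance_id = instance_id or uuid.uuid4().hex
    METADATA[instance_id] = {"engine_json": engine_json, "jar": jar_path}
    return instance_id

def lookup(instance_id):
    """Sketch of what train/deploy would do: resolve everything from the id."""
    return METADATA[instance_id]
```

With something like this, a command only needs the id (for example
`lookup("my-engine")`), not the current directory, to find the jar and
engine.json it should use.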



Thread was Re: [jira] [Commented] (PIO-51) Enable `pio build/train/deploy` 
outside of engine directory


On Jan 22, 2017, at 11:54 AM, Pat Ferrel (JIRA) <[email protected]> wrote:


   [ 
https://issues.apache.org/jira/browse/PIO-51?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15833658#comment-15833658
 ] 

Pat Ferrel commented on PIO-51:
-------------------------------

I understand that in PIO there is a plethora of ids that confuse and hide the
real issue. Try to ignore those and think about what we want, not so much what
we have. We need to simplify this by removing as many of these confusing ids as
possible, as quickly as possible, so we don't draw out or add to the
confusion. You and I have a hard time explaining what purpose those ids serve
(imagine the user's confusion); I choose to ignore them for this very reason.

Regardless of future work, the path-to-engine-directory should not be used to 
identify an engine instance however indirectly. This is because the exact same 
code jar may be used for multiple engine instances and there may be multiple 
engine.json files for each of those engines in the same directory. If you are 
implying a 1-1 correspondence between dir and engine instances we have a 
problem. These other intermediate ids are completely useless, in principle, and
must be removed in order to make the CLI stateless. Remember the next step is
to replace `pio build` with `sbt build`, which will require a workflow change
to create `pio register`.
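
To make the problem concrete, here is a small Python sketch of a path-derived
id. The hashing scheme (SHA-1 of the path string) and the directory paths are
assumptions chosen for illustration; the issue description only says the id is
created from a hash of the engine directory path.

```python
import hashlib

def path_based_id(engine_dir):
    # Assumed scheme for illustration: id = SHA-1 of the directory path string.
    return hashlib.sha1(engine_dir.encode("utf-8")).hexdigest()

# The same engine checked out at different paths on two machines: ids differ.
machine_a = path_based_id("/home/alice/my-engine")
machine_b = path_based_id("/opt/engines/my-engine")

# Two engine.json variants living in one directory: ids collide.
variant_1 = path_based_id("/home/alice/my-engine")  # using engine.json
variant_2 = path_based_id("/home/alice/my-engine")  # using a second engine.json
```

Here `machine_a != machine_b` even though the engine is identical, and
`variant_1 == variant_2` even though the two engine instances are different.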

If this means waiting on "run from anywhere" until `sbt build` and `pio
register` are implemented, so be it. Doing "run from anywhere" without asking
how it affects the stateless workflow and CLI is asking for trouble.



> Enable `pio build/train/deploy` outside of engine directory
> -----------------------------------------------------------
> 
>                Key: PIO-51
>                URL: https://issues.apache.org/jira/browse/PIO-51
>            Project: PredictionIO
>         Issue Type: Improvement
>           Reporter: Chan
> 
> Users can now provide the engine directory path as --engine-dir or -ed, and
> call `pio build/train/deploy` from anywhere.
> The “engineVersion” used to identify a prediction engine is created using the 
> hash of the engine directory path. As a result, the filepath of the engine 
> had to be kept the same in a distributed setup, with multiple machines using 
> the same trained model. This was a point of confusion for some users, which 
> led to this change.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
