[
https://issues.apache.org/jira/browse/MADLIB-1171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16606315#comment-16606315
]
Frank McQuillan commented on MADLIB-1171:
-----------------------------------------
Attached is an updated approach to model versioning.
[^model-versioning-work2.pdf]
The changes from the previous version are:
1) model summary in JSON format
2) model in serialized format
3) different model types can be mixed in the same summary/repo tables
Why JSON?
* enables #3 (mixing model types in the same tables)
* makes backward compatibility easier
* better for portability (e.g., to a low-latency prediction server running
outside the database)
* enables easier integration with 3rd party model management tools
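To make the JSON rationale concrete, here is a minimal sketch of one repository table that holds mixed model types. It assumes a schema with a JSON `summary` text column and a serialized `model` blob; the table and column names, the `model_type` values, `schema_version`, and the helper functions are all hypothetical and not taken from the attached design. sqlite3 stands in for the database, and pickle stands in for whatever serialization the design actually specifies (pickle is Python-only, so an out-of-db prediction server would need a portable format instead).

```python
import json
import pickle
import sqlite3

# Hypothetical single repository table holding every model type.
# sqlite3 stands in for the database; names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE model_repo (
           model_id INTEGER PRIMARY KEY,
           summary  TEXT NOT NULL,  -- model summary as JSON text
           model    BLOB NOT NULL   -- serialized model object
       )"""
)

def save_model(conn, summary, model_obj):
    """Store a JSON summary plus a serialized model; return the new model_id."""
    cur = conn.execute(
        "INSERT INTO model_repo (summary, model) VALUES (?, ?)",
        (json.dumps(summary), pickle.dumps(model_obj)),  # pickle: stand-in only
    )
    return cur.lastrowid

def load_summary(conn, model_id):
    """Read a summary back, tolerating keys that older summaries lack."""
    (text,) = conn.execute(
        "SELECT summary FROM model_repo WHERE model_id = ?", (model_id,)
    ).fetchone()
    summary = json.loads(text)
    # Backward compatibility: supply a default instead of failing when an
    # older summary predates a field.
    summary.setdefault("schema_version", 1)
    return summary

# Two different model types share one table because the type-specific
# fields live inside the JSON document rather than in fixed columns.
save_model(conn,
           {"model_type": "linear_regression", "num_rows_processed": 100},
           {"coef": [0.5, 1.2, -0.3]})
save_model(conn,
           {"model_type": "decision_tree", "num_rows_processed": 100,
            "max_depth": 5, "schema_version": 2},
           {"tree": "serialized-tree-placeholder"})
```

Because the type-specific fields sit inside the JSON document, adding a new model type needs no schema change, and the `setdefault` call shows one way newer code could read an older summary.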
> Support model versioning in output tables
> -----------------------------------------
>
> Key: MADLIB-1171
> URL: https://issues.apache.org/jira/browse/MADLIB-1171
> Project: Apache MADlib
> Issue Type: New Feature
> Components: All Modules
> Reporter: Frank McQuillan
> Priority: Major
> Fix For: v2.0
>
> Attachments: model-versioning-work1.pdf, model-versioning-work2.pdf,
> p100.png, p101.png
>
>
> Context
> For many MADlib modules, <out_table> contains the separate models for each
> group and <out_table>_summary contains the common model data for all groups.
> Model versioning can be awkward since the model output table and model
> summary table need to be explicitly dropped between runs.
> Story
> As a data scientist, I want to perform multiple runs without having to drop
> tables, so that I can easily get a history of the model runs with clear
> versioning.
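One way to meet the story above is to make the output tables append-only, so each run adds rows instead of requiring `<out_table>` and `<out_table>_summary` to be dropped. The sketch below assumes a hypothetical `version` column and a `grp` grouping column; neither the issue nor the attached design confirms this schema, and sqlite3 again stands in for the database.

```python
import json
import sqlite3

# Hypothetical append-only output tables: each run adds rows under a new
# version number instead of dropping and recreating the tables.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE out_table (
           version INTEGER NOT NULL,
           grp     TEXT NOT NULL,   -- grouping value
           model   TEXT NOT NULL,   -- per-group model as JSON text
           PRIMARY KEY (version, grp)
       )"""
)
conn.execute(
    """CREATE TABLE out_table_summary (
           version INTEGER PRIMARY KEY,
           summary TEXT NOT NULL    -- data common to all groups, JSON text
       )"""
)

def record_run(conn, summary, models_by_group):
    """Append one training run to both tables and return its version number."""
    # Next version = highest existing version + 1 (0 when the table is empty).
    # Not safe with concurrent writers; a real system would need a sequence or lock.
    (latest,) = conn.execute(
        "SELECT COALESCE(MAX(version), 0) FROM out_table_summary"
    ).fetchone()
    version = latest + 1
    with conn:  # commit the summary row and all group rows together
        conn.execute("INSERT INTO out_table_summary VALUES (?, ?)",
                     (version, json.dumps(summary)))
        conn.executemany(
            "INSERT INTO out_table VALUES (?, ?, ?)",
            [(version, g, json.dumps(m)) for g, m in models_by_group.items()],
        )
    return version

def history(conn):
    """Return (version, summary dict) pairs for every run, oldest first."""
    rows = conn.execute(
        "SELECT version, summary FROM out_table_summary ORDER BY version"
    ).fetchall()
    return [(v, json.loads(s)) for v, s in rows]

# Two runs with different hyperparameters; nothing is dropped in between.
v1 = record_run(conn, {"alpha": 0.1},
                {"east": {"coef": [1.0]}, "west": {"coef": [2.0]}})
v2 = record_run(conn, {"alpha": 0.01},
                {"east": {"coef": [1.1]}, "west": {"coef": [1.9]}})
```

Calling `history(conn)` lists every run in order, and filtering `out_table` on the largest version would return the current per-group models.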
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)