[ 
https://issues.apache.org/jira/browse/S2GRAPH-206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

DOYUNG YOON reassigned S2GRAPH-206:
-----------------------------------

    Assignee: DOYUNG YOON

> Generalize machine learning model serving.
> ------------------------------------------
>
>                 Key: S2GRAPH-206
>                 URL: https://issues.apache.org/jira/browse/S2GRAPH-206
>             Project: S2Graph
>          Issue Type: New Feature
>          Components: s2core
>            Reporter: DOYUNG YOON
>            Assignee: DOYUNG YOON
>            Priority: Major
>   Original Estimate: 672h
>  Remaining Estimate: 672h
>
> Recommendation is arguably one of the top use cases of an OLTP graph 
> database.
> Let's see how item-based collaborative filtering (item-based CF) can be 
> served as a graph query.
>  # fetch user's history as the edges of clicked items.
>  # fetch each item's similar items.
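The two steps above can be sketched as a two-hop traversal. This is only an illustration over in-memory dictionaries, not S2Graph's query API: the names `clicked`, `similar_to`, and `recommend` are hypothetical, and summing similarity scores across history items is an assumed aggregation that the issue does not specify.

```python
from collections import defaultdict

# Hypothetical stand-ins for two edge labels (not S2Graph's API).
clicked = {"user_1": ["item_a", "item_b"]}           # user -> clicked items
similar_to = {                                        # item -> [(similar item, score)]
    "item_a": [("item_c", 0.9), ("item_d", 0.4)],
    "item_b": [("item_c", 0.5), ("item_e", 0.7)],
}

def recommend(user, limit=10):
    """Two-hop traversal: user -> clicked items -> similar items."""
    history = clicked.get(user, [])
    scores = defaultdict(float)
    for item in history:                                    # step 1: user's history
        for candidate, score in similar_to.get(item, []):   # step 2: similar items
            if candidate not in history:
                scores[candidate] += score                  # assumed aggregation: sum
    # Highest score first; ties broken by item id for a stable order.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:limit]]
```

Every entry in `similar_to` has to exist as a stored edge before this query can run.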
> This naive approach has a few problems, since we need to insert many item 
> pairs as edges (up to N^2, where N is the total number of items).
> Even though bulk load can update a large number of edges in a stable manner, 
> the user still needs to generate the similarity matrix, which is often very 
> large.
> The approach also does not generalize to other model-based approaches.
> For example, a user who wants to use matrix factorization needs to work 
> through the following steps.
>  # dump user's history in raw records.
>  # convert user history to a matrix by creating a dictionary that maps each 
> raw value to a sequence number.
>  # factorize user history, usually using alternating least squares (ALS), 
> which yields the factorized model U, I.
>  # run k-nearest-neighbor search for each item on I, which yields an array 
> of similar item sequences for each item sequence.
>  # convert each item sequence and its array of similar item sequences back 
> to an item and an array of similar items by using the dictionary created in 
> step 2.
>  # bulk load item-item similarity as edges.
> Note that these steps become tedious.
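To make the offline part concrete, here is a minimal sketch of steps 2 and 4 through 6 in plain Python. It assumes the item factors I are already available as a list of vectors (the values below are toy numbers, not real ALS output), uses brute-force cosine similarity as the nearest-neighbor search, and stops at producing (source, target, score) rows rather than calling any real bulk loader. All function names are hypothetical.

```python
import math

def build_dictionary(raw_ids):
    """Step 2: map each raw id to a dense sequence number, and back."""
    to_seq = {}
    for raw in raw_ids:
        to_seq.setdefault(raw, len(to_seq))   # first occurrence wins
    to_raw = [None] * len(to_seq)
    for raw, seq in to_seq.items():
        to_raw[seq] = raw
    return to_seq, to_raw

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def knn_edges(item_factors, to_raw, k):
    """Steps 4-6: brute-force k nearest neighbors per item, mapped back to
    raw ids and returned as (source, target, score) rows for bulk loading."""
    edges = []
    for i, vec in enumerate(item_factors):
        scored = [(cosine(vec, other), j)
                  for j, other in enumerate(item_factors) if j != i]
        scored.sort(key=lambda s: (-s[0], s[1]))
        for score, j in scored[:k]:
            edges.append((to_raw[i], to_raw[j], score))
    return edges

# Toy data: three items and made-up 2-dimensional factors.
to_seq, to_raw = build_dictionary(["item_a", "item_b", "item_c", "item_a"])
item_factors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
edges = knn_edges(item_factors, to_raw, k=1)
```

The output grows as N * k rows, and the whole job has to be rerun and reloaded whenever the model is retrained.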
> I think the steps above can be simplified if S2Graph supports a more 
> generalized way of serving machine learning models.
> Steps 1, 2, and 3 are inevitably done by whoever focuses on building better 
> models, but steps 4, 5, and 6 can be automated.
> To automate steps 4, 5, and 6, we need to provide ways to load ML models 
> from a remote location and to integrate a pre-loaded ML model into the graph 
> query structure.
> So logically, the original query should be changed into the following.
>  # fetch user's history as the edges of clicked items.
>  # convert clicked items into item sequences.
>  # run the k-nearest-neighbor search on the pre-loaded ML model and get an 
> array of similar item sequences.
>  # convert the array of similar item sequences into an array of similar 
> items using the pre-loaded ML model's dictionary.
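A rough sketch of how the four query steps above could fit together at query time. The `ItemModel` class and `recommend` function are hypothetical names, not an existing S2Graph interface. The search is a brute-force scan over normalized vectors, which is an assumption made for brevity, and taking the maximum score per candidate is likewise an assumed aggregation.

```python
import math

class ItemModel:
    """Hypothetical pre-loaded model: item factors plus the id dictionary."""

    def __init__(self, raw_ids, factors):
        self.to_raw = list(raw_ids)
        self.to_seq = {raw: seq for seq, raw in enumerate(self.to_raw)}
        # Normalize once so that a dot product equals cosine similarity.
        self.factors = [self._normalize(v) for v in factors]

    @staticmethod
    def _normalize(v):
        norm = math.sqrt(sum(x * x for x in v))
        return [x / norm for x in v] if norm else list(v)

    def similar(self, raw_item, k):
        seq = self.to_seq.get(raw_item)            # step 2: item -> sequence
        if seq is None:
            return []
        query = self.factors[seq]
        scored = [(sum(a * b for a, b in zip(query, vec)), other)
                  for other, vec in enumerate(self.factors) if other != seq]
        scored.sort(key=lambda s: (-s[0], s[1]))   # step 3: brute-force kNN
        # Step 4: sequence -> item, so results look like ordinary edges.
        return [(self.to_raw[other], score) for score, other in scored[:k]]

def recommend(history, model, k=2):
    """Step 1 is assumed done: `history` holds the user's clicked items."""
    seen = set(history)
    best = {}
    for item in history:
        for candidate, score in model.similar(item, k):
            if candidate not in seen:
                best[candidate] = max(score, best.get(candidate, score))
    return sorted(best, key=lambda c: (-best[c], c))
```

Because `similar` returns (item, score) pairs, the model output can be consumed the same way as stored edges, and no item-item edges need to be loaded in advance.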
>  
> One might argue that supporting machine learning serving is not S2Graph's 
> focus.
> The reason behind this suggestion is that I believe a unified interface 
> that traverses not only pre-stored data as vertices/edges, but also 
> model-generated data on the fly as vertices/edges, can be very useful (not 
> only for collaborative filtering use cases).
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
