[ 
https://issues.apache.org/jira/browse/S2GRAPH-206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

DOYUNG YOON updated S2GRAPH-206:
--------------------------------
    Description: 
Recommendation is arguably one of the top use cases for an OLTP graph database.

Let's see how item-based collaborative filtering (item-based CF) can be served
as a graph query.
 # fetch the user's history as edges to the clicked items.
 # fetch each clicked item's similar items.
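
As a rough illustration of the two steps above, here is a minimal Python sketch
that uses plain dictionaries in place of stored edges. The names `clicked`,
`similar_to`, and `recommend`, the sample data, and the choice to sum
similarity scores are all assumptions made for this example; none of this is
S2Graph's API.

```python
from collections import defaultdict

# Hypothetical in-memory stand-ins for two edge labels (not S2Graph's API).
clicked = {"user_1": ["item_a", "item_b"]}        # user -> clicked items
similar_to = {                                    # item -> (similar item, score)
    "item_a": [("item_c", 0.9), ("item_d", 0.4)],
    "item_b": [("item_c", 0.5), ("item_e", 0.7)],
}

def recommend(user, limit=10):
    """Two-step traversal: user -> clicked items -> similar items."""
    history = clicked.get(user, [])               # step 1: the user's history
    seen = set(history)
    scores = defaultdict(float)
    for item in history:
        # step 2: each clicked item's similar items
        for neighbor, score in similar_to.get(item, []):
            if neighbor not in seen:
                # Summing scores is an assumption; other merges are possible.
                scores[neighbor] += score
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:limit]]

print(recommend("user_1"))
```

Note that every entry in `similar_to` corresponds to an edge that has to be
computed and inserted ahead of time.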

There are a few problems with the naive approach above, since we need to
insert many item pairs as edges (N^2, where N is the total number of items).

Even though bulk load can update a large number of edges in a stable manner,
the user still needs to generate the similarity matrix, which is often very
large.

Also, the above approach does not generalize to other model-based approaches.

For example, a user who wants to use matrix factorization needs to work
through the following steps.
 # dump the user's history as raw records.
 # convert the user history into a matrix by creating a dictionary that maps
each raw value to a sequence.
 # factorize the user history, usually with alternating least squares (ALS),
which yields the factorized model U, I.
 # run k-nearest-neighbor search for each item on I, which yields an array of
similar item sequences per item sequence.
 # convert each item sequence and its array of similar item sequences back to
an item and its array of similar items, using the dictionary created in step 2.
 # bulk load the item-item similarities as edges.
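
To make steps 2, 4, 5, and 6 concrete, here is a small Python sketch. It does
not run ALS: the item factor matrix `I` is hand-written and only stands in for
the output of step 3. The record values, the use of cosine similarity, the
brute-force search, and the `(source, target)` edge tuples are all assumptions
for illustration.

```python
import math

# Step 1 stand-in: raw (user, item) click records. All values are made up.
records = [("u1", "shoes"), ("u1", "socks"), ("u2", "shoes"), ("u2", "hat")]

# Step 2: dictionary between raw item values and integer sequences.
item_to_seq = {}
for _, item in records:
    item_to_seq.setdefault(item, len(item_to_seq))
seq_to_item = {seq: item for item, seq in item_to_seq.items()}

# Step 3 stand-in: item factors that ALS would produce, one row per sequence.
I = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def knn(seq, k):
    """Step 4: brute-force k nearest neighbors of one item sequence."""
    scored = [(cosine(I[seq], I[other]), other)
              for other in range(len(I)) if other != seq]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [other for _, other in scored[:k]]

neighbors = {seq: knn(seq, k=1) for seq in range(len(I))}

# Step 5: map sequences back to raw item values.
similar_items = {seq_to_item[seq]: [seq_to_item[n] for n in ns]
                 for seq, ns in neighbors.items()}

# Step 6: rows that would be handed to a bulk loader (format is hypothetical).
edges = [(src, dst) for src, dsts in similar_items.items() for dst in dsts]
print(edges)
```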

Note that these steps become tedious.

I think the above steps can be changed into the following if S2Graph supports
a more general way of serving machine learning models.

Steps 1, 2, and 3 are inevitably done by whoever focuses on building better
models, but steps 4, 5, and 6 can be automated.

To automate steps 4, 5, and 6, we need to provide ways to load ML models from
a remote location and to integrate a pre-loaded ML model into the graph query
structure.

So logically, the original query should be changed into the following.
 # fetch the user's history as edges to the clicked items.
 # convert the clicked items into item sequences.
 # run the k-nearest-neighbor search on the pre-loaded ML model to get an array
of similar item sequences.
 # convert the array of similar item sequences into an array of similar items
using the pre-loaded ML model's dictionary.
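
The four query-time steps above could look roughly like the following Python
sketch. `PreloadedModel`, `nearest`, and `similar_items_for` are hypothetical
names invented for this example, the model is built in memory rather than
loaded from a remote location, and cosine similarity with brute-force search
is only one possible choice for the nearest-neighbor step.

```python
import math

class PreloadedModel:
    """Hypothetical holder for item factors plus the item dictionary."""

    def __init__(self, item_to_seq, item_factors):
        self.item_to_seq = item_to_seq
        self.seq_to_item = {seq: item for item, seq in item_to_seq.items()}
        self.item_factors = item_factors

    @staticmethod
    def _cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
        return dot / norm if norm else 0.0

    def nearest(self, seq, k):
        """Brute-force k nearest item sequences to the given sequence."""
        target = self.item_factors[seq]
        scored = [(self._cosine(target, vec), other)
                  for other, vec in enumerate(self.item_factors) if other != seq]
        scored.sort(key=lambda pair: (-pair[0], pair[1]))
        return [other for _, other in scored[:k]]

def similar_items_for(user, clicked, model, k=2):
    items = clicked.get(user, [])                       # step 1: stored edges
    result = {}
    for item in items:
        seq = model.item_to_seq.get(item)               # step 2: item -> sequence
        if seq is None:
            continue                                    # item unknown to the model
        neighbor_seqs = model.nearest(seq, k)           # step 3: k-NN on the model
        result[item] = [model.seq_to_item[n]            # step 4: sequence -> item
                        for n in neighbor_seqs]
    return result

# Made-up data: stored click edges and a tiny factorized model.
clicked = {"u1": ["shoes"]}
model = PreloadedModel({"shoes": 0, "socks": 1, "hat": 2},
                       [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]])
result = similar_items_for("u1", clicked, model, k=2)
print(result)
```

In this sketch only the click edges are stored; the item-item similarities are
computed from the model at query time instead of being bulk loaded.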

 

One might argue that supporting machine learning serving is not S2Graph's focus.

The reason behind this suggestion is that I believe a unified interface for
traversing not only pre-stored data as vertices/edges, but also model-generated
data produced on the fly as vertices/edges, can be very useful (not only for
collaborative filtering use cases).

 


> Generalize machine learning model serving.
> ------------------------------------------
>
>                 Key: S2GRAPH-206
>                 URL: https://issues.apache.org/jira/browse/S2GRAPH-206
>             Project: S2Graph
>          Issue Type: New Feature
>          Components: s2core
>            Reporter: DOYUNG YOON
>            Assignee: DOYUNG YOON
>            Priority: Major
>   Original Estimate: 672h
>  Remaining Estimate: 672h
>



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
