DOYUNG YOON created S2GRAPH-206:

             Summary: Generalize machine learning model serving.
                 Key: S2GRAPH-206
             Project: S2Graph
          Issue Type: New Feature
          Components: s2core
            Reporter: DOYUNG YOON

Arguably, one of the top use cases of an OLTP graph database is recommendation.

Let's see how item-based collaborative filtering (item-based CF) can be served as a graph query:
 # fetch the user's history as edges to the clicked items.
 # fetch each clicked item's similar items.
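Here is a minimal sketch of this two-hop query. It makes a few assumptions. The two hops are stored as hypothetical `user_clicks` and `similar_items` labels, modelled here as plain Python dicts rather than real S2Graph labels. Candidates are ranked by summing their similarity over the user's history, which is one common choice and not necessarily what S2Graph would do.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical in-memory stand-ins for two graph labels:
#   user_clicks:   user -> clicked items                        (hop 1)
#   similar_items: item -> [(similar item, similarity score)]   (hop 2)
UserClicks = Dict[str, List[str]]
SimilarItems = Dict[str, List[Tuple[str, float]]]


def recommend(user: str, clicks: UserClicks, similar: SimilarItems, k: int = 3) -> List[str]:
    history = clicks.get(user, [])            # hop 1: user -> clicked items
    seen = set(history)
    scores: Dict[str, float] = defaultdict(float)
    for item in history:                      # hop 2: clicked item -> similar items
        for other, score in similar.get(item, []):
            if other not in seen:             # skip items the user already clicked
                scores[other] += score        # sum similarity over the whole history
    # Highest score first; ties broken by item id so the output is deterministic.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:k]]
```

For example, with `clicks = {"u1": ["a", "b"]}` and `similar = {"a": [("b", 0.9), ("c", 0.8), ("d", 0.1)], "b": [("c", 0.5), ("e", 0.4)]}`, `recommend("u1", clicks, similar)` returns `["c", "e", "d"]`. Item `b` is dropped because the user already clicked it. Hop 2 only works if every item-item pair has been stored as an edge beforehand.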

There are a few problems with the naive approach above, since we need to insert many item pairs as edges (N^2, where N is the total number of items).
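For a sense of scale, assume one edge per ordered pair of distinct items:

```latex
|E| = N(N-1) \approx N^2, \qquad N = 10^6 \;\Rightarrow\; |E| \approx 10^{12}
```

Storing each unordered pair only once halves this to N(N-1)/2. That is still about 5 × 10^11 edges for a million items.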

Even though bulk loading can update a large number of edges in a stable manner, the user still needs to generate the similarity matrix, which is often very large.

Also, the above approach does not generalize to other model-based approaches.

For example, if the user wants to use matrix factorization, they need to work through the following steps:
 # dump the user's history as raw records.
 # convert the user history into a matrix by creating a dictionary that maps raw values to sequence numbers.
 # factorize the user history matrix, usually with Alternating Least Squares (ALS), which yields the factor matrices U and I.
 # run a k-nearest-neighbor search for each item on I, which yields an array of similar item sequences for each item sequence.
 # convert each (item sequence, array of similar item sequences) pair back into (item, array of similar items) using the dictionary created in step 2.
 # bulk load the item-item similarities as edges.
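Below is a rough plain-Python sketch of steps 2, 4, and 5. Step 3 (ALS) is deliberately left out. The item factor matrix `I` in the usage below is a made-up placeholder standing in for real ALS output. The k-nearest-neighbor search is a brute-force cosine-similarity scan. All function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def build_dictionary(raw_items: Sequence[str]) -> Dict[str, int]:
    """Step 2: map each raw item id to a dense sequence number (first-seen order)."""
    index: Dict[str, int] = {}
    for raw in raw_items:
        if raw not in index:
            index[raw] = len(index)
    return index


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def knn_per_item(item_factors: List[List[float]], k: int) -> List[List[int]]:
    """Step 4: brute-force k nearest neighbors (by cosine) for every row of I."""
    result = []
    for i, vec in enumerate(item_factors):
        others = [(cosine(vec, other), j) for j, other in enumerate(item_factors) if j != i]
        others.sort(key=lambda t: (-t[0], t[1]))  # most similar first, ties by sequence
        result.append([j for _, j in others[:k]])
    return result


def to_raw(neighbors: List[List[int]], index: Dict[str, int]) -> Dict[str, List[str]]:
    """Step 5: translate sequence numbers back to raw item ids via the step-2 dictionary."""
    reverse = {seq: raw for raw, seq in index.items()}
    return {reverse[i]: [reverse[j] for j in seqs] for i, seqs in enumerate(neighbors)}
```

Take history `["shoe", "sock", "hat", "shoe"]` and placeholder factors `[[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]`. For these, `to_raw(knn_per_item(factors, 1), build_dictionary(history))` gives `{"shoe": ["sock"], "sock": ["shoe"], "hat": ["sock"]}`. Flattening that into `(item, similar item)` pairs produces the edges bulk loaded in step 6. Every one of these steps runs offline, outside S2Graph.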

These steps quickly become tedious.

I think the above steps could be simplified if S2Graph supported a more generalized way of serving machine learning models.

Steps 1, 2, and 3 are inevitably done by whoever focuses on building better models, but steps 4, 5, and 6 can be automated.

To automate steps 4, 5, and 6, we need to provide ways to load ML models from a remote location and to integrate pre-loaded ML models into the graph query structure.
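One possible shape for this is sketched below, under some loud assumptions. The model is a JSON file with hypothetical `items` (sequence number → raw item id) and `factors` (rows of I) fields. It is fetched with the standard library's `urllib.request.urlopen`, so `http://`, `https://`, `file://`, and `data:` URLs all work. A hypothetical `ModelRegistry` keeps loaded models in memory under a name that a query step could reference the way it references a label. None of these names exist in S2Graph.

```python
import json
import urllib.request
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ItemModel:
    """Hypothetical pre-loaded model: item dictionary plus item factor matrix I."""
    items: List[str]            # sequence number -> raw item id
    factors: List[List[float]]  # row i is the factor vector of item sequence i
    index: Dict[str, int] = field(init=False, repr=False)  # raw item id -> sequence number

    def __post_init__(self) -> None:
        if len(self.items) != len(self.factors):
            raise ValueError("items and factors must have the same length")
        self.index = {raw: i for i, raw in enumerate(self.items)}


def load_model(url: str) -> ItemModel:
    """Fetch a JSON model from any URL scheme urllib.request can open."""
    with urllib.request.urlopen(url) as resp:
        payload = json.loads(resp.read().decode("utf-8"))
    return ItemModel(items=payload["items"], factors=payload["factors"])


class ModelRegistry:
    """Keeps loaded models in memory, keyed by a name a query step could refer to."""

    def __init__(self) -> None:
        self._models: Dict[str, ItemModel] = {}

    def register(self, name: str, url: str) -> None:
        self._models[name] = load_model(url)  # load once, reuse for every query

    def get(self, name: str) -> ItemModel:
        return self._models[name]
```

With this shape, a query step would call something like `registry.get("item_model")` instead of reading pre-computed similarity edges from storage. Only the model (dictionary plus factors) has to be shipped, not the N^2 edges derived from it.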

So, logically, the original query would change into the following:
 # fetch the user's history as edges to the clicked items.
 # convert the clicked items into item sequences.
 # run a k-nearest-neighbor search on the pre-loaded ML model and get an array of similar item sequences.
 # convert the array of similar item sequences into an array of similar items using the pre-loaded ML model's dictionary.
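Here is a minimal sketch of this four-step query. It assumes the model is already loaded in memory as a hypothetical `PreloadedItemModel` holding the dictionary and the item factors. The k-nearest-neighbor search is a brute-force cosine scan. When several clicked items point to the same neighbor, the best score is kept. The data in the usage below is made up.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


class PreloadedItemModel:
    """Hypothetical in-memory model: raw id <-> sequence dictionary plus item factors I."""

    def __init__(self, items: List[str], factors: List[List[float]]) -> None:
        self.items = items                                  # sequence -> raw id
        self.factors = factors                              # row per item sequence
        self.index = {raw: i for i, raw in enumerate(items)}  # raw id -> sequence

    def knn(self, seq: int, k: int) -> List[Tuple[int, float]]:
        """Brute-force k nearest neighbors of one item sequence, most similar first."""
        query = self.factors[seq]
        scored = [(j, cosine(query, vec)) for j, vec in enumerate(self.factors) if j != seq]
        scored.sort(key=lambda t: (-t[1], t[0]))
        return scored[:k]


def recommend_with_model(user: str, clicks: Dict[str, List[str]],
                         model: PreloadedItemModel, k: int = 2) -> List[str]:
    history = clicks.get(user, [])                                      # step 1
    seqs = [model.index[item] for item in history if item in model.index]  # step 2
    seen = set(seqs)
    best: Dict[int, float] = {}
    for seq in seqs:
        for other, score in model.knn(seq, k):                          # step 3
            if other not in seen:
                best[other] = max(score, best.get(other, float("-inf")))
    ranked = sorted(best.items(), key=lambda t: (-t[1], t[0]))[:k]
    return [model.items[seq] for seq, _ in ranked]                      # step 4
```

Take items `["shoe", "sock", "hat", "scarf"]` with factors `[[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]`. For a user who clicked only `"shoe"`, `recommend_with_model` returns `["sock", "scarf"]` without any item-item edge ever having been stored. The brute-force scan costs O(N) per clicked item, so a large model would more likely need an approximate nearest-neighbor index. That choice is an assumption left open here.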


One might argue that supporting machine learning model serving is not S2Graph's focus.

The reason behind this suggestion is that I believe a unified interface would be very useful, and not only for collaborative filtering use cases. It would let users traverse both pre-stored data as vertices/edges and model-generated data, produced on the fly, as vertices/edges.

