DOYUNG YOON reassigned S2GRAPH-206:
Assignee: DOYUNG YOON
> Generalize machine learning model serving.
> Key: S2GRAPH-206
> URL: https://issues.apache.org/jira/browse/S2GRAPH-206
> Project: S2Graph
> Issue Type: New Feature
> Components: s2core
> Reporter: DOYUNG YOON
> Assignee: DOYUNG YOON
> Priority: Major
> Original Estimate: 672h
> Remaining Estimate: 672h
> One of the top use cases of an OLTP graph database is the
> Let's see how item-based collaborative filtering (item-based CF) can be served
> as a graph query.
> # fetch the user's history as edges to the clicked items.
> # fetch each clicked item's similar items.
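A minimal sketch of the two-step query above, using plain dictionaries in place of stored edges. The label names (`CLICKED`, `SIMILAR`), the item ids, and the choice to sum scores when an item is reached from several clicked items are all hypothetical. This is not S2Graph's query API, only the shape of the traversal.

```python
from typing import Dict, List, Tuple

# Hypothetical in-memory stand-ins for two edge labels: "clicked" (user -> item)
# and "similar_item" (item -> item with a precomputed similarity score).
CLICKED: Dict[str, List[str]] = {"user_a": ["item_1", "item_2"]}
SIMILAR: Dict[str, List[Tuple[str, float]]] = {
    "item_1": [("item_3", 0.75), ("item_4", 0.25)],
    "item_2": [("item_3", 0.5), ("item_5", 0.5)],
}


def recommend(user: str, limit: int = 10) -> List[Tuple[str, float]]:
    """Two-hop traversal: user -> clicked items -> similar items."""
    history = CLICKED.get(user, [])  # step 1: clicked items
    scores: Dict[str, float] = {}
    for item in history:
        for other, score in SIMILAR.get(item, []):  # step 2: similar items
            if other in history:
                continue  # skip items the user has already clicked
            # Assumption: scores from different clicked items are summed.
            scores[other] = scores.get(other, 0.0) + score
    # Highest accumulated score first; ties broken by item id.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:limit]


print(recommend("user_a"))  # [('item_3', 1.25), ('item_5', 0.5), ('item_4', 0.25)]
```

Note that every entry in `SIMILAR` has to be written as an edge before this query can run, which is the cost discussed next.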
> There are a few problems with the above naive approach, since we need to insert
> many item pairs as edges (on the order of N^2, where N is the total number of items).
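To make the N^2 figure concrete, a small helper that counts the edges needed to store one similarity score per pair of distinct items. The function name is illustrative only.

```python
def similarity_edge_count(n_items: int, directed: bool = True) -> int:
    """Edges needed to store a similarity score for every pair of distinct items."""
    ordered_pairs = n_items * (n_items - 1)
    # A symmetric similarity stored once per unordered pair needs half as many.
    return ordered_pairs if directed else ordered_pairs // 2


for n in (1_000, 1_000_000):
    print(n, similarity_edge_count(n), similarity_edge_count(n, directed=False))
```

For one million items this is 999,999,000,000 directed edges (roughly 10^12), or 499,999,500,000 if each symmetric pair is stored once.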
> Even though bulk load can update a large number of edges in a stable manner,
> the user still needs to generate the similarity matrix, which is often very large.
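One way the similarity matrix might be generated, sketched under the assumption that similarity is cosine similarity over binary user-click vectors (the issue does not name a measure). Only co-clicked pairs get an entry, but in the worst case the result still approaches N(N-1)/2 pairs.

```python
import math
from collections import defaultdict
from itertools import combinations
from typing import Dict, List, Tuple


def item_cosine_similarity(history: Dict[str, List[str]]) -> Dict[Tuple[str, str], float]:
    """Cosine similarity between items over binary user-click vectors.

    Keys are (a, b) with a < b; pairs never co-clicked are omitted (similarity 0).
    """
    item_count: Dict[str, int] = defaultdict(int)  # users who clicked each item
    co_count: Dict[Tuple[str, str], int] = defaultdict(int)  # users who clicked both
    for items in history.values():
        unique = sorted(set(items))
        for item in unique:
            item_count[item] += 1
        for a, b in combinations(unique, 2):
            co_count[(a, b)] += 1
    return {
        (a, b): c / math.sqrt(item_count[a] * item_count[b])
        for (a, b), c in co_count.items()
    }


history = {"u1": ["i1", "i2"], "u2": ["i1", "i2", "i3"], "u3": ["i2", "i3"]}
sims = item_cosine_similarity(history)
print(sims[("i1", "i3")])  # 0.5
```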
> Also, the above approach does not generalize to other model-based approaches.
> For example, a user who wants to use matrix factorization needs to work through
> the following steps.
> # dump the user's history as raw records.
> # convert the user history into a matrix by creating a dictionary that maps
> each raw value to a sequence number.
> # factorize the user history, usually with Alternating Least Squares (ALS),
> which yields the factorized model U, I.
> # run a k-nearest-neighbor search for each item on I, which yields an array
> of similar item sequences for each item sequence.
> # convert each (item sequence, array of similar item sequences) pair back into
> an (item, array of similar items) pair by using the dictionary created in step 2.
> # bulk load the item-item similarities as edges.
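Steps 2, 4 and 5 can be sketched in plain Python. ALS (step 3) is skipped: the `item_factors` values below are hypothetical numbers standing in for the rows of I, and the brute-force cosine search is one simple choice of k-nearest-neighbor method, not necessarily what a production pipeline would use.

```python
import math
from typing import Dict, List, Sequence


def build_dictionary(raw_items: Sequence[str]) -> Dict[str, int]:
    """Step 2: map each raw item value to a dense sequence number."""
    seq: Dict[str, int] = {}
    for item in raw_items:
        if item not in seq:
            seq[item] = len(seq)
    return seq


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def knn(item_factors: List[List[float]], k: int) -> Dict[int, List[int]]:
    """Step 4: brute-force k nearest neighbors for each item sequence on I."""
    result: Dict[int, List[int]] = {}
    for i, vec in enumerate(item_factors):
        scored = [(cosine(vec, other), j) for j, other in enumerate(item_factors) if j != i]
        scored.sort(key=lambda sj: (-sj[0], sj[1]))  # best first, ties by sequence
        result[i] = [j for _, j in scored[:k]]
    return result


def to_raw(neighbors: Dict[int, List[int]], seq: Dict[str, int]) -> Dict[str, List[str]]:
    """Step 5: translate sequence numbers back to raw item values."""
    raw = {s: item for item, s in seq.items()}
    return {raw[i]: [raw[j] for j in js] for i, js in neighbors.items()}


seq = build_dictionary(["apple", "banana", "cherry", "apple"])
item_factors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]  # hypothetical rows of I
similar = to_raw(knn(item_factors, k=1), seq)
print(similar)  # {'apple': ['banana'], 'banana': ['apple'], 'cherry': ['banana']}
```

The output of `to_raw` is what step 6 would then bulk load as item-item edges.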
> Note how tedious these steps become.
> I think the above steps can be changed into the following if S2Graph supports a
> more generalized way of serving machine learning models.
> Steps 1, 2, and 3 are inevitably done by those who focus on building better models, but 4, 5, and 6 can be
> To automate steps 4, 5, and 6, we need to provide ways to load ML models from a remote
> location and to integrate pre-loaded ML models into the graph query structure.
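A minimal sketch of "load once from a remote location, then keep in memory", assuming a hypothetical JSON model format holding the dictionary and the item factors. `ModelRegistry` and `ItemModel` are illustrative names, not S2Graph classes; `urllib.request.urlopen` handles `http(s)://` as well as `file://` URLs, so a local file can stand in for remote storage.

```python
import json
import urllib.request
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ItemModel:
    """Hypothetical serialized model: raw item -> sequence, plus rows of I."""
    dictionary: Dict[str, int]
    item_factors: List[List[float]]


class ModelRegistry:
    """Loads each named model once from a URL and keeps it in memory for queries."""

    def __init__(self) -> None:
        self._models: Dict[str, ItemModel] = {}

    def load(self, name: str, url: str) -> ItemModel:
        # Already loaded: return the cached model without fetching again.
        if name not in self._models:
            with urllib.request.urlopen(url) as resp:
                payload = json.loads(resp.read().decode("utf-8"))
            self._models[name] = ItemModel(payload["dictionary"], payload["item_factors"])
        return self._models[name]

    def get(self, name: str) -> ItemModel:
        return self._models[name]
```

Caching by name keeps model loading off the query path: queries only call `get`, and the fetch happens once per model.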
> So, logically, the original query should be changed into the following.
> # fetch the user's history as edges to the clicked items.
> # convert the clicked items into item sequences.
> # run the k-nearest-neighbor search on the pre-loaded ML model and get an array
> of similar item sequences.
> # convert the array of similar item sequences into an array of similar items
> using the pre-loaded ML model's dictionary.
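The four-step query above can be sketched as follows, assuming a model is already in memory. `CLICKED`, `DICTIONARY` and `ITEM_FACTORS` are hypothetical stand-ins, and brute-force cosine search is again an illustrative choice. The point is that no item-item edges are stored; neighbors are computed at query time only for the items the user actually clicked.

```python
import math
from typing import Dict, List, Sequence

# Hypothetical stand-ins: stored "clicked" edges and a pre-loaded model.
CLICKED: Dict[str, List[str]] = {"user_a": ["apple"]}
DICTIONARY: Dict[str, int] = {"apple": 0, "banana": 1, "cherry": 2}
ITEM_FACTORS: List[List[float]] = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
REVERSE: Dict[int, str] = {s: item for item, s in DICTIONARY.items()}


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def similar_items_for_user(user: str, k: int) -> Dict[str, List[str]]:
    clicked = CLICKED.get(user, [])  # 1. stored edges
    seqs = [DICTIONARY[i] for i in clicked if i in DICTIONARY]  # 2. item -> sequence
    result: Dict[str, List[str]] = {}
    for s in seqs:
        scored = sorted(
            ((cosine(ITEM_FACTORS[s], vec), j) for j, vec in enumerate(ITEM_FACTORS) if j != s),
            key=lambda sj: (-sj[0], sj[1]),
        )
        neighbor_seqs = [j for _, j in scored[:k]]  # 3. kNN on the model
        result[REVERSE[s]] = [REVERSE[j] for j in neighbor_seqs]  # 4. sequence -> item
    return result


print(similar_items_for_user("user_a", k=2))  # {'apple': ['banana', 'cherry']}
```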
> One might argue that supporting machine learning serving is not S2Graph's
> The reason behind this suggestion is that I believe providing a unified
> interface to traverse not only pre-stored data as vertices/edges but also
> model-generated data as vertices/edges on the fly can be very useful (and not
> only for collaborative filtering use cases).
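One possible shape for such a unified interface, sketched with hypothetical names (`EdgeSource`, `StoredEdges`, `ModelEdges`, `traverse`): each traversal step only asks for a vertex's neighbors, so a step backed by stored edges and a step backed by a model are interchangeable from the caller's point of view.

```python
from typing import Callable, Dict, List, Protocol


class EdgeSource(Protocol):
    def neighbors(self, vertex: str) -> List[str]: ...


class StoredEdges:
    """Edges written to storage ahead of time."""

    def __init__(self, adjacency: Dict[str, List[str]]) -> None:
        self._adj = adjacency

    def neighbors(self, vertex: str) -> List[str]:
        return list(self._adj.get(vertex, []))


class ModelEdges:
    """Edges generated on the fly by calling a model at query time."""

    def __init__(self, predict: Callable[[str], List[str]]) -> None:
        self._predict = predict

    def neighbors(self, vertex: str) -> List[str]:
        return self._predict(vertex)


def traverse(start: str, steps: List[EdgeSource]) -> List[str]:
    """Apply each step in order; duplicates are dropped, first-seen order kept."""
    frontier = [start]
    for step in steps:
        next_frontier: List[str] = []
        for v in frontier:
            for n in step.neighbors(v):
                if n not in next_frontier:
                    next_frontier.append(n)
        frontier = next_frontier
    return frontier


clicked = StoredEdges({"user_a": ["apple", "cherry"]})
model_output = {"apple": ["banana"], "cherry": ["banana", "date"]}  # hypothetical
similar = ModelEdges(lambda item: model_output.get(item, []))
print(traverse("user_a", [clicked, similar]))  # ['banana', 'date']
```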