[
https://issues.apache.org/jira/browse/S2GRAPH-123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15670281#comment-15670281
]
DOYUNG YOON commented on S2GRAPH-123:
-------------------------------------
Here is my first attempt at implementing this.
Note that users can now provide advanced options to control what is actually
stored for each ({{LabelIndex}}, {{Direction}}) pair.
{noformat}
{
  "label": "movie_user_rate",
  "srcServiceName": "movie",
  "srcColumnName": "user_id",
  "srcColumnType": "string",
  "tgtServiceName": "movie",
  "tgtColumnName": "movie_id",
  "tgtColumnType": "long",
  "indices": [
    {
      "name": "_PK",
      "propNames": [
        "_timestamp"
      ],
      "direction": "out", // [both/in/out, default both]
      "options": {
        "method": "hash_sample", // [drop, sample, hash_sample]
        "totalModular": 100,
        "rate": 0.1,
        "degree": true
      }
    }
  ],
  "props": [
    {
      "name": "rating",
      "defaultValue": 0,
      "dataType": "integer"
    }
  ],
  "serviceName": "movie",
  "consistencyLevel": "strong",
  "hTableName": "s2graph-alpha",
  "isDirected": "true",
  "options": {}
}
{noformat}
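The example above does not spell out how {{hash_sample}} combines {{totalModular}} and {{rate}}. One plausible reading, sketched here in Python as an assumption and not as S2Graph's actual implementation, is that each edge is hashed into {{totalModular}} buckets and stored only when its bucket falls below {{rate * totalModular}}. The function name {{should_store}} and the use of CRC32 as a stand-in hash are hypothetical.

```python
import zlib


def should_store(edge_key: str, total_modular: int = 100, rate: float = 0.1) -> bool:
    """Deterministically decide whether to keep an edge (assumed semantics).

    Unlike random sampling, the same edge_key always yields the same decision.
    """
    # CRC32 is only a stand-in hash for this sketch; it is stable across runs,
    # whereas Python's built-in hash() is randomized for strings.
    bucket = zlib.crc32(edge_key.encode("utf-8")) % total_modular
    return bucket < rate * total_modular


if __name__ == "__main__":
    # Hypothetical 'in' direction edges pointing at one popular article.
    keys = [f"user_{i}->article_42" for i in range(10_000)]
    kept = sum(should_store(k) for k in keys)
    print(f"kept {kept} of {len(keys)} edges")
```

Under this reading, a retried write of the same edge gets the same keep/drop decision, which plain random sampling would not guarantee.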
> Support different index on out/in direction.
> --------------------------------------------
>
> Key: S2GRAPH-123
> URL: https://issues.apache.org/jira/browse/S2GRAPH-123
> Project: S2Graph
> Issue Type: New Feature
> Affects Versions: 0.2.0
> Reporter: DOYUNG YOON
> Assignee: DOYUNG YOON
> Fix For: 0.2.0
>
> Original Estimate: 336h
> Remaining Estimate: 336h
>
> In some situations, a user might want different behavior depending on the
> `direction` of an edge.
> In my experience deploying and operating S2Graph on users' news article
> click activity, it is extremely common for a few articles to get most of
> the clicks.
> To describe the problem more formally, say we have a `user_article_click`
> label and each edge consists of a `user_id` and an `article_id` as its
> source and target vertices.
> In this case, 'out' direction edges are spread evenly because we prepend
> a murmur hash to the beginning of the row key. There are also very few edges
> per source vertex (`user_id`), since no individual can click a million
> articles.
> The 'in' direction, which holds all edges connecting every `user_id` to
> each `article_id`, is a different story. Only a few `article_id`s get a
> large number of clicks from millions of users, and each of these quickly
> becomes a `super node`. This leads to excessive region server resource
> usage, and keeping a million edges on one single source vertex is not
> reasonable anyway, because sending a million edges to the client would
> time out.
> Currently there is no way to control how edges are processed per direction,
> but the case above can be avoided if we provide options.
> I suggest a new feature that provides a separate index with write options
> for each `direction`.
> Possible write options are as follows (based on our write transaction
> steps):
> # `IndexEdge`: dropAll/sampling/storeAll(default)
> # `SnapshotEdge`: drop/store(default)
> # `Degree`: ignore/update(default)
> By enabling or disabling each element of the write transaction, users can
> decide what to store when they know what their data will look like.
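
To make the three proposed switches concrete, here is a minimal Python sketch of a per-direction write plan. The names `WriteOptions`, `plan_write`, and `sample_rate` are hypothetical and the option spellings are taken only from the list above; this is an illustration under those assumptions, not S2Graph's API.

```python
import random
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class WriteOptions:
    index_edge: str = "storeAll"    # dropAll / sampling / storeAll (default)
    snapshot_edge: str = "store"    # drop / store (default)
    degree: str = "update"          # ignore / update (default)
    sample_rate: float = 1.0        # assumed knob, used only for "sampling"


def plan_write(opts: WriteOptions, rng: Optional[random.Random] = None) -> List[str]:
    """Return which write-transaction elements would be written for one edge."""
    rng = rng or random.Random()
    steps = []
    if opts.index_edge == "storeAll" or (
        opts.index_edge == "sampling" and rng.random() < opts.sample_rate
    ):
        steps.append("IndexEdge")
    if opts.snapshot_edge == "store":
        steps.append("SnapshotEdge")
    if opts.degree == "update":
        steps.append("Degree")
    return steps


# Hypothetical settings for the click example: keep everything on the 'out'
# side, and on the hot 'in' side keep only the degree counter.
options_by_direction = {
    "out": WriteOptions(),
    "in": WriteOptions(index_edge="dropAll", snapshot_edge="drop"),
}

if __name__ == "__main__":
    for direction, opts in options_by_direction.items():
        print(direction, plan_write(opts))
```

With these settings the 'out' direction writes all three elements, while the 'in' direction updates only the degree, so a popular `article_id` never accumulates millions of index edges.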