[ https://issues.apache.org/jira/browse/S2GRAPH-123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15670979#comment-15670979 ]

ASF GitHub Bot commented on S2GRAPH-123:
----------------------------------------

Github user daewon commented on a diff in the pull request:

    https://github.com/apache/incubator-s2graph/pull/98#discussion_r88280168
  
    --- Diff: s2core/src/main/scala/org/apache/s2graph/core/mysqls/LabelIndex.scala ---
    @@ -39,7 +40,31 @@ object LabelIndex extends Model[LabelIndex] {
           rs.string("meta_seqs").split(",").filter(_ != "").map(s => s.toByte).toList match {
             case metaSeqsList => metaSeqsList
           },
    -      rs.string("formulars"))
    +      rs.string("formulars"),
    +      rs.intOpt("dir"),
    +      rs.stringOpt("options")
    +    )
    +  }
    +  object WriteOption {
    +    val Default = WriteOption()
    +  }
    +  case class WriteOption(method: String = "default",
    +                         rate: Double = 1.0,
    +                         totalModular: Long = 100,
    +                         storeDegree: Boolean = true) {
    +
    +    def sample[T](a: T, hashOpt: Option[Long]): Boolean = {
    +      if (method == "drop") false
    +      else if (method == "sample") {
    +        if (scala.util.Random.nextDouble() < rate) true
    +        else false
    +      } else if (method == "hash_sample") {
    +        //        logger.error(s"[XXX]: ${a.toString} ${hash} ${totalModular}, ${rate}, ${(hash.abs % totalModular) / totalModular.toDouble}")
    --- End diff --
    
    The leftover debugging comment should be removed.
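For reference, here is a minimal sketch of `WriteOption` with the debugging comment removed. The diff excerpt is cut off inside the `hash_sample` branch, so the rule used here (keep the edge when `(hash mod totalModular) / totalModular` is below `rate`) is an assumption inferred from the removed log line, as is keeping edges that arrive without a hash. The sketch also swaps `hash.abs % totalModular` for `Math.floorMod`, because `Long.MinValue.abs` is still negative.

```scala
import scala.util.Random

case class WriteOption(method: String = "default",
                       rate: Double = 1.0,
                       totalModular: Long = 100,
                       storeDegree: Boolean = true) {

  // Returns true if the edge should be written. `a` is unused here and is
  // kept only to match the signature shown in the diff.
  def sample[T](a: T, hashOpt: Option[Long]): Boolean = method match {
    case "drop" => false
    // Random: keeps roughly `rate` of edges, decided independently per call.
    case "sample" => Random.nextDouble() < rate
    // ASSUMPTION: rule inferred from the removed debug log line.
    // floorMod keeps the bucket in [0, totalModular) even for Long.MinValue.
    case "hash_sample" =>
      hashOpt match {
        case Some(hash) =>
          Math.floorMod(hash, totalModular) / totalModular.toDouble < rate
        case None => true // ASSUMPTION: edges without a hash are kept
      }
    case _ => true // "default" (and any unknown method): store the edge
  }
}

object WriteOption {
  val Default: WriteOption = WriteOption()
}
```

Using a hash instead of `Random` means an edge with a given hash always gets the same keep/drop decision, even if it is written more than once.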


> Support different index on out/in direction.
> --------------------------------------------
>
>                 Key: S2GRAPH-123
>                 URL: https://issues.apache.org/jira/browse/S2GRAPH-123
>             Project: S2Graph
>          Issue Type: New Feature
>    Affects Versions: 0.2.0
>            Reporter: DOYUNG YOON
>            Assignee: DOYUNG YOON
>             Fix For: 0.2.0
>
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>
> In some situations, users might want different behavior depending on the
> `direction` of an edge.
> Based on my experience deploying and operating S2Graph with users' news
> article click activity, it is extremely common for a few articles to get
> most of the clicks.
> To describe the problem more formally, suppose we have a
> `user_article_click` label in which each edge has `user_id` as its source
> vertex and `article_id` as its target vertex.
> In this case, 'out' direction edges spread out evenly because we prepend a
> murmur hash to the beginning of the row key, and there are very few edges
> per source vertex (`user_id`), since no individual can click millions of
> articles.
> The 'in' direction, however, which holds the edges from every `user_id` for
> each `article_id`, is a different story. A few `article_id`s get clicks from
> millions of users and quickly become `super node`s. This causes excessive
> region server resource usage, and keeping millions of edges on a single
> source vertex is unreasonable anyway, because sending millions of edges to a
> client would time out.
> Currently, there is no way to control how edges are processed per
> direction, but the case above could be avoided if we provided such options.
> I suggest a new feature that provides a separate index, with its own write
> options, for each `direction`.
> Possible write options could be the following (based on our write
> transaction steps):
> # `IndexEdge`: dropAll / sampling / storeAll (default)
> # `SnapshotEdge`: drop / store (default)
> # `Degree`: ignore / update (default)
> By enabling or disabling each element of the write transaction, users can
> decide what to do based on what they know about their data.
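The hotspot described in the issue can be illustrated with a simplified row key model. This is a sketch under stated assumptions: the key is modeled as a "<bucket>:<vertexId>" string with 16 buckets taken from `scala.util.hashing.MurmurHash3`, whereas S2Graph's real row key is a binary layout, and `RowKeySketch`, `rowKey`, and `edgesPerRow` are hypothetical names. It shows that the hash prefix spreads different vertices across row keys, while every edge of a single vertex still lands under the same row key.

```scala
import scala.util.hashing.MurmurHash3

// Simplified model, NOT S2Graph's actual binary row key layout.
object RowKeySketch {
  // ASSUMPTION: row key = "<bucket>:<vertexId>", with the bucket taken from
  // a murmur hash of the vertex id.
  def rowKey(vertexId: String, buckets: Int = 16): String = {
    val bucket = Math.floorMod(MurmurHash3.stringHash(vertexId), buckets)
    f"$bucket%02d:$vertexId"
  }

  // Number of edges stored under each row key when edges are keyed by the
  // source vertex of the given direction ("out": user_id, "in": article_id).
  def edgesPerRow(edges: Seq[(String, String)], direction: String): Map[String, Int] =
    edges
      .map { case (userId, articleId) => if (direction == "out") userId else articleId }
      .groupBy(rowKey(_))
      .map { case (key, vs) => key -> vs.size }
}
```

With 1,000 users clicking one article, the 'out' side yields 1,000 small rows, while the 'in' side puts all 1,000 edges under a single row key, which is the `super node` case.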


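The proposed write options could be modeled as one small configuration per direction. The sketch below is hypothetical: the type names, the `steps` helper, and passing a pre-drawn random value `draw` for sampling are illustrative choices, not an existing S2Graph API. The defaults mirror the "(default)" entries in the proposal.

```scala
// HYPOTHETICAL model of the proposed per-direction write options.
sealed trait IndexEdgeOpt
case object DropAll extends IndexEdgeOpt
final case class Sampling(rate: Double) extends IndexEdgeOpt
case object StoreAll extends IndexEdgeOpt

sealed trait SnapshotEdgeOpt
case object DropSnapshot extends SnapshotEdgeOpt
case object StoreSnapshot extends SnapshotEdgeOpt

sealed trait DegreeOpt
case object IgnoreDegree extends DegreeOpt
case object UpdateDegree extends DegreeOpt

// One instance per direction ('out' / 'in'); defaults store everything.
final case class DirectionWriteOptions(
    indexEdge: IndexEdgeOpt = StoreAll,
    snapshotEdge: SnapshotEdgeOpt = StoreSnapshot,
    degree: DegreeOpt = UpdateDegree)

object DirectionWriteOptions {
  // Names of the write steps that run for one edge. `draw` is a random
  // value in [0, 1) supplied by the caller and is used only for Sampling.
  def steps(opts: DirectionWriteOptions, draw: Double): List[String] = {
    val writeIndex = opts.indexEdge match {
      case DropAll        => false
      case Sampling(rate) => draw < rate
      case StoreAll       => true
    }
    List(
      "IndexEdge"    -> writeIndex,
      "SnapshotEdge" -> (opts.snapshotEdge == StoreSnapshot),
      "Degree"       -> (opts.degree == UpdateDegree)
    ).collect { case (name, true) => name }
  }
}
```

For the `user_article_click` example, one might keep the defaults for 'out' and use something like `DirectionWriteOptions(indexEdge = Sampling(0.1))` for 'in'; the values are illustrative only. Passing `draw` in, rather than calling `Random` inside `steps`, keeps the decision deterministic and easy to test.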

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)