[ 
https://issues.apache.org/jira/browse/TINKERPOP-2376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17153578#comment-17153578
 ] 

ASF GitHub Bot commented on TINKERPOP-2376:
-------------------------------------------

spmallette opened a new pull request #1301:
URL: https://github.com/apache/tinkerpop/pull/1301


   https://issues.apache.org/jira/browse/TINKERPOP-2376
   
   This change came externally from a file attached to the JIRA by the person 
who first posted it. I've just turned it into a PR since it has sat a while. In 
some basic tests the change did seem to improve the sampling as described in 
the ticket.
   
   All tests pass with `docker/build.sh -t -n -i`
   
   VOTE +1
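For reference, an unbiased way to draw one item with probability proportional to its weight is to pick a uniform point in [0, total weight) and walk the cumulative weights until the point is covered. The Java sketch that follows illustrates that generic technique. It is not the code from PR #1301 or from the attached SampleGlobalStep.java, and the names `WeightedPick` and `pickIndex` are hypothetical. It replays the ticket's 10000-trial star-graph experiment with plain weights instead of a TinkerGraph.

```java
import java.util.List;
import java.util.Random;

/**
 * Illustrative sketch only (hypothetical class): unbiased weighted selection of a
 * single item by walking cumulative weights ("roulette wheel" selection).
 * Not the implementation from PR #1301 or the attached SampleGlobalStep.java.
 */
public class WeightedPick {

    /** Returns index i with probability weights[i] / sum(weights). */
    static int pickIndex(List<Double> weights, Random random) {
        double total = 0.0;
        for (double w : weights) {
            total += w;
        }
        // Draw a point uniformly in [0, total) and find the slot it falls into.
        double r = random.nextDouble() * total;
        double cumulative = 0.0;
        for (int i = 0; i < weights.size(); i++) {
            cumulative += weights.get(i);
            if (r < cumulative) {
                return i;
            }
        }
        // Only reachable through floating-point rounding; fall back to the last item.
        return weights.size() - 1;
    }

    public static void main(String[] args) {
        // Star graph from the ticket: three out-edges of v[0], each with weight 1.
        List<Double> weights = List.of(1.0, 1.0, 1.0);
        int[] counts = new int[weights.size()];
        Random random = new Random(42); // fixed seed so runs are repeatable
        for (int trial = 0; trial < 10000; trial++) {
            counts[pickIndex(weights, random)]++;
        }
        // Each leaf should receive roughly 10000 / 3, i.e. about 3333 hits.
        System.out.printf("v[1]:v[2]:v[3] = %d:%d:%d%n", counts[0], counts[1], counts[2]);
    }
}
```

Because each item owns a slice of [0, total) exactly as wide as its weight, the result does not depend on the order in which the items are visited.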



> Probability distribution controlled by weight when using sample step
> --------------------------------------------------------------------
>
>                 Key: TINKERPOP-2376
>                 URL: https://issues.apache.org/jira/browse/TINKERPOP-2376
>             Project: TinkerPop
>          Issue Type: Improvement
>          Components: process
>    Affects Versions: 3.4.6
>         Environment: Gremlin-Tinkerpop 3.4.6 on Fedora 32
>            Reporter: zjxian
>            Priority: Critical
>         Attachments: SampleGlobalStep.java, out.csv
>
>
> Create a simple graph with 1 central node and 3 surrounding nodes.
> Add 3 edges with equal weight (1) to form a star graph.
> Traverse from the center (v[0]) to the other 3 nodes, sample(1), and record 
> the destination node.
> Do that 10000 times.
> Expected probability distribution: 
> v[1]:v[2]:v[3] = 3333:3333:3333 (1:1:1)
> What I got: 
> v[1]:v[2]:v[3] = 3320:4439:2241
> I've checked the source, in particular 
> ([https://github.com/apache/tinkerpop/blob/master/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/step/filter/SampleGlobalStep.java]).
> Working through that code, the distribution it produces should be roughly 
> 1/3:4/9:2/9, which is very close to the results I got.
> I think some improvement is needed here to make "random walk" in TinkerPop 
> really useful.
> The script I use:
> {code:java}
> // Gremlin Console (Groovy) script
> conf = new BaseConfiguration()
> conf.setProperty("gremlin.tinkergraph.vertexIdManager", "LONG")
> conf.setProperty("gremlin.tinkergraph.edgeIdManager", "LONG")
> conf.setProperty("gremlin.tinkergraph.vertexPropertyIdManager", "LONG")
> graph = TinkerGraph.open(conf)
> g = graph.traversal()
> // star graph: v[0] is the center, v[1]..v[3] are the leaves
> for (i = 0; i <= 3; i++) {
>   g.addV().iterate()
> }
> // one "connect" edge of weight 1 from the center to each leaf
> for (i = 1; i <= 3; i++) {
>   g.V(0).addE("connect").property("weight", 1).to(__.V(i)).iterate()
> }
> // start from an empty out.csv with a header row
> file = new File("out.csv")
> file.delete()
> file.append("id\r\n")
> // 10000 trials: sample one out-edge weighted by "weight" and record its far end
> for (i = 0; i < 10000; i++) {
>   g.V(0).outE().sample(1).by("weight").otherV().sideEffect { file.append(it.get().id() + "\r\n") }.iterate()
> }
> {code}
> See the results in the attached out.csv.
>  
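The reporter's claim that 3320:4439:2241 fits 1/3:4/9:2/9 rather than 1:1:1 can be checked with Pearson's chi-square goodness-of-fit statistic. The Java snippet is an illustrative sketch that uses only the counts quoted in the ticket; the class name `SampleCountsCheck` is hypothetical.

```java
/**
 * Illustrative check (hypothetical class): compares the observed counts from the
 * ticket against two candidate distributions with Pearson's chi-square statistic.
 * Three categories give 2 degrees of freedom.
 */
public class SampleCountsCheck {

    /** Sum over categories of (observed - expected)^2 / expected. */
    static double chiSquare(long[] observed, double[] probabilities) {
        long n = 0;
        for (long o : observed) {
            n += o;
        }
        double stat = 0.0;
        for (int i = 0; i < observed.length; i++) {
            double expected = n * probabilities[i];
            double diff = observed[i] - expected;
            stat += diff * diff / expected;
        }
        return stat;
    }

    public static void main(String[] args) {
        long[] observed = {3320, 4439, 2241}; // v[1], v[2], v[3] from the ticket
        double[] uniform = {1.0 / 3, 1.0 / 3, 1.0 / 3};
        double[] skewed = {1.0 / 3, 4.0 / 9, 2.0 / 9};
        // The 95th percentile of chi-square with 2 degrees of freedom is about 5.991.
        System.out.printf("vs 1:1:1        chi2 = %.1f%n", chiSquare(observed, uniform));
        System.out.printf("vs 1/3:4/9:2/9  chi2 = %.2f%n", chiSquare(observed, skewed));
    }
}
```

The statistic comes out at roughly 725 against 1:1:1 and roughly 0.22 against 1/3:4/9:2/9. With a 5% critical value of about 5.99, the counts reject uniform sampling and are consistent with the skewed distribution the reporter derived.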



--
This message was sent by Atlassian Jira
(v8.3.4#803005)