[ 
https://issues.apache.org/jira/browse/TINKERPOP-1801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16224914#comment-16224914
 ] 

ASF GitHub Bot commented on TINKERPOP-1801:
-------------------------------------------

Github user artem-aliev commented on the issue:

    https://github.com/apache/tinkerpop/pull/734
  
    I have fixed the test failures.
        TinkerPopComputer does not call the ComputerProgram.execute methods if a split has no vertices.
        For example, the modern graph has 6 vertices, so on a computer with 8 cores there will be two empty splits.
        TraversalVertexProgram uses the execute step to set up the next profiling step, so the side effects are not set up properly for empty splits.
        That is why the tests did not fail in Docker but failed on a computer with more than 6 cores.
        The fix adds a check that the profile side effects were registered properly before they are used.
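
The empty-split situation described above can be sketched in a few lines of plain Java. This is only an illustration under stated assumptions: `EmptySplitSketch`, `partition`, `runIteration`, and `verticesSeen` are hypothetical names, the round-robin partitioning is an assumption made for the example, and none of this is the actual TinkerPop code or the actual fix in the pull request. It shows that 6 vertices spread over 8 splits leave two splits whose per-vertex setup never runs, and that a guarded read avoids using state that was never registered.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only: these names are not TinkerPop classes.
class EmptySplitSketch {

    /** Distributes vertex ids 0..vertexCount-1 round-robin over splitCount splits (assumed scheme). */
    static List<List<Integer>> partition(int vertexCount, int splitCount) {
        List<List<Integer>> splits = new ArrayList<>();
        for (int i = 0; i < splitCount; i++) {
            splits.add(new ArrayList<>());
        }
        for (int v = 0; v < vertexCount; v++) {
            splits.get(v % splitCount).add(v);
        }
        return splits;
    }

    /**
     * Simulates one iteration. The per-split side effect is registered only
     * from inside the per-vertex loop, so a split without vertices never registers it.
     */
    static Map<Integer, Long> runIteration(List<List<Integer>> splits) {
        Map<Integer, Long> sideEffects = new HashMap<>();
        for (int s = 0; s < splits.size(); s++) {
            for (int ignoredVertex : splits.get(s)) {
                sideEffects.merge(s, 1L, Long::sum);
            }
        }
        return sideEffects;
    }

    /** Guarded read: checks that the side effect was registered before using it. */
    static long verticesSeen(Map<Integer, Long> sideEffects, int split) {
        Long seen = sideEffects.get(split);
        return seen == null ? 0L : seen;
    }

    public static void main(String[] args) {
        List<List<Integer>> splits = partition(6, 8);
        Map<Integer, Long> sideEffects = runIteration(splits);
        for (int s = 0; s < splits.size(); s++) {
            System.out.println("split " + s + " saw " + verticesSeen(sideEffects, s) + " vertices");
        }
    }
}
```

Reading `sideEffects.get(split)` without the null check would throw on the two empty splits when the value is unboxed, which is the same kind of failure that only shows up when there are more splits than vertices.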


>  OLAP profile() step return incorrect timing
> --------------------------------------------
>
>                 Key: TINKERPOP-1801
>                 URL: https://issues.apache.org/jira/browse/TINKERPOP-1801
>             Project: TinkerPop
>          Issue Type: Bug
>          Components: hadoop
>    Affects Versions: 3.3.0, 3.2.6
>            Reporter: Artem Aliev
>
> ProfileStep measures the time spent in next()/hasNext() calls, expecting those calls to recurse into the following steps.
> GraphComputer, however, uses message passing (RDD joins on Spark).
> So next() does not recursively call the next steps; it generates a message instead, and most of the time is taken by message passing (the RDD join).
> Thus, on a graph computer, the time between ProfileSteps should be measured, not the time inside them.
> Another approach is to collect Spark statistics with a SparkListener and add the Spark stage timings to the profiler metrics. That would work only for Spark, but would give a better representation of step costs.
> The simple fix is to measure the time between OLAP iterations and add it to the profiler step.
> This will not take computer setup time into account, but will be precise enough for long-running queries.
> To reproduce:
> TinkerPop 3.2.6, Gremlin Console:
> {code}
> plugin activated: tinkerpop.server
> plugin activated: tinkerpop.utilities
> plugin activated: tinkerpop.spark
> plugin activated: tinkerpop.tinkergraph
> gremlin> graph = GraphFactory.open('conf/hadoop/hadoop-grateful-gryo.properties')
> gremlin> g = graph.traversal().withComputer(SparkGraphComputer)
> ==>graphtraversalsource[hadoopgraph[gryoinputformat->gryooutputformat], sparkgraphcomputer]
> gremlin> g.V().out().out().count().profile()
> ==>Traversal Metrics
> Step                                                               Count  Traversers       Time (ms)    % Dur
> =============================================================================================================
> GraphStep(vertex,[])                                                 808         808           2.025    18.35
> VertexStep(OUT,vertex)                                              8049         562           4.430    40.14
> VertexStep(OUT,edge)                                              327370        7551           4.581    41.50
> CountGlobalStep                                                        1           1           0.001     0.01
>                                             >TOTAL                     -           -          11.038        -
> gremlin> clock(1){g.V().out().out().count().next() }
> ==>3421.92758
> gremlin>
> {code}
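
In the reproduction above, profile() reports a total of 11.038 ms while clock() measures about 3421.93 ms for the same traversal, so almost all of the wall-clock time falls outside the next()/hasNext() calls. The "simple fix" from the description, measuring the time between iterations, could look roughly like the following sketch. `IterationTimerSketch` and its methods are hypothetical names introduced for illustration, and the injected clock is an assumption made so the example is deterministic; this is not the ProfileStep implementation.

```java
import java.util.function.LongSupplier;

// Hypothetical sketch; not the TinkerPop ProfileStep implementation.
class IterationTimerSketch {
    private final LongSupplier nanoClock; // e.g. System::nanoTime in real use
    private long lastMark;                // clock reading at the previous boundary
    private long totalNanos;              // sum of all between-iteration durations

    IterationTimerSketch(LongSupplier nanoClock) {
        this.nanoClock = nanoClock;
        this.lastMark = nanoClock.getAsLong();
    }

    /**
     * Call once at the end of every iteration. Everything that happened since the
     * previous boundary, including message passing, is charged to this iteration.
     */
    long markIterationEnd() {
        long now = nanoClock.getAsLong();
        long elapsed = now - lastMark;
        lastMark = now;
        totalNanos += elapsed;
        return elapsed;
    }

    long totalNanos() {
        return totalNanos;
    }

    public static void main(String[] args) throws InterruptedException {
        IterationTimerSketch timer = new IterationTimerSketch(System::nanoTime);
        Thread.sleep(20); // stands in for message passing between iterations
        System.out.println("iteration took " + timer.markIterationEnd() + " ns");
    }
}
```

Because the timer starts when it is constructed, any setup that happens before construction is not counted, which matches the caveat in the description that computer setup time is not taken into account.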



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
