[jira] [Commented] (TINKERPOP-1801) OLAP profile() step return incorrect timing

ASF GitHub Bot (JIRA) Fri, 20 Oct 2017 08:11:13 -0700

    [ 
https://issues.apache.org/jira/browse/TINKERPOP-1801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16212766#comment-16212766
 ]


ASF GitHub Bot commented on TINKERPOP-1801:
-------------------------------------------

Github user artem-aliev commented on the issue:

    https://github.com/apache/tinkerpop/pull/734
  
    The fix add iteration time to the appropriate step. Iteration time is a 
time between workerIterationStart() and workerIterationEnd() callbacks.
    So before fix the timing looks like 
    ```
    gremlin> g.V().out().out().count().profile()
    ==>Traversal Metrics
    Step                                                               Count  
Traversers       Time (ms)    % Dur
    
=============================================================================================================
    GraphStep(vertex,[])                                                9962    
    9962          70.873    48.95
    VertexStep(OUT,vertex)                                           1012657    
    3745          37.132    25.65
    VertexStep(OUT,edge)                                             2101815    
    6192          36.751    25.39
    CountGlobalStep                                                        1    
       1           0.018     0.01
                                                >TOTAL                     -    
       -         144.775        -
    ```
    While query runs 10s seconds.
    After the fix:
    ```
    gremlin> g.V().out().out().count().profile()
    ==>Traversal Metrics
    Step                                                               Count  
Traversers       Time (ms)    % Dur
    
=============================================================================================================
    GraphStep(vertex,[])                                                9962    
    9962       14186.809    97.43
    VertexStep(OUT,vertex)                                           1012657    
    3745         340.051     2.34
    VertexStep(OUT,edge)                                             2101815    
    6192          33.684     0.23
    CountGlobalStep                                                        1    
       1           0.004     0.00
                                                >TOTAL                     -    
       -       14560.549        -
    ```
    That shows that most of the time for this OLAP query was spend in the 
initial iteration (actually star graph creation).
    There still could be some inconsistencies because
    1. No computer specific setup time is measured
    2. Spark has a lot of lazy staff, so most of Spark RDD setup are counted as 
a first step.
    3. The algorithm could fail to assign timing to the right step in bas of 
sophisticated queries.
    
    By the way new timing is pretty close to the wall clock time. 



>  OLAP profile() step return incorrect timing
> --------------------------------------------
>
>                 Key: TINKERPOP-1801
>                 URL: https://issues.apache.org/jira/browse/TINKERPOP-1801
>             Project: TinkerPop
>          Issue Type: Bug
>          Components: hadoop
>    Affects Versions: 3.3.0, 3.2.6
>            Reporter: Artem Aliev
>
> Graph ProfileStep calculates time of next()/hasNext() calls, expecting 
> recursion.
> But Message passing/RDD joins is used by GraphComputer.
> So next() does not recursively call next steps, but message is generated. And 
> most of the time is taken by message passing (RDD join). 
> Thus on graph computer the time between ProfileStep should be measured, not 
> inside it.
> The other approach is to get Spark statistics with SparkListener and add 
> spark stages timings into profiler metrics. that will work only for spark but 
> will give better representation of step costs.
> The simple fix is measuring time between OLAP iterations and add it to the 
> profiler step.
> This will not take into account computer setup time, but will be precise 
> enough for long running queries.
> To reproduce:
> tinkerPop 3.2.6 gremlin:
> {code}
> plugin activated: tinkerpop.server
> plugin activated: tinkerpop.utilities
> plugin activated: tinkerpop.spark
> plugin activated: tinkerpop.tinkergraph
> gremlin> graph = 
> GraphFactory.open('conf/hadoop/hadoop-grateful-gryo.properties')
> gremlin> g = graph.traversal().withComputer(SparkGraphComputer)
> ==>graphtraversalsource[hadoopgraph[gryoinputformat->gryooutputformat], 
> sparkgraphcomputer]
> gremlin> g.V().out().out().count().profile()
> ==>Traversal Metrics
> Step                                                               Count  
> Traversers       Time (ms)    % Dur
> =============================================================================================================
> GraphStep(vertex,[])                                                 808      
>    808           2.025    18.35
> VertexStep(OUT,vertex)                                              8049      
>    562           4.430    40.14
> VertexStep(OUT,edge)                                              327370      
>   7551           4.581    41.50
> CountGlobalStep                                                        1      
>      1           0.001     0.01
>                                             >TOTAL                     -      
>      -          11.038        -
> gremlin> clock(1){g.V().out().out().count().next() }
> ==>3421.92758
> gremlin>
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

[jira] [Commented] (TINKERPOP-1801) OLAP profile() step return incorrect timing

Reply via email to