[jira] [Commented] (STORM-1742) More accurate 'complete latency'

ASF GitHub Bot (JIRA) Mon, 20 Jun 2016 03:27:39 -0700

    [ 
https://issues.apache.org/jira/browse/STORM-1742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15339301#comment-15339301
 ]


ASF GitHub Bot commented on STORM-1742:
---------------------------------------

Github user HeartSaVioR commented on the issue:

    https://github.com/apache/storm/pull/1379
  
    I have just finished performance tests on a 10-node cluster, 7 nodes of 
which run workers.
    (The tests ran on VMs, so the results can be affected by the environment. 
For that reason I ran each option twice.)
    
    Environment for each VM: 2 cores, 16G memory, RHEL7 64bit, Oracle JDK 
1.8.0_60
    
    I ran the tests with yahoo/storm-perf-test (SOL), which we used for 
performance testing before ThroughputVsLatency. I have no experience running 
ThroughputVsLatency on multiple nodes and was not sure how to tune it, so I 
just picked SOL.
    
    At first, I used 1 worker per VM and distributed all tasks across the 
workers, so that each worker has one task and one acker.
    
    Test command line is here: `storm jar 
storm_perf_test-1.0.0-SNAPSHOT-jar-with-dependencies.jar 
com.yahoo.storm.perftest.Main --ack --name test -l 1 -n 1 --workers 7 --spout 3 
--bolt 4 --testTimeSec 900 -c topology.max.spout.pending=1092 --messageSize 10 
-c topology.acker.executors=null`
    
    Test result is here: 
https://gist.github.com/HeartSaVioR/69078c3abb56561111288708d7dd6fab
    After warming up, the patched version is more stable and faster.
    
    I was curious how performance changes when we put more pressure on the 
ackers, so I made 4x the tasks and ran the test again.
    
    Test command line is here: `storm jar 
storm_perf_test-1.0.0-SNAPSHOT-jar-with-dependencies.jar 
com.yahoo.storm.perftest.Main --ack --name test -l 1 -n 1 --workers 7 --spout 
12 --bolt 16 --testTimeSec 900 -c topology.max.spout.pending=1092 --messageSize 
10 -c topology.acker.executors=null`
    
    Test result is here: 
https://gist.github.com/HeartSaVioR/9db168a2550abbf0d8f114269ec3aaa3
    Similar results are observed.
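
For reference, the sizing implied by the two command lines above can be worked out with a small sketch. It assumes that tasks spread evenly over the 7 workers in both runs (the comment states this only for the first run) and that `topology.max.spout.pending` applies per spout task. The class and method names are hypothetical and are not part of storm-perf-test.

```java
// Hypothetical helper: sizing arithmetic for the two SOL runs described above.
final class SolRunSizing {

    // Spout and bolt tasks per worker, assuming an even spread over workers.
    static int tasksPerWorker(int spouts, int bolts, int workers) {
        return (spouts + bolts) / workers;
    }

    // Upper bound on un-acked tuples in the topology, assuming
    // topology.max.spout.pending is enforced per spout task.
    static long maxPendingTuples(int spouts, int maxSpoutPending) {
        return (long) spouts * maxSpoutPending;
    }

    public static void main(String[] args) {
        int workers = 7;
        int maxSpoutPending = 1092;

        // First run: --spout 3 --bolt 4
        System.out.println(tasksPerWorker(3, 4, workers));        // 1
        System.out.println(maxPendingTuples(3, maxSpoutPending)); // 3276

        // Second run (4x tasks): --spout 12 --bolt 16
        System.out.println(tasksPerWorker(12, 16, workers));       // 4
        System.out.println(maxPendingTuples(12, maxSpoutPending)); // 13104
    }
}
```

Under these assumptions the second run quadruples both the tasks per worker and the bound on pending tuples, while the number of workers stays at 7.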
    
    We expected no performance impact, or even a degradation, but the patch 
actually improves performance with SOL.
    I guess this result comes from moving System.currentTimeMillis() from the 
Spout to the Acker. It was called once for every 20 completed tuples "in the 
Spout loop thread", which is blocking. Even though the Acker calls 
System.currentTimeMillis() for every completed tuple and carries a heavier 
payload, it hurts performance less.
    
    @ptgoetz Could you check my test results and confirm that they make sense?


> More accurate 'complete latency'
> --------------------------------
>
>                 Key: STORM-1742
>                 URL: https://issues.apache.org/jira/browse/STORM-1742
>             Project: Apache Storm
>          Issue Type: Improvement
>          Components: storm-core
>            Reporter: Jungtaek Lim
>            Assignee: Jungtaek Lim
>
> I already started a discussion thread on the dev@ list. Below is a copy of
> the content of my mail.
> http://mail-archives.apache.org/mod_mbox/storm-dev/201604.mbox/%3CCAF5108gn=rskundfs7-sgy_pd-_prgj2hf2t5e5zppp-knd...@mail.gmail.com%3E
> While thinking about metrics improvements, I doubt that many users know
> what complete latency 'exactly' is. In fact, it's somewhat complicated
> because additional waiting time can be added to complete latency by the
> single-threaded event loop of the spout.
> Long-running nextTuple() / ack() / fail() calls can affect complete latency,
> but this happens behind the scenes. No latency information is provided for
> them, and some users don't even know about this characteristic. Moreover,
> calls to nextTuple() can be skipped due to max spout waiting, which makes it
> harder for us to estimate even when the avg. latency of nextTuple() is
> provided.
> I think separating threads (moving the tuple handler to a separate thread,
> as JStorm does) would close the gap, but it requires our spout logic to be
> thread-safe, so I'd like to find a workaround first.
> My rough idea is to let the Acker decide the end time for the root tuple.
> There are then two ways to decide the start time for the root tuple:
> 1. when the Spout is about to emit ACK_INIT to the Acker (in other words,
> keep it as it is)
>   - The Acker sends the ack / fail message to the Spout with a timestamp,
> and the Spout calculates the time delta
>   - pros. : It's the most accurate way since it respects the definition of
> 'complete latency'.
>   - cons. : Clock synchronization between machines becomes very important.
> Sub-millisecond precision would be required.
> 2. when the Acker receives ACK_INIT from the Spout
>   - The Acker calculates the time delta itself, and sends the ack / fail
> message to the Spout with the time delta
>   - pros. : No strict requirement to sync the time between servers.
>   - cons. : It doesn't include the latency of sending / receiving ACK_INIT
> between the Spout and the Acker.
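
The two options in the issue description can be illustrated with a minimal sketch. It is not the code from the pull request: the class `AckerLatencySketch`, its methods, and the injected clock are hypothetical names chosen for illustration. Option 2 reads a single clock (the Acker's) at both ends, while option 1 subtracts a Spout-side timestamp from an Acker-side one, so any clock skew between the machines ends up in the result.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.LongSupplier;

// Hypothetical sketch of the two timing options; not Storm's actual acker.
final class AckerLatencySketch {
    private final LongSupplier ackerClock; // injected so it can be faked in tests
    private final Map<Long, Long> startMillisByRootId = new HashMap<>();

    AckerLatencySketch(LongSupplier ackerClock) {
        this.ackerClock = ackerClock;
    }

    // Option 2: remember the Acker's own time when ACK_INIT arrives.
    void onAckInit(long rootId) {
        startMillisByRootId.put(rootId, ackerClock.getAsLong());
    }

    // Option 2: when the tuple tree completes (ack or fail), return the
    // time delta to send to the Spout, or -1 if the root id is unknown.
    long onComplete(long rootId) {
        Long startMillis = startMillisByRootId.remove(rootId);
        if (startMillis == null) {
            return -1L;
        }
        return ackerClock.getAsLong() - startMillis;
    }

    // Option 1: the Spout subtracts its own emit time from the Acker's end
    // timestamp; the two values come from different machine clocks.
    static long spoutSideDelta(long spoutEmitMillis, long ackerEndMillis) {
        return ackerEndMillis - spoutEmitMillis;
    }

    public static void main(String[] args) {
        AckerLatencySketch acker = new AckerLatencySketch(System::currentTimeMillis);
        acker.onAckInit(42L);
        System.out.println("time delta (ms): " + acker.onComplete(42L));
    }
}
```

In this sketch the Acker keeps one map entry per in-flight root tuple and performs two clock reads per tuple, which corresponds to the heavier Acker-side work mentioned in the comment above.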



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
