[ 
https://issues.apache.org/jira/browse/SAMZA-1043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15617117#comment-15617117
 ] 

Xinyu Liu commented on SAMZA-1043:
----------------------------------

RB: https://reviews.apache.org/r/53282/.

> Samza performance improvements
> ------------------------------
>
>                 Key: SAMZA-1043
>                 URL: https://issues.apache.org/jira/browse/SAMZA-1043
>             Project: Samza
>          Issue Type: Improvement
>            Reporter: Xinyu Liu
>            Assignee: Xinyu Liu
>             Fix For: 0.12.0
>
>
> In recent experiments with a Samza batch job (consuming HDFS data on Hadoop), 
> the results were subpar compared to MapReduce and Spark. Looking closely at 
> the metrics, we found two basic problems:
> 1) Not enough data to process. The unprocessed message queue length was 
> frequently zero.
> 2) Processing is not fast enough. Samza's throughput was about the same for 
> medium-size records (100B) and small records (10B), while Spark scales very 
> well on small records (over 1M/s).
> The first problem was solved by increasing the buffer size. This ticket 
> addresses the second problem with three major improvements:
> - Option to turn off timer metrics calculation: one of the main time costs in 
> Samza processing turns out to be just maintaining the timer metrics. While 
> they are useful for debugging, they become a bottleneck when running a stable 
> job at high throughput. In my test job, which consumes 8M mock messages, 
> processing took 30 secs with timer metrics on. After turning them off, it 
> took only 14 secs.
> - Java coding improvements: The AsyncRunLoop code can be further optimized 
> for efficiency. Some of the thread-safe data structures I was using are not 
> optimal for performance (Collections.synchronizedSet). I switched to 
> CopyOnWriteArraySet, which performs far better here because reads dominate 
> and the set is small.
> - In-order processing path improvements: AsyncRunLoop handles callbacks the 
> same way regardless of whether processing is in-order or out-of-order (max 
> concurrency > 1), which incurs considerable cost. Simplifying the logic for 
> in-order handling improves performance.
> After all three improvements, my test job with mock input (8M messages) is 
> processed within 8 secs, i.e. about 1M messages/s on one CPU core.
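
The first two improvements in the description can be illustrated with a minimal Java sketch. This is not the Samza code: `RunLoopSketch`, `timed`, and `scan` are hypothetical names invented for illustration, and the `timerEnabled` flag merely stands in for whatever option the patch introduces. The sketch only shows the two ideas: skipping the `System.nanoTime()` bookkeeping when timers are off, and iterating a small, read-mostly set through `CopyOnWriteArraySet` rather than through a lock.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.CopyOnWriteArraySet;

// Hypothetical sketch; none of these names come from Samza.
final class RunLoopSketch {
    // Stand-in for an option that turns timer metrics on or off.
    private final boolean timerEnabled;
    private long timerUpdates = 0;
    private long totalNanos = 0;

    RunLoopSketch(boolean timerEnabled) {
        this.timerEnabled = timerEnabled;
    }

    // Runs the work; pays for clock reads and counters only when timers are on.
    void timed(Runnable work) {
        if (!timerEnabled) {
            work.run();
            return;
        }
        long start = System.nanoTime();
        work.run();
        totalNanos += System.nanoTime() - start;
        timerUpdates++;
    }

    long timerUpdates() { return timerUpdates; }

    long totalNanos() { return totalNanos; }

    // Read-heavy access pattern: iterate the whole set many times.
    static int scan(Set<String> tasks, int rounds) {
        int visited = 0;
        for (int r = 0; r < rounds; r++) {
            for (String task : tasks) {
                if (!task.isEmpty()) {
                    visited++;
                }
            }
        }
        return visited;
    }

    public static void main(String[] args) {
        Set<String> locked = Collections.synchronizedSet(new HashSet<>());
        Set<String> copyOnWrite = new CopyOnWriteArraySet<>();
        for (int i = 0; i < 4; i++) {
            locked.add("task-" + i);
            copyOnWrite.add("task-" + i);
        }

        // Iterating a synchronizedSet safely requires holding its lock.
        int viaLock;
        synchronized (locked) {
            viaLock = scan(locked, 1000);
        }
        // CopyOnWriteArraySet iterates over an immutable snapshot, no lock needed.
        int viaSnapshot = scan(copyOnWrite, 1000);
        System.out.println(viaLock + " " + viaSnapshot);

        // With timers off, the work still runs but no timing state is touched.
        RunLoopSketch loop = new RunLoopSketch(false);
        loop.timed(() -> scan(copyOnWrite, 1000));
        System.out.println("timer updates: " + loop.timerUpdates());
    }
}
```

The trade-off is that every write to a `CopyOnWriteArraySet` copies the backing array, so the swap only pays off under the conditions the description names: many more reads than writes, and a small set.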



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
