[ 
https://issues.apache.org/jira/browse/KAFKA-1933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14311690#comment-14311690
 ] 

Jay Kreps commented on KAFKA-1933:
----------------------------------

This is very interesting but also kind of scary. My observation is that this 
kind of sophisticated locking mixed in with regular code tends to work at first 
but inevitably gets subtly broken over time as people who don't understand the 
magic make changes.

A couple of thoughts:
1. Using the semaphore array to synchronize is correct but confusing, since this 
is a common error pattern.
2. What is the impact on the non-compressed case?
3. There is a ton of low-hanging fruit in the compression code itself. I 
suspect just optimizing that could yield a comparable 2x improvement and that 
would pay off both on the clients and on the server.
4. There has been some discussion of changing the message format to avoid the 
need for recompressing messages. That is, it might be possible to leave the 
compressed messages with offsets 0, 1, 2, etc. and have the interpretation be 
relative to some base offset. We would still need to decompress to validate, but 
this would avoid the recompression entirely.
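
To make point 4 concrete, here is a rough Java sketch of the relative-offset 
idea. It is not Kafka's message format or API: the class RelativeOffsetBatch 
and its methods are hypothetical names made up for illustration, and the byte 
array merely stands in for a compressed message set. The point it shows is that 
the broker only stamps a base offset on the wrapper and never touches the 
compressed bytes.

```java
// Hypothetical sketch: names are illustrative, not Kafka's format or API.
final class RelativeOffsetBatch {
    // Stand-in for the compressed message set; inner offsets are 0, 1, 2, ...
    private final byte[] compressedPayload;
    private final int messageCount;
    // Assigned by the broker on append; -1 means "not assigned yet".
    private long baseOffset = -1;

    RelativeOffsetBatch(byte[] compressedPayload, int messageCount) {
        this.compressedPayload = compressedPayload;
        this.messageCount = messageCount;
    }

    // Broker side: stamp the base offset and return the next free log offset.
    // The compressed payload is left untouched, so no recompression happens.
    long assignBaseOffset(long nextLogOffset) {
        baseOffset = nextLogOffset;
        return nextLogOffset + messageCount;
    }

    // Reader side: interpret an inner offset relative to the base offset.
    long absoluteOffset(int relativeOffset) {
        if (baseOffset < 0) {
            throw new IllegalStateException("base offset not assigned yet");
        }
        if (relativeOffset < 0 || relativeOffset >= messageCount) {
            throw new IndexOutOfBoundsException("relative offset " + relativeOffset);
        }
        return baseOffset + relativeOffset;
    }

    byte[] payload() {
        return compressedPayload;
    }
}
```

Validation would still require decompressing the payload once; the sketch 
leaves that step out.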

> Fine-grained locking in log append
> ----------------------------------
>
>                 Key: KAFKA-1933
>                 URL: https://issues.apache.org/jira/browse/KAFKA-1933
>             Project: Kafka
>          Issue Type: Improvement
>          Components: log
>            Reporter: Maxim Ivanov
>            Assignee: Jay Kreps
>            Priority: Minor
>             Fix For: 0.8.2
>
>         Attachments: KAFKA-1933.patch
>
>
> This patch adds finer-grained locking when appending to the log. It breaks
> the global append lock into two sequential phases and one parallel phase.
> The basic idea is to let every thread "reserve" offsets in non-overlapping
> ranges, then compress in parallel, and then "commit" the write to the log
> in the same order in which the offsets were reserved.
> On my Core i3 M370 @ 2.4 GHz (2 cores + HT) it resulted in the following 
> performance boost:
> LZ4: 7.2 sec -> 3.9 sec
> Gzip: 62.3 sec -> 24.8 sec
> Kafka was configured to run 4 IO threads; data was pushed using 5 netcat 
> instances in parallel, in batches of 200 messages of 6.2 KB each (510 MB 
> and 82180 messages in total)
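
For readers following along, the reserve / compress / commit phases described 
in the issue can be sketched roughly as below. This is not the attached patch: 
it uses a ticket counter with wait/notifyAll rather than the semaphore array 
mentioned in the comment, and PhasedAppendLog, append, and compress are 
hypothetical names with a trivial stand-in for real compression.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a three-phase append; not the KAFKA-1933 patch.
final class PhasedAppendLog {
    private final Object reserveLock = new Object();
    private final Object commitLock = new Object();
    private long nextOffset = 0;        // guarded by reserveLock
    private long nextTicket = 0;        // guarded by reserveLock
    private long committingTicket = 0;  // guarded by commitLock
    private final List<Long> committedBases = new ArrayList<>(); // guarded by commitLock

    void append(int messageCount, String payload) {
        long ticket;
        long base;
        // Phase 1 (sequential): reserve a non-overlapping offset range.
        synchronized (reserveLock) {
            ticket = nextTicket++;
            base = nextOffset;
            nextOffset += messageCount;
        }
        // Phase 2 (parallel): the expensive work runs outside any lock.
        String compressed = compress(base, payload);
        // Phase 3 (sequential): commit in the order offsets were reserved.
        synchronized (commitLock) {
            try {
                while (committingTicket != ticket) {
                    commitLock.wait();
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted before commit", e);
            }
            // A real log would write the compressed bytes here.
            if (!compressed.isEmpty()) {
                committedBases.add(base);
            }
            committingTicket++;
            commitLock.notifyAll();
        }
    }

    // Stand-in for real compression; it only tags the payload with its base.
    private static String compress(long base, String payload) {
        return base + ":" + payload;
    }

    // Copy of the base offsets in the order they were committed.
    List<Long> committedBaseOffsets() {
        synchronized (commitLock) {
            return new ArrayList<>(committedBases);
        }
    }
}
```

Note that in this sketch a thread interrupted while waiting never commits, so 
later tickets would block; handling that case is exactly the kind of subtlety 
the comment above warns about.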



