[ https://issues.apache.org/jira/browse/KAFKA-545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13488985#comment-13488985 ]
Jay Kreps commented on KAFKA-545:
---------------------------------

Okay, wrote some tests of locking in MappedByteBuffer and FileChannel to see how writes block reads. I think these things work the same in both cases. Again the results are a series of max times, this time over 5M writes. For both cases I see behavior similar to the below: high max times while a flush is occurring, but no hard locking. I am not sure of the exact cause, but it is safe to say that mmap is no panacea here.

[jkreps@jkreps-ld kafka-jbod]$ java -server -Xmx128M -Xms128M -XX:+UseConcMarkSweepGC -cp project/boot/scala-2.8.0/lib/scala-library.jar:core/target/scala_2.8.0/test-classes kafka.TestMmapLocking $((2*1024*1024*1024-1)) 5000 5000000 1
flushing
flush completed in 49.601333 ms
10.077477
2.214208
1.712157
1.895483
1.798951
1.934366
1.738388
flushing
13.097689
262.515081
flush completed in 1752.944685 ms
2.044426
1.655329
2.063751
1.55256
2.429741
1.717703
1.477672
9.815928
flushing
368.168449
flush completed in 1928.963959 ms
240.966127
1.600222
1.191583
1.750381
2.09028
1.694696
1.88224
2.122531
flushing
168.749728
385.088614
flush completed in 2160.281323 ms
241.846812
1.745029
1.82718
1.756801
1.822239
1.689906
1.708812
1.633651
flushing
183.764496
368.315579
flush completed in 2180.277532 ms
276.418226
1.737839
1.730913
1.711507
1.540686
2.011486
1.937501
1.834844
flushing
95.983117
129.890815
flush completed in 2695.239899 ms
1288.338521
1.828685
1.613301
1.63822
1.725626

> Add a Performance Suite for the Log subsystem
> ---------------------------------------------
>
> Key: KAFKA-545
> URL: https://issues.apache.org/jira/browse/KAFKA-545
> Project: Kafka
> Issue Type: New Feature
> Affects Versions: 0.8
> Reporter: Jay Kreps
> Priority: Blocker
> Labels: features
> Attachments: KAFKA-545-draft.patch
>
>
> We have had several performance concerns or potential improvements for the
> logging subsystem.
> To evaluate these in a data-driven way, it would be good to
> have a single-machine performance test that isolates the performance of the
> log.
>
> The performance optimizations we would like to evaluate include
> - Special casing appends in a follower which already have the correct offset
>   to avoid decompression and recompression
> - Memory mapping either all or some of the segment files to improve the
>   performance of small appends and lookups
> - Supporting multiple data directories and avoiding RAID
>
> Having a standalone tool is nice to isolate the component and makes profiling
> more intelligible.
>
> This test would drive load against Log/LogManager controlled by a set of
> command line options. This command line program could then be scripted up
> into a suite of tests that covers variations in message size, message set
> size, compression, number of partitions, etc.
>
> Here is a proposed usage for the tool:
> ./bin/kafka-log-perf-test.sh
> Option          Description
> ------          -----------
> --partitions    The number of partitions to write to
> --dir           The directory in which to write the log
> --message-size  The size of the messages
> --set-size      The number of messages per write
> --compression   Compression alg
> --messages      The number of messages to write
> --readers       The number of reader threads reading the data
>
> The tool would capture latency and throughput for the append() and read()
> operations.
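
For readers who want to reproduce the kind of measurement reported in the comment above, here is a minimal sketch in Java. It is not the kafka.TestMmapLocking class from the patch, and it does not try to interpret that program's command line arguments. The class name MmapFlushLatency, the method measure, and all parameters (file size, number of writes, window size, flush interval) are assumptions chosen for illustration. One thread writes longs into a MappedByteBuffer while a background thread periodically calls force(); the writer reports the maximum single-write time seen in each window of writes.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;

public class MmapFlushLatency {

    /**
     * Writes `writes` longs into a mapped file while a background thread calls
     * force() every `flushIntervalMs`. Returns the maximum single-write time
     * (in ms) observed in each window of `window` writes.
     * All names and parameters here are hypothetical, not taken from the patch.
     */
    static List<Double> measure(Path file, int sizeBytes, int writes, int window,
                                long flushIntervalMs)
            throws IOException, InterruptedException {
        try (FileChannel channel = FileChannel.open(file,
                StandardOpenOption.CREATE, StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            MappedByteBuffer buffer = channel.map(FileChannel.MapMode.READ_WRITE, 0, sizeBytes);

            // Background flusher: sleeps, then forces dirty pages to disk and times it.
            Thread flusher = new Thread(() -> {
                try {
                    while (!Thread.currentThread().isInterrupted()) {
                        Thread.sleep(flushIntervalMs);
                        System.out.println("flushing");
                        long start = System.nanoTime();
                        buffer.force();
                        System.out.printf("flush completed in %f ms%n",
                                (System.nanoTime() - start) / 1e6);
                    }
                } catch (InterruptedException e) {
                    // Interrupted by the writer: stop flushing.
                }
            });
            flusher.setDaemon(true);
            flusher.start();

            List<Double> maxima = new ArrayList<>();
            long maxNanos = 0;
            int slots = sizeBytes / Long.BYTES;
            for (int i = 0; i < writes; i++) {
                long start = System.nanoTime();
                // Absolute put: wraps around the mapped region, never moves position.
                buffer.putLong((i % slots) * Long.BYTES, i);
                maxNanos = Math.max(maxNanos, System.nanoTime() - start);
                if ((i + 1) % window == 0) {
                    maxima.add(maxNanos / 1e6);
                    maxNanos = 0;
                }
            }
            flusher.interrupt();
            flusher.join();
            return maxima;
        }
    }

    public static void main(String[] args) throws Exception {
        Path file = Files.createTempFile("mmap-latency", ".dat");
        try {
            // 64 MB mapping, 5M writes, one max per 500k writes, flush every 200 ms.
            for (double ms : measure(file, 64 * 1024 * 1024, 5_000_000, 500_000, 200)) {
                System.out.println(ms);
            }
        } finally {
            Files.deleteIfExists(file); // may fail on Windows while the mapping is live
        }
    }
}
```

The numbers this prints depend heavily on the OS, filesystem and disk, so they should not be expected to match the figures in the comment. A FileChannel variant would replace the putLong call with a channel.write(...) and the force() call with channel.force(true), which is one way to compare the two cases under the same harness.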