[https://issues.apache.org/jira/browse/CASSANDRA-3623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13195229#comment-13195229]
Yuki Morishita commented on CASSANDRA-3623:
-------------------------------------------
Vijay, Pavel,
I ran a test similar to Pavel's on a physical machine (4-core 2.6GHz Xeon/16GB
RAM/Linux (Debian)) with trunk + 3623 (v3) + 3610 (v3).
Cassandra was run on the following JVM:
{code}
$ java -version
java version "1.6.0_26"
Java(TM) SE Runtime Environment (build 1.6.0_26-b03)
Java HotSpot(TM) 64-Bit Server VM (build 20.1-b02, mixed mode)
{code}
with the following JVM args:
{code}
-ea
-javaagent:bin/../lib/jamm-0.2.5.jar
-XX:+UseThreadPriorities -XX:ThreadPriorityPolicy=42
-Xms6G -Xmx6G -Xmn2G -Xss128k
-XX:+HeapDumpOnOutOfMemoryError
-XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled
-XX:SurvivorRatio=8
-XX:MaxTenuringThreshold=1 -XX:CMSInitiatingOccupancyFraction=75
-XX:+UseCMSInitiatingOccupancyOnly
-Djava.net.preferIPv4Stack=true -Dcom.sun.management.jmxremote.port=7199
-Dcom.sun.management.jmxremote.ssl=false
-Dcom.sun.management.jmxremote.authenticate=false
-Dlog4j.configuration=log4j-server.properties -Dlog4j.defaultInitOverride=true
{code}
I populated enough data with the stress tool, set crc_check_chance to 0.0, then
flushed and compacted.
Before each test run I cleared the page cache. The stress tool was run from another machine.
* data_access_mode: mmap
{code}
$ tools/stress/bin/stress -n 500000 -S 1024 -I SnappyCompressor -o read -d node0
total,interval_op_rate,interval_key_rate,avg_latency,elapsed_time
27487,2748,2748,0.01813206242951213,10
65226,3773,3773,0.013355361827287422,20
103145,3791,3791,0.01334416372171528,30
141092,3794,3794,0.013307842310530199,40
178981,3788,3788,0.013323840692549289,50
217062,3808,3808,0.013260129723484152,60
255020,3795,3795,0.01330330892038569,70
293075,3805,3805,0.013265825778478518,80
331046,3797,3797,0.013295910036606884,91
369059,3801,3801,0.01328353458027517,101
407030,3797,3797,0.01329540965473651,111
444920,3789,3789,0.013323251517550806,121
482894,3797,3797,0.013299231052825617,131
500000,1710,1710,0.010978779375657664,136
END
{code}
* data_access_mode: standard
{code}
$ tools/stress/bin/stress -n 500000 -S 1024 -I SnappyCompressor -o read -d node0
total,interval_op_rate,interval_key_rate,avg_latency,elapsed_time
25474,2547,2547,0.019527989322446416,10
117046,9157,9157,0.005506617743415018,20
211863,9481,9481,0.005313298248204436,30
306773,9491,9491,0.005311305447265831,40
401107,9433,9433,0.005341160133143935,50
496051,9494,9494,0.005200739383215369,60
500000,394,394,0.0019680931881488986,61
END
{code}
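To put the two runs side by side, the Java sketch below reduces each CSV to an overall rate (final total divided by final elapsed_time) and a steady-state rate (mean interval_op_rate, skipping the first warm-up interval and the last partial one). StressSummary and its methods are hypothetical names, not part of the stress tool; the sketch assumes the column order shown in the header line.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for summarizing stress-tool output; not part of the stress tool.
// Each row: total,interval_op_rate,interval_key_rate,avg_latency,elapsed_time
class StressSummary {
    static final List<String> MMAP = Arrays.asList(
            "27487,2748,2748,0.01813206242951213,10",
            "65226,3773,3773,0.013355361827287422,20",
            "103145,3791,3791,0.01334416372171528,30",
            "141092,3794,3794,0.013307842310530199,40",
            "178981,3788,3788,0.013323840692549289,50",
            "217062,3808,3808,0.013260129723484152,60",
            "255020,3795,3795,0.01330330892038569,70",
            "293075,3805,3805,0.013265825778478518,80",
            "331046,3797,3797,0.013295910036606884,91",
            "369059,3801,3801,0.01328353458027517,101",
            "407030,3797,3797,0.01329540965473651,111",
            "444920,3789,3789,0.013323251517550806,121",
            "482894,3797,3797,0.013299231052825617,131",
            "500000,1710,1710,0.010978779375657664,136");

    static final List<String> STANDARD = Arrays.asList(
            "25474,2547,2547,0.019527989322446416,10",
            "117046,9157,9157,0.005506617743415018,20",
            "211863,9481,9481,0.005313298248204436,30",
            "306773,9491,9491,0.005311305447265831,40",
            "401107,9433,9433,0.005341160133143935,50",
            "496051,9494,9494,0.005200739383215369,60",
            "500000,394,394,0.0019680931881488986,61");

    // Overall rate: final 'total' divided by final 'elapsed_time' (seconds).
    static double overallOpsPerSec(List<String> rows) {
        String[] last = rows.get(rows.size() - 1).split(",");
        return Long.parseLong(last[0]) / Double.parseDouble(last[4]);
    }

    // Steady-state rate: mean interval_op_rate, skipping the first (warm-up)
    // and the last (partial) interval.
    static double steadyStateOpsPerSec(List<String> rows) {
        double sum = 0;
        for (int i = 1; i < rows.size() - 1; i++) {
            sum += Double.parseDouble(rows.get(i).split(",")[1]);
        }
        return sum / (rows.size() - 2);
    }

    public static void main(String[] args) {
        System.out.printf("mmap:     overall %.0f ops/s, steady state %.0f ops/s%n",
                overallOpsPerSec(MMAP), steadyStateOpsPerSec(MMAP));
        System.out.printf("standard: overall %.0f ops/s, steady state %.0f ops/s%n",
                overallOpsPerSec(STANDARD), steadyStateOpsPerSec(STANDARD));
    }
}
```

With the rows above this gives about 3676 vs 8197 ops/s overall and about 3795 vs 9411 ops/s in steady state, so standard mode sustains roughly 2.5 times the mmap read throughput in this run.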
I ran the above several times (making sure each test was isolated) and observed
about the same result in every iteration.
Things I noticed while digging with VisualVM:
- Snappy decompression with direct ByteBuffers seems slightly faster, but its
impact on overall read performance is negligible.
- I observed that CompressedMappedFileDataInput.reBuffer is called many times,
especially from the path CMFDI.reset -> CMFDI.seek -> CMFDI.reBuffer.
- Overall, I observe higher CPU usage with CMFDI than with CRAR (CompressedRandomAccessReader).
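The reset -> seek -> reBuffer path gets expensive if seek() reloads (and, for a compressed file, decompresses) a chunk unconditionally. The sketch below assumes that behaviour purely for illustration: ChunkedReader, reuseBuffer, reBufferCalls and runWorkload are hypothetical names, not Cassandra's CMFDI code, and the observation above does not say whether CMFDI's seek already checks the current buffer. It counts reBuffer() calls for a mark/read/reset loop with and without a check for whether the target position already lies inside the buffered chunk.

```java
import java.util.Arrays;

// Hypothetical chunked reader for illustration only; NOT Cassandra's CompressedMappedFileDataInput.
class ChunkedReader {
    private final byte[] data;         // stands in for the (already decompressed) file contents
    private final int chunkLength;     // size of one chunk
    private final boolean reuseBuffer; // true: seek() skips reBuffer() if the target is already buffered
    private byte[] buffer = new byte[0];
    private long bufferOffset = -1;    // file offset of buffer[0]; -1 means nothing buffered yet
    private long current;              // logical read position
    private long mark;
    int reBufferCalls;                 // instrumentation for the comparison

    ChunkedReader(byte[] data, int chunkLength, boolean reuseBuffer) {
        this.data = data;
        this.chunkLength = chunkLength;
        this.reuseBuffer = reuseBuffer;
    }

    private boolean isBuffered(long pos) {
        return bufferOffset >= 0 && pos >= bufferOffset && pos < bufferOffset + buffer.length;
    }

    // Loads the chunk containing 'current'. In a compressed reader this is where the
    // chunk would be read (or sliced from the mapping) and decompressed.
    private void reBuffer() {
        reBufferCalls++;
        bufferOffset = (current / chunkLength) * chunkLength;
        int len = (int) Math.min(chunkLength, data.length - bufferOffset);
        buffer = Arrays.copyOfRange(data, (int) bufferOffset, (int) bufferOffset + len);
    }

    void seek(long pos) {
        current = pos;
        if (!reuseBuffer || !isBuffered(pos)) reBuffer();
    }

    void mark() { mark = current; }

    void reset() { seek(mark); }

    int read() {
        if (current >= data.length) return -1;
        if (!isBuffered(current)) reBuffer();
        return buffer[(int) (current++ - bufferOffset)] & 0xFF;
    }

    // Workload: mark a position, then 100 times read 10 bytes and reset back to the mark.
    static int runWorkload(boolean reuseBuffer) {
        ChunkedReader r = new ChunkedReader(new byte[4096], 1024, reuseBuffer);
        r.seek(100);
        r.mark();
        for (int i = 0; i < 100; i++) {
            for (int j = 0; j < 10; j++) r.read();
            r.reset();
        }
        return r.reBufferCalls;
    }

    public static void main(String[] args) {
        System.out.println("reBuffer calls, unconditional seek: " + runWorkload(false));
        System.out.println("reBuffer calls, buffer-aware seek:  " + runWorkload(true));
    }
}
```

For this workload the unconditional version reloads the chunk 101 times (the initial seek plus 100 resets), while the buffer-aware version reloads it once, because every reset lands inside the chunk that is already buffered.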
Right now I cannot find a reason to use mmapped ByteBuffers for compressed
files.
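The two data_access_mode settings differ mainly in how the compressed bytes of a chunk are fetched. A minimal sketch of both, assuming the chunk offset and length are already known from the compression metadata; ChunkAccess and its methods are hypothetical, not Cassandra code, and decompression is omitted. readChunkStandard issues a seek and a read per chunk, while readChunkMapped copies the chunk out of a single read-only mapping created with FileChannel.map.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Arrays;

// Hypothetical illustration of the two access modes; not Cassandra code.
class ChunkAccess {

    // "standard"-style access: one seek plus one read per compressed chunk.
    static byte[] readChunkStandard(RandomAccessFile raf, long offset, int length) throws IOException {
        byte[] chunk = new byte[length];
        raf.seek(offset);
        raf.readFully(chunk);
        return chunk;
    }

    // "mmap"-style access: the file is mapped once; each chunk is copied out of the mapping.
    // The int cast limits this sketch to mappings smaller than 2 GB.
    static byte[] readChunkMapped(MappedByteBuffer map, long offset, int length) {
        ByteBuffer view = map.duplicate(); // own position/limit, shared content
        view.position((int) offset);
        byte[] chunk = new byte[length];
        view.get(chunk);
        return chunk;
    }

    // Writes a small temp file and checks that both modes return identical chunk bytes.
    static boolean modesAgree() {
        try {
            Path file = Files.createTempFile("chunks", ".db");
            try {
                byte[] content = new byte[10_000];
                for (int i = 0; i < content.length; i++) content[i] = (byte) i;
                Files.write(file, content);
                long[] offsets = {0, 1_000, 5_000}; // hypothetical chunk offsets
                int length = 500;
                try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r");
                     FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
                    MappedByteBuffer map = channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
                    for (long offset : offsets) {
                        byte[] a = readChunkStandard(raf, offset, length);
                        byte[] b = readChunkMapped(map, offset, length);
                        if (!Arrays.equals(a, b)) return false;
                    }
                }
                return true;
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("modes agree: " + modesAgree());
    }
}
```

In this sketch both paths still copy the compressed bytes into a heap array before any decompression, so mapping removes the per-chunk read() syscalls but not the copy or the decompression work. A single MappedByteBuffer is also limited to 2 GB, so large files would have to be mapped in segments.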
> use MMapedBuffer in CompressedSegmentedFile.getSegment
> ------------------------------------------------------
>
> Key: CASSANDRA-3623
> URL: https://issues.apache.org/jira/browse/CASSANDRA-3623
> Project: Cassandra
> Issue Type: Improvement
> Components: Core
> Affects Versions: 1.1
> Reporter: Vijay
> Assignee: Vijay
> Labels: compression
> Fix For: 1.1
>
> Attachments: 0001-MMaped-Compression-segmented-file-v2.patch,
> 0001-MMaped-Compression-segmented-file-v3.patch,
> 0001-MMaped-Compression-segmented-file.patch,
> 0002-tests-for-MMaped-Compression-segmented-file-v2.patch,
> 0002-tests-for-MMaped-Compression-segmented-file-v3.patch, CRC+MMapIO.xlsx,
> MMappedIO-Performance.docx
>
>
> CompressedSegmentedFile.getSegment seems to open a new file and doesn't seem to
> use the MMap, hence the higher CPU on the nodes and higher latencies on
> reads.
> This ticket is to implement the TODO mentioned in CompressedRandomAccessReader:
> // TODO refactor this to separate concept of "buffer to avoid lots of read()
> syscalls" and "compression buffer"
> but I think a separate class for the Buffer will be better.