[jira] [Commented] (CASSANDRA-3623) use MMapedBuffer in CompressedSegmentedFile.getSegment

Yuki Morishita (Commented) (JIRA) Fri, 27 Jan 2012 15:28:35 -0800

    [ 
https://issues.apache.org/jira/browse/CASSANDRA-3623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13195229#comment-13195229
 ]


Yuki Morishita commented on CASSANDRA-3623:
-------------------------------------------

Vijay, Pavel,

I ran a test similar to Pavel's on a physical machine (4-core 2.6GHz Xeon/16GB 
RAM/Linux (Debian)) with trunk + 3623(v3) + 3610(v3).
Cassandra was run on the following JVM.

{code}
$ java -version
java version "1.6.0_26"
Java(TM) SE Runtime Environment (build 1.6.0_26-b03)
Java HotSpot(TM) 64-Bit Server VM (build 20.1-b02, mixed mode)
{code}

with jvm args:

{code}
-ea
-javaagent:bin/../lib/jamm-0.2.5.jar
-XX:+UseThreadPriorities -XX:ThreadPriorityPolicy=42
-Xms6G -Xmx6G -Xmn2G -Xss128k
-XX:+HeapDumpOnOutOfMemoryError
-XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled 
-XX:SurvivorRatio=8
-XX:MaxTenuringThreshold=1 -XX:CMSInitiatingOccupancyFraction=75 
-XX:+UseCMSInitiatingOccupancyOnly
-Djava.net.preferIPv4Stack=true -Dcom.sun.management.jmxremote.port=7199
-Dcom.sun.management.jmxremote.ssl=false 
-Dcom.sun.management.jmxremote.authenticate=false
-Dlog4j.configuration=log4j-server.properties -Dlog4j.defaultInitOverride=true
{code}

Populate enough data with the stress tool, set crc_check_chance to 0.0, flush, 
and compact.
Before each test run, clear the page cache. The stress tool is run from another 
machine.
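
The comment does not say how the page cache was cleared. One common way on Linux is to sync dirty pages and then write to /proc/sys/vm/drop_caches, which requires root. A minimal sketch, assuming that approach; the helper name drop_page_cache and its parameter are hypothetical:

```python
import os


def drop_page_cache(control_file="/proc/sys/vm/drop_caches"):
    """Flush dirty pages, then ask the Linux kernel to drop its clean caches.

    Assumption: this is one common way to clear the page cache on Linux;
    the original comment does not state which method was used.
    Writing to the real control file requires root.
    """
    os.sync()  # write dirty pages to disk so they become droppable
    with open(control_file, "w") as f:
        f.write("3\n")  # 3 = drop page cache plus dentries and inodes


# Usage (as root), before each benchmark run:
#     drop_page_cache()
```

Running this between runs means every read in the next test starts from a cold cache, so the two access modes are compared on equal footing.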

* data_access_mode: mmap

{code}
$ tools/stress/bin/stress -n 500000 -S 1024 -I SnappyCompressor -o read -d node0
total,interval_op_rate,interval_key_rate,avg_latency,elapsed_time
27487,2748,2748,0.01813206242951213,10
65226,3773,3773,0.013355361827287422,20
103145,3791,3791,0.01334416372171528,30
141092,3794,3794,0.013307842310530199,40
178981,3788,3788,0.013323840692549289,50
217062,3808,3808,0.013260129723484152,60
255020,3795,3795,0.01330330892038569,70
293075,3805,3805,0.013265825778478518,80
331046,3797,3797,0.013295910036606884,91
369059,3801,3801,0.01328353458027517,101
407030,3797,3797,0.01329540965473651,111
444920,3789,3789,0.013323251517550806,121
482894,3797,3797,0.013299231052825617,131
500000,1710,1710,0.010978779375657664,136
END
{code}

* data_access_mode: standard

{code}
$ tools/stress/bin/stress -n 500000 -S 1024 -I SnappyCompressor -o read -d node0
total,interval_op_rate,interval_key_rate,avg_latency,elapsed_time
25474,2547,2547,0.019527989322446416,10
117046,9157,9157,0.005506617743415018,20
211863,9481,9481,0.005313298248204436,30
306773,9491,9491,0.005311305447265831,40
401107,9433,9433,0.005341160133143935,50
496051,9494,9494,0.005200739383215369,60
500000,394,394,0.0019680931881488986,61
END
{code}
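
To make the two runs easier to compare, the interval rates above can be reduced to a single steady-state figure. The sketch below is only an illustration, not part of the original test: it copies the stress output from the two runs, drops the first interval (warm-up) and the last one (partial), and averages the rest. The function name steady_state_rate and the choice of which intervals to drop are assumptions.

```python
import csv
import io
from statistics import mean

# Stress output copied from the mmap run above.
MMAP_RUN = """\
total,interval_op_rate,interval_key_rate,avg_latency,elapsed_time
27487,2748,2748,0.01813206242951213,10
65226,3773,3773,0.013355361827287422,20
103145,3791,3791,0.01334416372171528,30
141092,3794,3794,0.013307842310530199,40
178981,3788,3788,0.013323840692549289,50
217062,3808,3808,0.013260129723484152,60
255020,3795,3795,0.01330330892038569,70
293075,3805,3805,0.013265825778478518,80
331046,3797,3797,0.013295910036606884,91
369059,3801,3801,0.01328353458027517,101
407030,3797,3797,0.01329540965473651,111
444920,3789,3789,0.013323251517550806,121
482894,3797,3797,0.013299231052825617,131
500000,1710,1710,0.010978779375657664,136
"""

# Stress output copied from the standard run above.
STANDARD_RUN = """\
total,interval_op_rate,interval_key_rate,avg_latency,elapsed_time
25474,2547,2547,0.019527989322446416,10
117046,9157,9157,0.005506617743415018,20
211863,9481,9481,0.005313298248204436,30
306773,9491,9491,0.005311305447265831,40
401107,9433,9433,0.005341160133143935,50
496051,9494,9494,0.005200739383215369,60
500000,394,394,0.0019680931881488986,61
"""


def steady_state_rate(stress_output):
    """Mean interval_op_rate, ignoring the first and last intervals."""
    rows = list(csv.DictReader(io.StringIO(stress_output)))
    middle = rows[1:-1]  # skip warm-up interval and partial final interval
    return mean(int(row["interval_op_rate"]) for row in middle)


mmap_rate = steady_state_rate(MMAP_RUN)
standard_rate = steady_state_rate(STANDARD_RUN)
print(f"mmap:     {mmap_rate:.0f} ops/s")
print(f"standard: {standard_rate:.0f} ops/s")
print(f"standard/mmap: {standard_rate / mmap_rate:.2f}x")
```

With these numbers, the standard run sustains roughly 2.5 times the read rate of the mmap run once the first interval has passed.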

I ran the above several times (making sure each test was isolated), and each 
iteration gave about the same result.

Things I noticed when digging with VisualVM:

- Snappy uncompression with direct ByteBuffers seems slightly faster, but its 
impact on overall read performance is negligible.
- CompressedMappedFileDataInput.reBuffer is called many times, especially via 
the path CMFDI.reset -> CMFDI.seek -> CMFDI.reBuffer.
- With CMFDI (CompressedMappedFileDataInput), I observe higher CPU usage 
overall than with CRAR (CompressedRandomAccessReader).

Right now I cannot find a reason to use mmapped ByteBuffers for compressed 
files.

                
> use MMapedBuffer in CompressedSegmentedFile.getSegment
> ------------------------------------------------------
>
>                 Key: CASSANDRA-3623
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3623
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>    Affects Versions: 1.1
>            Reporter: Vijay
>            Assignee: Vijay
>              Labels: compression
>             Fix For: 1.1
>
>         Attachments: 0001-MMaped-Compression-segmented-file-v2.patch, 
> 0001-MMaped-Compression-segmented-file-v3.patch, 
> 0001-MMaped-Compression-segmented-file.patch, 
> 0002-tests-for-MMaped-Compression-segmented-file-v2.patch, 
> 0002-tests-for-MMaped-Compression-segmented-file-v3.patch, CRC+MMapIO.xlsx, 
> MMappedIO-Performance.docx
>
>
> CompressedSegmentedFile.getSegment seems to open a new file and doesn't seem to 
> use MMap, hence higher CPU on the nodes and higher latencies on 
> reads. 
> This ticket is to implement the TODO mentioned in CompressedRandomAccessReader
> // TODO refactor this to separate concept of "buffer to avoid lots of read() 
> syscalls" and "compression buffer"
> but I think a separate class for the Buffer will be better.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
