[ https://issues.apache.org/jira/browse/CASSANDRA-13049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16008810#comment-16008810 ]

Simon Zhou commented on CASSANDRA-13049:
----------------------------------------

I wrote some micro [benchmark code | https://github.com/szhou1234/jmh-samples/blob/master/src/main/java/com/cassandra/MmapPerf.java].
To my surprise, memory mapping is very efficient even on small files. Below are
the results on an otherwise idle server; for each file size (1k, 10k, 100k, 1m,
10m) there are 2000 files. I should have dropped the page cache first, but
these numbers are from the first run on that server. That said, we can stick
with mmap for efficient IO while looking into configuration tuning to reduce
the number of sstables being streamed.

Benchmark             (bufferSize)             (filePath)  (useDirectBuffer)  Mode  Cnt   Score   Error  Units
MmapPerf.readChannel         65536    /home/szhou/1kfiles              false  avgt    4   0.044 ± 0.051   s/op
MmapPerf.readChannel         65536    /home/szhou/1kfiles               true  avgt    4   0.064 ± 0.015   s/op
MmapPerf.readChannel         65536   /home/szhou/10kfiles              false  avgt    4   0.050 ± 0.060   s/op
MmapPerf.readChannel         65536   /home/szhou/10kfiles               true  avgt    4   0.072 ± 0.019   s/op
MmapPerf.readChannel         65536  /home/szhou/100kfiles              false  avgt    4   0.143 ± 0.060   s/op
MmapPerf.readChannel         65536  /home/szhou/100kfiles               true  avgt    4   0.166 ± 0.021   s/op
MmapPerf.readChannel         65536    /home/szhou/1mfiles              false  avgt    4   1.051 ± 0.801   s/op
MmapPerf.readChannel         65536    /home/szhou/1mfiles               true  avgt    4   1.287 ± 0.220   s/op
MmapPerf.readChannel         65536   /home/szhou/10mfiles              false  avgt    4   9.696 ± 2.207   s/op
MmapPerf.readChannel         65536   /home/szhou/10mfiles               true  avgt    4  13.754 ± 1.379   s/op
MmapPerf.readMapping         65536    /home/szhou/1kfiles              false  avgt    4   0.017 ± 0.007   s/op
MmapPerf.readMapping         65536    /home/szhou/1kfiles               true  avgt    4   0.017 ± 0.005   s/op
MmapPerf.readMapping         65536   /home/szhou/10kfiles              false  avgt    4   0.016 ± 0.004   s/op
MmapPerf.readMapping         65536   /home/szhou/10kfiles               true  avgt    4   0.017 ± 0.006   s/op
MmapPerf.readMapping         65536  /home/szhou/100kfiles              false  avgt    4   0.023 ± 0.004   s/op
MmapPerf.readMapping         65536  /home/szhou/100kfiles               true  avgt    4   0.026 ± 0.006   s/op
MmapPerf.readMapping         65536    /home/szhou/1mfiles              false  avgt    4   0.129 ± 0.017   s/op
MmapPerf.readMapping         65536    /home/szhou/1mfiles               true  avgt    4   0.132 ± 0.068   s/op
MmapPerf.readMapping         65536   /home/szhou/10mfiles              false  avgt    4   1.313 ± 0.262   s/op
MmapPerf.readMapping         65536   /home/szhou/10mfiles               true  avgt    4   1.274 ± 0.482   s/op
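For context, the two code paths being compared above boil down to roughly the
following. This is a minimal self-contained sketch, not the exact benchmark
code; the class and method names mirror the JMH benchmark, but the checksum
and temp-file setup are only there to make the example runnable:

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.channels.FileChannel.MapMode;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class MmapSketch {
    // Plain channel read: data is copied into a (direct or heap) buffer.
    static long readChannel(Path file, int bufferSize, boolean direct) throws IOException {
        ByteBuffer buf = direct ? ByteBuffer.allocateDirect(bufferSize)
                                : ByteBuffer.allocate(bufferSize);
        long checksum = 0;
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            while (ch.read(buf) != -1) {
                buf.flip();
                while (buf.hasRemaining())
                    checksum += buf.get();
                buf.clear();
            }
        }
        return checksum;
    }

    // Memory-mapped read: the kernel pages data in on demand; each mapping
    // holds a file descriptor / virtual memory area, which is the concern
    // raised in this ticket.
    static long readMapping(Path file) throws IOException {
        long checksum = 0;
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            ByteBuffer map = ch.map(MapMode.READ_ONLY, 0, ch.size());
            while (map.hasRemaining())
                checksum += map.get();
        }
        return checksum;
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("mmap-sketch", ".bin");
        Files.write(tmp, new byte[]{1, 2, 3, 4});
        long a = readChannel(tmp, 65536, false);
        long b = readMapping(tmp);
        if (a != b) throw new AssertionError("read paths disagree");
        System.out.println(a); // both paths sum bytes 1..4, printing 10
        Files.delete(tmp);
    }
}
```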

> Too many open files during bootstrapping
> ----------------------------------------
>
>                 Key: CASSANDRA-13049
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-13049
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Simon Zhou
>            Assignee: Simon Zhou
>
> We just upgraded from 2.2.5 to 3.0.10 and hit an issue during bootstrapping,
> so this is likely something made worse by the IO performance improvements in
> Cassandra 3.
> On our side, the issue is that we have lots of small sstables, so when
> bootstrapping a new node, it receives lots of files during streaming and
> Cassandra keeps all of them open for an unpredictable amount of time.
> Eventually we hit the "Too many open files" error, and around that time I can
> see ~1M open files through lsof, almost all of them *-Data.db and
> *-Index.db. We should definitely use a better compaction strategy to reduce
> the number of sstables, but I see a few possible improvements in Cassandra:
> 1. We use a memory map when reading data from sstables. Every time we create
> a new memory map, one more file descriptor is held open. Memory mapping
> improves IO performance when dealing with large files; do we want to set a
> file size threshold below which we skip it?
> 2. Whenever we finish receiving a file from a peer, we create a
> SSTableReader/BigTableReader, which opens the data file and index file and
> keeps them open until some unpredictable later time. See
> StreamReceiveTask#L110, BigTableWriter#openFinal and
> SSTableReader#InstanceTidier. Would it be better to lazily open the
> data/index files, or to close them more often to reclaim file descriptors?
> I searched all known issues in JIRA and this looks like a new issue in
> Cassandra 3. cc [~Stefania] for comments.
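The size-threshold idea in point 1 of the description could look roughly like
this. This is only an illustrative sketch of the idea, not Cassandra's actual
disk-access code; the class name, method name, and threshold value are all
hypothetical:

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class AccessModeChooser {
    // Hypothetical cutoff: below this size, mmap buys little but still costs
    // a file descriptor / virtual memory area per mapping.
    static final long MMAP_THRESHOLD = 1L << 20; // 1 MiB, illustrative only

    /** Returns a read-only buffer over the whole file, mapping only large files. */
    static ByteBuffer openForRead(Path file) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            long size = ch.size();
            if (size >= MMAP_THRESHOLD) {
                // Large file: map it. The mapping stays valid after the
                // channel is closed, and avoids copy overhead on big reads.
                return ch.map(FileChannel.MapMode.READ_ONLY, 0, size);
            }
            // Small file: read it fully into a heap buffer and release the
            // file descriptor immediately when the channel closes.
            ByteBuffer buf = ByteBuffer.allocate((int) size);
            while (buf.hasRemaining() && ch.read(buf) != -1) { /* keep reading */ }
            buf.flip();
            return buf;
        }
    }
}
```

For small files this releases the descriptor as soon as the try-with-resources
block exits, which is the resource this ticket is trying to reclaim.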



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
