[jira] [Commented] (CASSANDRA-13049) Too many open files during bootstrapping

Simon Zhou (JIRA) Mon, 19 Dec 2016 15:42:07 -0800

    [ 
https://issues.apache.org/jira/browse/CASSANDRA-13049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15762657#comment-15762657
 ]


Simon Zhou commented on CASSANDRA-13049:
----------------------------------------

Thanks [~Stefania]. I already have some initial perf results and will share 
them here once the testing is done.

> Too many open files during bootstrapping
> ----------------------------------------
>
>                 Key: CASSANDRA-13049
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-13049
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Simon Zhou
>            Assignee: Simon Zhou
>
> We just upgraded from 2.2.5 to 3.0.10 and hit an issue during bootstrapping, 
> so this is likely something made worse by the IO performance improvements in 
> Cassandra 3.
> On our side, the issue is that we have lots of small sstables, so a new node 
> receives lots of files during streaming when it bootstraps, and Cassandra 
> keeps all of them open for an unpredictable amount of time. Eventually we hit 
> a "Too many open files" error, and around that time I can see ~1M open files 
> through lsof, almost all of them *-Data.db and *-Index.db. We should 
> definitely use a better compaction strategy to reduce the number of sstables, 
> but I see a few possible improvements in Cassandra:
> 1. We use memory mapping when reading data from sstables. Every time we 
> create a new memory map, one more file descriptor is open. Memory mapping 
> improves IO performance for large files; do we want to set a file size 
> threshold for using it?
> 2. Whenever we finish receiving a file from a peer, we create an 
> SSTableReader/BigTableReader, which opens the data file and index file, and 
> we keep them open until some unpredictable later time. See 
> StreamReceiveTask#L110, BigTableWriter#openFinal and 
> SSTableReader#InstanceTidier. Would it be better to open the data/index 
> files lazily, or to close them more often to reclaim the file descriptors?
> I searched all known issues in JIRA and it looks like this is a new issue in 
> Cassandra 3. cc [~Stefania] for comments.
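
To make the file size threshold idea from item 1 concrete, here is a minimal 
sketch, not Cassandra code. The class SizeThresholdReader, its AccessMode 
enum and the threshold parameter are all hypothetical names chosen for 
illustration, and a good threshold value would have to come from measurement. 
Files at or above the threshold are memory mapped; smaller files are read 
into a heap buffer, so no mapping is left behind for them.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical helper (not part of Cassandra): picks an access mode by file size. */
public final class SizeThresholdReader {
    public enum AccessMode { MMAP, BUFFERED }

    // Assumed tunable: files at or above this size are memory mapped.
    private final long mmapThresholdBytes;

    public SizeThresholdReader(long mmapThresholdBytes) {
        this.mmapThresholdBytes = mmapThresholdBytes;
    }

    /** Large files are mapped; small files use a plain buffered read. */
    public AccessMode modeFor(long fileSizeBytes) {
        return fileSizeBytes >= mmapThresholdBytes ? AccessMode.MMAP : AccessMode.BUFFERED;
    }

    /** Returns the whole file as a read-ready buffer; the channel is closed on return. */
    public ByteBuffer read(Path path) throws IOException {
        try (FileChannel channel = FileChannel.open(path, StandardOpenOption.READ)) {
            long size = channel.size();
            if (size > Integer.MAX_VALUE) {
                throw new IOException("File too large for a single buffer in this sketch");
            }
            if (size > 0 && modeFor(size) == AccessMode.MMAP) {
                // The mapping stays valid after the channel is closed.
                return channel.map(FileChannel.MapMode.READ_ONLY, 0, size);
            }
            // Small file: copy the bytes onto the heap and release the file right away.
            ByteBuffer buffer = ByteBuffer.allocate((int) size);
            while (buffer.hasRemaining() && channel.read(buffer) >= 0) {
                // keep reading until the buffer is full or EOF is reached
            }
            buffer.flip();
            return buffer;
        }
    }
}
```

With a threshold of, say, 1 MiB (an arbitrary example value), 
new SizeThresholdReader(1 << 20).read(path) would return a heap copy for a 
small sstable component and a mapped buffer for a large one.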



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
