[
https://issues.apache.org/jira/browse/CASSANDRA-13049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15762657#comment-15762657
]
Simon Zhou commented on CASSANDRA-13049:
----------------------------------------
Thanks [~Stefania]. I already have some initial perf results and will share
them here once the testing is done.
> Too many open files during bootstrapping
> ----------------------------------------
>
> Key: CASSANDRA-13049
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13049
> Project: Cassandra
> Issue Type: Improvement
> Reporter: Simon Zhou
> Assignee: Simon Zhou
>
> We just upgraded from 2.2.5 to 3.0.10 and hit an issue during bootstrapping,
> so this is likely something that got worse alongside the IO performance
> improvements in Cassandra 3.
> On our side, the issue is that we have lots of small sstables, so a new node
> receives lots of files during streaming while bootstrapping, and
> Cassandra keeps all of them open for an unpredictable amount of time.
> Eventually we hit a "Too many open files" error, and around that time I can
> see ~1M open files through lsof, almost all of them *-Data.db and
> *-Index.db. We should definitely use a better compaction strategy to reduce
> the number of sstables, but I see a few possible improvements in Cassandra:
> 1. We use memory mapping when reading data from sstables. Every time we
> create a new memory map, there is one more file descriptor open. Memory
> mapping improves IO performance when dealing with large files; do we want to
> set a file size threshold for it?
> 2. Whenever we finish receiving a file from a peer, we create an
> SSTableReader/BigTableReader, which opens the data file and index
> file and keeps them open until some unpredictable later time. See
> StreamReceiveTask#L110, BigTableWriter#openFinal and
> SSTableReader#InstanceTidier. Would it be better to open the data/index
> files lazily, or to close them more often to reclaim the file descriptors?
> I searched the known issues in JIRA and it looks like this is a new issue in
> Cassandra 3. cc [~Stefania] for comments.
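
For illustration only, the two ideas in the description (a file size threshold for memory mapping, and opening files lazily) could be sketched roughly as below. This is not Cassandra code: ThresholdReader, LazyFileHandle and the 64 MiB MMAP_THRESHOLD_BYTES value are hypothetical names and numbers picked for the example, and only standard java.nio calls are used. A real change would have to fit into SSTableReader and the streaming path, which this sketch does not attempt.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper: memory-map only files at or above a size threshold.
final class ThresholdReader {
    // Assumed value for illustration; not a Cassandra setting.
    static final long MMAP_THRESHOLD_BYTES = 64L * 1024 * 1024;

    static boolean shouldMmap(long fileSize, long threshold) {
        return fileSize >= threshold;
    }

    static ByteBuffer read(Path path, long threshold) throws IOException {
        // The channel is closed on exit; a mapping stays valid after that.
        try (FileChannel ch = FileChannel.open(path, StandardOpenOption.READ)) {
            long size = ch.size();
            if (size > Integer.MAX_VALUE) {
                throw new IOException("file too large for one buffer in this sketch");
            }
            if (shouldMmap(size, threshold)) {
                return ch.map(FileChannel.MapMode.READ_ONLY, 0, size);
            }
            // Small file: copy into a heap buffer instead of mapping it.
            ByteBuffer buf = ByteBuffer.allocate((int) size);
            while (buf.hasRemaining() && ch.read(buf) != -1) {
                // keep reading until the buffer is full or EOF is reached
            }
            buf.flip();
            return buf;
        }
    }
}

// Hypothetical helper: defer opening the file until it is first used,
// and allow it to be closed and transparently reopened later.
final class LazyFileHandle implements AutoCloseable {
    private final Path path;
    private FileChannel channel; // null while no descriptor is held

    LazyFileHandle(Path path) {
        this.path = path;
    }

    synchronized FileChannel channel() throws IOException {
        if (channel == null) {
            channel = FileChannel.open(path, StandardOpenOption.READ);
        }
        return channel;
    }

    synchronized boolean isOpen() {
        return channel != null;
    }

    @Override
    public synchronized void close() throws IOException {
        if (channel != null) {
            channel.close();
            channel = null;
        }
    }
}
```

With something along these lines, a freshly streamed small sstable would hold no descriptor until a read actually touches it, at the cost of an open() call on first access. Whether that trade-off is acceptable is exactly what the perf results would need to show.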