Hao-Nan Zhu created CASSANDRA-19893:
---------------------------------------
Summary: Add timeout or maximum retries for
ColumnFamilyStore.selectAndReference
Key: CASSANDRA-19893
URL: https://issues.apache.org/jira/browse/CASSANDRA-19893
Project: Cassandra
Issue Type: Improvement
Components: Local/Snapshots
Reporter: Hao-Nan Zhu
Hi, I’ve encountered a potential performance bottleneck in _ColumnFamilyStore_
which could cause indefinite waiting.
The method
_[snapshotWithoutMemtable|https://github.com/apache/cassandra/blob/02f38208b15b119b3038482c5e36f05c14e2a4cf/src/java/org/apache/cassandra/db/ColumnFamilyStore.java#L2007]_
iterates over _ColumnFamilyStore_ instances, creating hard links for each of
their _SSTables_ in order to take a snapshot. In each iteration, it calls
_[selectAndReference|https://github.com/apache/cassandra/blob/02f38208b15b119b3038482c5e36f05c14e2a4cf/src/java/org/apache/cassandra/db/ColumnFamilyStore.java#L1873]_,
which uses a *while (true)* loop to repeatedly attempt to obtain references to
_SSTables_ until it succeeds. This unbounded loop can wait indefinitely,
causing _snapshotWithoutMemtable_ to either get stuck or
take an excessively long time to process each _ColumnFamilyStore_ instance.
This issue could become more severe when dealing with a large number of
_ColumnFamilyStore_ instances, leading to significant delays and possible
resource contention.
To mitigate this issue, one possible solution would be to introduce a timeout or
a maximum number of retries for _selectAndReference_. This would
prevent indefinite spinning and make execution time more predictable.
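
As a rough illustration of the idea, here is a minimal, self-contained sketch of a retry loop bounded by both an attempt count and a deadline. It is not Cassandra code: _BoundedRetry_, _RetriesExhaustedException_, and the parameters _maxAttempts_ and _timeout_ are hypothetical names chosen for this example, and the supplier returning null on failure only stands in for a reference attempt that did not succeed.

```java
import java.time.Duration;
import java.util.function.Supplier;

// Hypothetical helper, not part of Cassandra: bounds a "try until it works" loop.
final class BoundedRetry {

    /** Thrown when the attempt budget or the deadline runs out. */
    static final class RetriesExhaustedException extends RuntimeException {
        RetriesExhaustedException(String message) {
            super(message);
        }
    }

    private BoundedRetry() {}

    /**
     * Calls attempt until it returns a non-null value, giving up once
     * maxAttempts calls have failed or timeout has elapsed, whichever
     * comes first. A null return is treated as "try again".
     */
    static <T> T retry(Supplier<T> attempt, int maxAttempts, Duration timeout) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        final long start = System.nanoTime();
        final long timeoutNanos = timeout.toNanos();
        int attempts = 0;
        while (true) {
            T result = attempt.get();
            attempts++;
            if (result != null) {
                return result;
            }
            long elapsed = System.nanoTime() - start;
            if (attempts >= maxAttempts || elapsed >= timeoutNanos) {
                throw new RetriesExhaustedException(
                    "gave up after " + attempts + " attempts and "
                    + (elapsed / 1_000_000) + " ms");
            }
            // Hint to the runtime that this thread is busy-waiting.
            Thread.onSpinWait();
        }
    }
}
```

A caller would wrap the failing step in a supplier, for example _BoundedRetry.retry(() -> tryOnce(), 1000, Duration.ofSeconds(5))_ with a hypothetical _tryOnce()_, and decide how to handle the exception (skip the table, fail the snapshot, or log and continue). Which limits are appropriate, and whether giving up is safe for every caller of _selectAndReference_, would need to be judged against the actual code.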
Please let me know if anything in this analysis is incorrect, and whether such
an optimization would be worthwhile. Thanks!