[
https://issues.apache.org/jira/browse/CASSANDRA-19893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Hao-Nan Zhu updated CASSANDRA-19893:
------------------------------------
Description:
Hi, I’ve encountered a potential performance bottleneck in _ColumnFamilyStore_
that could cause an indefinite wait.
The method
_[snapshotWithoutMemtable|https://github.com/apache/cassandra/blob/02f38208b15b119b3038482c5e36f05c14e2a4cf/src/java/org/apache/cassandra/db/ColumnFamilyStore.java#L2007]_
iterates over _ColumnFamilyStore_ instances, creating hard links for each of
their {_}SSTables{_} to take a snapshot. Within that iteration, it calls
{_}[selectAndReference|https://github.com/apache/cassandra/blob/02f38208b15b119b3038482c5e36f05c14e2a4cf/src/java/org/apache/cassandra/db/ColumnFamilyStore.java#L1873]{_},
which uses a *while (true)* loop to retry obtaining references to the
_SSTables_ until it succeeds. This unbounded retry can wait indefinitely,
causing _snapshotWithoutMemtable_ to either get stuck or take an excessively
long time on each _ColumnFamilyStore_ instance.
The problem could become more severe with a large number of
_ColumnFamilyStore_ instances, leading to significant delays and possible
resource contention.
To mitigate this, one possible solution is to introduce a timeout or a maximum
number of retries for {_}selectAndReference{_}. This would prevent indefinite
spinning and make the execution time more predictable.
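As a rough illustration of the bounded retry suggested above, here is a minimal, self-contained sketch. It is not Cassandra code: the class _BoundedRetry_, the method _tryWithBounds_, and its parameters are hypothetical names, and the attempt is modelled as a plain _Supplier_ that returns null when references could not be obtained.

```java
import java.util.Optional;
import java.util.function.Supplier;

/**
 * Hypothetical helper (not part of Cassandra): retries an attempt until it
 * succeeds, the attempt budget is used up, or the time budget has elapsed.
 */
public final class BoundedRetry
{
    private BoundedRetry() {}

    /**
     * @param attempt      returns a non-null value on success, or null to signal "retry"
     * @param maxAttempts  upper bound on the number of attempts (must be >= 1)
     * @param timeoutNanos upper bound on total elapsed time, in nanoseconds
     * @return the successful result, or Optional.empty() if a bound was hit first
     */
    public static <T> Optional<T> tryWithBounds(Supplier<T> attempt, int maxAttempts, long timeoutNanos)
    {
        if (maxAttempts < 1)
            throw new IllegalArgumentException("maxAttempts must be >= 1");

        long start = System.nanoTime();
        for (int i = 0; i < maxAttempts; i++)
        {
            T result = attempt.get();
            if (result != null)
                return Optional.of(result);

            // Compare elapsed time by subtraction, which stays correct
            // even if the nanoTime counter wraps around.
            if (System.nanoTime() - start >= timeoutNanos)
                break;

            // Give other threads a chance to run before the next attempt.
            Thread.yield();
        }
        // Bounds exhausted: the caller decides whether to fail, log, or retry later.
        return Optional.empty();
    }
}
```

With a helper of this shape, a caller in the snapshot path could treat an empty result as a reportable failure for that one _ColumnFamilyStore_ instead of spinning. Whether failing a snapshot is preferable to waiting is a design question for the maintainers.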
I wonder whether anything in this analysis is incorrect, and whether such an
optimization would be worthwhile. Thanks!
> Add timeout or maximum retries for ColumnFamilyStore.selectAndReference
> -----------------------------------------------------------------------
>
> Key: CASSANDRA-19893
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19893
> Project: Cassandra
> Issue Type: Improvement
> Components: Local/Snapshots
> Reporter: Hao-Nan Zhu
> Priority: Normal